[SOLVED] Remove failed ETCD node

Hi there,

Note: I’m not very comfortable in English, so I hope this is understandable.

I followed the Rancher documentation to set up RKE on a single node, after which I was able to connect to the Rancher GUI, and everything went well.

I then set up a Kubernetes cluster through the web interface, containing 2 nodes, both of which have all the roles: worker, control plane and etcd.

I’m currently trying to simulate a case where one of the 2 nodes goes down, by shutting down the first one. When I do this, the cluster becomes unavailable in the Rancher UI, and I can only reach it through Cluster Management to try to fix the problem (the GUI is also sometimes unavailable, so I have to use the Rancher CLI).

On the CLI, I tried to remove the “failing” node using:

rancher nodes rm <node1_id>

but for now the cluster is stuck in an “Updating” state. It says it is “removing” the node, but nothing happens, and the cluster stays in this state:

ID                       NAME           STATE      POOL      DESCRIPTION
c-zmznq:m-32651d10ae53   rancher-wok2   active
c-zmznq:m-f9b4a1162ac7   rancher-wok1   removing

I’ve read this in the documentation:

When a node in your etcd cluster becomes unhealthy, the recommended approach is to fix or remove the failed or unhealthy node before adding a new etcd node to the cluster.

As I want to be able to replace a failed etcd + control plane node, I would like to get the cluster back to a normal state with a new etcd + control plane node, but I can’t find a way to remove the node from Rancher when it is really down.

Do you have any advice for me?

Thanks to anyone who can help :)

Happy coding,

Edit: Bruh... I need 3 nodes for etcd, as explained in the requirements. I guess I should read the documentation more carefully next time. Can someone confirm? :)
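
For anyone landing here later, here is a rough sketch of the quorum arithmetic that I believe is behind the 3-node requirement. etcd needs a majority of its members to be up before it accepts writes, so a 2-member cluster cannot survive the loss of either node. The function names below are made up for illustration and are not part of any Rancher or etcd tooling.

```python
# Hypothetical helpers, only meant to illustrate majority-quorum arithmetic.

def etcd_quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can be down while a majority is still up."""
    return members - etcd_quorum(members)


if __name__ == "__main__":
    for n in range(1, 6):
        print(
            f"{n} etcd member(s): quorum={etcd_quorum(n)}, "
            f"tolerated failures={failure_tolerance(n)}"
        )
```

With 2 members the quorum is 2 and the tolerated failures are 0, which would match what I saw when shutting down one of my two nodes. With 3 members the quorum is still 2, so one node can go down and the cluster keeps working.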