Hi there,
Note: I’m not very comfortable in English; I hope I’ll be understandable.
I’ve followed the Rancher documentation to set up RKE on a single node, after which I was able to connect to the Rancher GUI, and everything went well.
I then set up a Kubernetes cluster using the web interface, containing 2 nodes, where both nodes had all the roles: worker, control plane, and etcd.
I’m now trying to simulate the case where one of the 2 nodes goes down, by shutting down the first one. When I do, the cluster becomes unavailable in the Rancher UI, and I can only reach it through Cluster Management to try to fix the problem (it’s also sometimes unavailable through the GUI, so I have to use the Rancher CLI).
From the CLI, I tried to remove the “failing” node using:
rancher nodes rm <node1_id>
but the cluster is now stalled in an “Updating” state: it says it is “removing” the node, but nothing happens, and the cluster stays like this:
ID                       NAME           STATE      POOL   DESCRIPTION
c-zmznq:m-32651d10ae53   rancher-wok2   active
c-zmznq:m-f9b4a1162ac7   rancher-wok1   removing
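
(While searching for a fix, I saw suggestions that a node can get stuck in “removing” because of leftover finalizers on the node object in the Rancher management cluster, and that clearing them lets the deletion finish. A sketch of what I found, assuming kubectl access to the cluster Rancher itself runs on, and reusing the cluster/machine IDs from the output above:

# List the node objects Rancher keeps for this cluster
# (the namespace is the cluster ID, c-zmznq here)
kubectl -n c-zmznq get nodes.management.cattle.io

# Clear the finalizers on the stuck node object so the deletion can complete
# (m-f9b4a1162ac7 is rancher-wok1 in the table above)
kubectl -n c-zmznq patch nodes.management.cattle.io m-f9b4a1162ac7 \
  --type=merge -p '{"metadata":{"finalizers":[]}}'

I haven’t dared to run this yet, as I’m not sure it’s safe.)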
I’ve read this in the documentation:
When a node in your etcd cluster becomes unhealthy, the recommended approach is to fix or remove the failed or unhealthy node before adding a new etcd node to the cluster.
Since I want to be able to replace a failed etcd + control plane node, my goal is to get the cluster back to a normal state with a new etcd + control plane node, except that I can’t find a way to remove the node from Rancher when it is actually down.
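
(For context, to register the replacement node I would use the node registration command Rancher shows in the UI when editing the cluster; roughly, with placeholders for the values specific to my setup:

sudo docker run -d --privileged --restart=unless-stopped --net=host \
  -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run \
  rancher/rancher-agent:<version> \
  --server https://<rancher-server-url> \
  --token <registration-token> \
  --ca-checksum <ca-checksum> \
  --etcd --controlplane --worker
)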
Do you have any advice for me?
Thanks in advance to anyone who can help!
Happy coding,
Edit: Bruh… I need 3 nodes for etcd, as explained in the requirements. I guess I should read the documentation more carefully next time. Can someone confirm?
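
(If I understand the quorum math correctly: etcd needs a majority of members to stay available, i.e. quorum = floor(n/2) + 1, so it tolerates n - quorum failures:

members   quorum   failures tolerated
   1         1            0
   2         2            0
   3         2            1

With 2 etcd members, losing either one loses quorum, which would explain what I’m seeing.)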