Why is an RKE cluster no longer available from Rancher when you stop a single node?

Andres_Gorostidi · November 1, 2021, 9:02am

Hi,

I created my first RKE cluster on AWS using Rancher, with only 3 nodes (each with both master and worker roles). To do some testing (and avoid the cost of running 3 nodes all the time), I stopped one of the nodes. At that point, the cluster appeared as “unavailable” in Rancher, and I could no longer access it. I had to start the node again. Why? I do not understand the reason: being a cluster, it should stay active all the time, no matter whether all 3 nodes or just 1 is running…

wcoateRR · November 1, 2021, 3:08pm

K3s allows 2 master nodes. My understanding is that RKE & RKE2 require an odd number, so maybe it’s trying to get that fixed before allowing new things? I’d think it should keep what’s already there, though.

mattmattox · November 1, 2021, 3:48pm

With RKE, you need an odd number of master nodes for etcd, because etcd needs quorum in order to work. With k3s, we added the kine etcd adapter (https://github.com/k3s-io/kine), which lets you use other, externally managed databases like dqlite, Postgres, or MySQL in place of etcd. By doing this, k3s only needs two master nodes, because they only handle the control-plane roles (kube-apiserver, kube-scheduler, kube-controller-manager, etc.). Most of these services are really only active on a single node at a time: kube-apiserver is active on all nodes at all times, but the other services, like kube-scheduler, use a leader election process in which only one node is active at a time for that service.
RKE2 uses etcd, so the same rule about needing an odd number of master nodes applies. Note: Currently RKE2 doesn’t have kine support (see Feature/Question Consolidated etcd · Issue #453 · rancher/rke2 · GitHub for more details).
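
To make the quorum rule concrete, here is a minimal sketch of the majority arithmetic behind it. This is only an illustration, not Rancher or etcd code: the function names `quorum` and `fault_tolerance` are made up for this example, and it assumes every etcd member has one vote and that quorum is a strict majority of members.

```python
def quorum(members: int) -> int:
    """Smallest strict majority of a cluster with `members` voting members."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can be down while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    # Print members, required quorum, and tolerated failures for small clusters.
    for n in range(1, 6):
        print(f"members={n} quorum={quorum(n)} tolerated_failures={fault_tolerance(n)}")
```

Under these assumptions, 2 members tolerate no failures and 4 members tolerate only one, the same as 3, which is why an odd number of etcd nodes is the usual recommendation. The sketch covers the etcd quorum arithmetic only and says nothing about the other control-plane components.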
