Using the current stable release 2.0.6 (I also tried 2.0.7 this morning), either I set up the cluster wrong or a cluster is generally not reliable in Rancher 2.x.
Rancher version: 2.0.6
OS: Ubuntu 18.04
Setup: HA (three Rancher nodes building the Kubernetes cluster)
Created a new cluster inside the Rancher 2.x UI.
Added four nodes to this new cluster.
3 Nodes have role “All”
1 Node has role “Worker”
When I shut down one of the “All” nodes, the whole cluster fails. In the UI an error is shown:
This cluster is currently Unavailable ; areas that interact directly with it will not be available until the API is ready.
Failed to communicate with API server: Get https://192.168.254.32:6443/api/v1/componentstatuses: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
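To narrow this down, a quick check is whether the Kubernetes API port (6443) is still reachable on the remaining controlplane nodes, since the error only mentions a single address. The following is a minimal sketch, not anything taken from Rancher itself: the function names are my own, and apart from 192.168.254.32 (from the error above) any node addresses passed in would be placeholders for the real ones.

```python
import socket


def api_port_open(host, port=6443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_nodes(hosts, port=6443, timeout=3.0):
    """Map each host to whether its API port accepted a TCP connection."""
    return {host: api_port_open(host, port, timeout) for host in hosts}
```

Calling `check_nodes([...])` with the addresses of the three “All” nodes would show which API servers still accept connections while one node is shut down. This only tests TCP reachability, not whether the API server is healthy.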
According to information I found in the thread “[SOLVED] HA failover not working”, three nodes are required for etcd and controlplane. Hence the three nodes having role “All”.
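As far as I understand it, the reason for three nodes is etcd's majority quorum: a cluster of n members needs floor(n/2) + 1 of them alive to keep working. A small sketch of that arithmetic (the function names are my own, purely for illustration):

```python
def etcd_quorum(members):
    """Smallest majority of an etcd cluster with the given member count."""
    return members // 2 + 1


def failures_tolerated(members):
    """How many members can be lost while a majority is still alive."""
    return members - etcd_quorum(members)


for n in (1, 2, 3, 4, 5):
    print(f"{n} members: quorum {etcd_quorum(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

With three etcd members the quorum is two, so losing a single “All” node should still leave etcd with a majority. That is why I expected the cluster to survive this.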
So when one of the cluster nodes goes down, the whole cluster goes down… Please tell me there’s something I forgot in the cluster setup, because that’s a no-go for production.