I have installed a single-node Rancher 2.1.6 server managing a 5-node Kubernetes cluster. Two of the nodes have the etcd and Control Plane roles; the other three are workers.
All nodes (including rancher server) run Ubuntu 16.04.5 with Docker 17.03.2.
If I shut down either of the nodes on which etcd/Control Plane run, the Kubernetes cluster becomes unreachable altogether. Accessing the cluster in Rancher leads to:
This cluster is currently Unavailable; areas that interact directly with it will not be available until the API is ready.
Failed to communicate with API server: Get https://public_ip:6443/api/v1/componentstatuses?timeout=30s: dial tcp public_ip:6443: connect: no route to host
My question is: how do I achieve the necessary redundancy, so that if one of the two etcd/Control Plane nodes fails, the other can take over?
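For context, my understanding (please correct me if I'm wrong) is that etcd only stays available while a majority quorum of members survives, i.e. floor(n/2) + 1, which would explain why losing one of my two etcd nodes takes the whole cluster down. A quick sketch of that arithmetic:

```python
# etcd quorum math: a cluster of n members needs a majority
# (n // 2 + 1) to keep serving, so it tolerates n - quorum failures.
def quorum(n: int) -> int:
    return n // 2 + 1

def fault_tolerance(n: int) -> int:
    return n - quorum(n)

for n in (1, 2, 3, 5):
    print(f"{n} etcd member(s): quorum={quorum(n)}, "
          f"tolerates {fault_tolerance(n)} failure(s)")
```

If that is right, two etcd members tolerate zero failures (quorum is 2 of 2), while three members tolerate one, which matches the behavior I am seeing.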
Should I have distributed the roles differently? My current layout seems consistent with the documentation, but I know the role assignment isn't fixed. Someone recommended running etcd on the same nodes as the workers.
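To make the question concrete, this is the kind of role layout I am considering instead, sketched in RKE `cluster.yml` style (the hostnames are placeholders, and I am assuming a custom cluster where roles are assigned per node; this is my guess at a fix, not a known-good configuration):

```yaml
nodes:
  # Three etcd members give a quorum of 2, so one etcd node can fail.
  # etcd is co-located with other roles here, as was suggested to me.
  - address: node1.example.com
    role: [controlplane, etcd]
  - address: node2.example.com
    role: [controlplane, etcd]
  - address: node3.example.com
    role: [etcd, worker]
  - address: node4.example.com
    role: [worker]
  - address: node5.example.com
    role: [worker]
```

Would something like this give me the redundancy I am after, or is there a better way to split the roles across five nodes?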