I have a three-node cluster, and all three nodes run every role (control plane, etcd, worker). All are up and running on VMware (1 GB of memory apiece).
Over the weekend, without any change, I received the following error:
This cluster is currently Unavailable; areas that interact directly with it will not be available until the API is ready.
Failed to communicate with API server: Get https://10.10.10.172:6443/api/v1/componentstatuses: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
It turns out the node ran out of memory and the API server was killed; I have since bumped the memory up to 4 GB.
However, the cluster did go down… So with three etcd nodes, why is the cluster down? Shouldn't this be HA?
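For reference, these are the sort of checks that can confirm what happened on the node - a sketch only; the container names assume a stock RKE install, and etcdctl will likely also need the etcd TLS cert flags:

```
# Rough checks on the affected node (sketch; container names assume a stock RKE install,
# and etcdctl may need --cacert/--cert/--key pointing at the etcd certificates)
dmesg | grep -i "killed process"            # confirm the OOM killer fired
docker ps -a --filter name=kube-apiserver   # did the kube-apiserver container die or restart?
docker exec etcd etcdctl endpoint health    # health of the local etcd member
kubectl get componentstatuses               # overall view once the API answers again
```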
Thanks in advance!
Short answer - the cluster shouldn’t go down…
Same or similar issues:
GitHub issue (opened 03 Jun 2018, closed 06 Sep 2018; labels: kind/bug, priority/1, status/more-info, area/rke):
Hi,
what is the reason for the rancher 2 UI to decide when a cluster becomes unavailable?
I created a test cluster with rke consisting of three nodes: one with etcd and control, two nodes which are workers. Rancher 2 latest runs on a different server (not on this cluster).
Then I tried to simulate destruction of nodes by rebooting them.
If I reboot the 1st worker, the rancher UI is fine and tells me after a few seconds that this node is not available.
If I reboot the 2nd worker, the rancher UI tells me - in a big wide square - that my cluster is not available. I can't even launch a kubectl console in the UI because it's blocked.
I assumed that rebooting workers shouldn't make the whole cluster unavailable.
It's interesting that during both reboots I can access the cluster's API from a Linux shell with kubectl. Only the rancher UI acts strangely.
Greetz,
Josef
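That report matches the symptom here: the API itself stays reachable even while the UI calls the cluster unavailable. A quick way to confirm it outside of Rancher - a sketch, assuming a working kubeconfig for the cluster:

```
# If these answer, the cluster API is fine and only the Rancher UI health check is unhappy
kubectl get nodes
kubectl get componentstatuses
kubectl get --raw /healthz     # plain health probe against the API server
```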
GitHub issue (opened 24 May 2018, closed 06 Aug 2018; label: kind/bug):
**Rancher versions:** v2.0.2
**Steps to Reproduce:**
Create a cluster with the following node configuration:
1 control (n1)
1 etcd (n2)
1 worker (n3)
Add 1 more control node (n4)
Power down control node - n1.
Wait for the node to be marked "unavailable".
Try to create a daemon set.
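A rough kubectl sketch of that reproduction (the manifest and pod names are placeholders, not from the issue itself):

```
# Sketch of the reproduction above; hellotest-ds.yaml and <pod-name> are placeholders
kubectl get nodes                      # wait for n1 to show NotReady after the power-off
kubectl create -f hellotest-ds.yaml    # create the daemon set
kubectl get pods -o wide               # see which nodes the 3 pods land on and their status
kubectl describe pod <pod-name>        # events show the sandbox/CNI errors quoted below
```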
3 pods get created, of which only 1 pod was able to start successfully, and that one is on the new control node.
There is an attempt to start a pod on the worker node that fails with the following error:
```
Normal SuccessfulMountVolume 5m kubelet, ip-172-31-3-155 MountVolume.SetUp succeeded for volume "default-token-mqlnn"
Normal SandboxChanged 4m (x12 over 5m) kubelet, ip-172-31-3-155 Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 14s (x100 over 5m) kubelet, ip-172-31-3-155 Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hellotest-qjr5t_default" network: error getting ClusterInformation: Get https://10.43.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.43.0.1:443: getsockopt: no route to host
```
The other pod is attempted on the control node that is unavailable.
Deploying a pod with some scale results in all the pods getting deployed on the worker node, where they get stuck in the "ContainerCreating" state:
```
NAME                      READY   STATUS              RESTARTS   AGE
hello1-74f74757b9-5swtv   0/1     ContainerCreating   0          39m
hello1-74f74757b9-8rdp2   0/1     ContainerCreating   0          39m
hello1-74f74757b9-gz72c   0/1     ContainerCreating   0          39m
hello1-74f74757b9-xql8l   0/1     ContainerCreating   0          39m
```
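The "no route to host" against 10.43.0.1 suggests the cluster-internal API service has no healthy control-plane endpoint behind it. One way to check that - a sketch, with `<stuck-pod>` as a placeholder for one of the pods above:

```
# 10.43.0.1 is the cluster-internal service IP fronting the API server(s)
kubectl get svc kubernetes -n default        # the virtual service IP (10.43.0.1 here)
kubectl get endpoints kubernetes -n default  # should list a reachable control-plane address
kubectl describe pod <stuck-pod>             # shows the FailedCreatePodSandBox events again
```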
**Results:**
Well, I’m finding some very strange behavior in some very simple tests.
I have:
1 x VM running Rancher 2.0.1
3 x VM nodes, running all services (worker, etcd, control)
I have a few workloads deployed - mostly NGINX web servers. All those workloads are running on node1.
Everything is working well, the Web UI is responsive, kubectl is responding nicely, the workloads are working. kubectl and the web UI are going to the API on the VM hosting the Rancher server.
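To verify which endpoint kubectl is really talking to - a sketch, assuming the kubeconfig that Rancher generates for the cluster:

```
# Both print the API endpoint in use; with a Rancher-generated kubeconfig it points at the Rancher server
kubectl cluster-info
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
```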
I then disconnect the netw…