Cluster not ready after power failure

jfwilkus · April 23, 2023, 1:44am

Earlier today, I had a power failure in my house which lasted longer than my UPS batteries. It’s a development lab and I didn’t notice until an hour or so after the failure.

I’ve been trying to bring the 3-node Harvester cluster back online. Nodes 2 and 3 show a Not Ready status, but node 1’s console is reporting “Setting up Harvester”.

The cluster had been running fine for 2 months, but now I can’t figure out how to fix node 1.

I can SSH into all the nodes. On node 1, /var/log/console.log is reporting:

rke2[24118]: time="2023-04-23T01:37:51Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https:/[23/1858] 1:9345/v1-rke2/readyz: 500 Internal Server Error"

I’m not very experienced with Rancher, though, so it’s unclear to me where to look to troubleshoot further.

The rke2 process is currently consuming 100% of the CPU on all three nodes. I’m considering powering them down until I can troubleshoot further.

Any help or pointers would be appreciated.

I can also provide more logs or details if needed.
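To narrow down which log lines matter, here is a minimal Python sketch for pulling the rke2 entries out of /var/log/console.log. It assumes those lines keep the `time="..." level=... msg="..."` key=value format shown in the log line above; the script name `summarize_rke2.py` and the functions `parse_fields`, `summarize`, and `main` are hypothetical helpers, not part of RKE2 or Harvester. It counts rke2 lines per log level and prints every entry that is not `level=info`.

```python
import re
import sys
from collections import Counter

# Assumed format: logrus-style key=value pairs, e.g.
#   rke2[24118]: time="..." level=info msg="..."
# A value is either a double-quoted string (with backslash escapes) or a bare token.
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_fields(line):
    """Return the key=value fields found in one log line as a dict."""
    fields = {}
    for key, value in FIELD_RE.findall(line):
        # Drop the surrounding quotes from quoted values.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def summarize(lines, prefix="rke2["):
    """Count rke2 lines per level and collect (time, level, msg) for non-info lines."""
    levels = Counter()
    problems = []
    for line in lines:
        if prefix not in line:
            continue  # skip lines from other processes (kernel, systemd, ...)
        fields = parse_fields(line)
        level = fields.get("level", "unknown")
        levels[level] += 1
        if level != "info":
            problems.append(
                (fields.get("time", "?"), level, fields.get("msg", line.strip()))
            )
    return levels, problems


def main(argv):
    if not argv:
        print("usage: summarize_rke2.py /var/log/console.log")
        return
    # errors="replace" keeps going if the console log contains stray bytes.
    with open(argv[0], errors="replace") as fh:
        levels, problems = summarize(fh)
    print(dict(levels))
    for time, level, msg in problems:
        print(f"{time} {level}: {msg}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it on node 1 as `python3 summarize_rke2.py /var/log/console.log`; the `level=error` or `level=fatal` lines it prints are likely more useful to post here than the repeated "Waiting to retrieve kube-proxy configuration" info messages.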

Thanks!

James
