Earlier today, I had a power failure in my house which lasted longer than my UPS batteries. It’s a development lab and I didn’t notice until an hour or so after the failure.
I’ve been trying to bring the 3-node Harvester cluster back online. Nodes 2 and 3 are in the Not Ready status, but node 1’s console is reporting “Setting up Harvester”.
The cluster had been running fine for 2 months but now I can’t figure out how to fix node 1.
I can SSH into all of the nodes. On node 1, /var/log/console.log is reporting:
rke2: time="2023-04-23T01:37:51Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https:/[23/1858] 1:9345/v1-rke2/readyz: 500 Internal Server Error"
I am not very experienced with Rancher, so it’s unclear to me where to look to troubleshoot further.
The rke2 process is currently consuming 100% of the CPU on all three nodes, so I’m considering powering them down until I can troubleshoot further.
Any help or pointers would be appreciated.
I can also provide more logs or details if needed.