Node Stuck at "Waiting for node to register..." When Trying to Rejoin Cluster

As described in this other thread, I am trying to re-add a node to my cluster after removing it.

The process I followed so far is this:

  1. Drain the node via Rancher UI.
  2. Delete the node via Rancher UI.
  3. Wait for cluster to stabilize.
  4. Reboot the node.
  5. Run through the manual cleanup steps documented here.
  6. Reboot the node again (the last step of the manual cleanup process).
  7. Attempt to re-add the node (`docker run ...`).
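For context, the UI-driven steps above have rough CLI equivalents. This is only a sketch: the node name is a placeholder, and the actual `docker run ...` registration command comes from the Rancher UI, not from here.

```shell
# Approximate CLI equivalents of the drain/delete steps (node name is a placeholder).
# Drain the node, ignoring DaemonSet-managed pods and discarding emptyDir data:
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data

# Remove the node object from the cluster:
kubectl delete node worker-1

# Confirm the node is gone before cleaning up and re-adding:
kubectl get nodes
```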

The node shows as Unavailable in the Rancher UI with the error message “Kubelet stopped posting node status.” The Roles shown in the Rancher UI match what I specified when attempting to re-add the node.

From the command line, `kubectl get nodes` also lists the node, but its ROLES column shows `<none>` and its AGE is about 15 minutes greater than it should be based on when I ran the `docker run ...` command.
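One way to pin down the AGE discrepancy is to compare the node object's creation timestamp against the time the registration command was run (node name is a placeholder):

```shell
# Print the node object's creation time; if it predates the `docker run`
# command, the object was created by something else (e.g. an automatic
# re-registration after the reboot):
kubectl get node worker-1 -o jsonpath='{.metadata.creationTimestamp}'

# The same information, less precisely, is the AGE column of:
kubectl get nodes
```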

`docker ps` run on the node shows that the only container running is `rancher/rancher-agent`. `docker logs` shows “Waiting for node to register” over and over again.

I have tried searching the forums but did not find anything that seemed relevant to my situation.

The unusual AGE shown in the kubectl output makes me wonder whether rebooting the node before purging the Docker state caused it to re-register itself automatically. Is that a possibility?

Any insight into what might have caused the issue or how I can resolve it would be greatly appreciated.

So, based on these two GitHub issues, it seems this is a known problem.

I was able to delete the node (again) using the workaround described there: viewing the node in the API and deleting it from that view.
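For anyone who prefers the command line over the API browser, the same workaround can be done with curl against the Rancher v3 API. The server URL, API token, and node ID below are all placeholders; the node ID is what you find by browsing the node in the API view.

```shell
# List node objects in the Rancher API to find the stale node's ID
# (server URL and API token are placeholders):
curl -s -u "token-xxxxx:secret" https://rancher.example.com/v3/nodes

# Delete the stale node object by its ID:
curl -s -X DELETE -u "token-xxxxx:secret" \
  https://rancher.example.com/v3/nodes/<node-id>
```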

For future readers, the right thing to do seems to be to run through the manual cleanup on the node before rebooting the first time. And if the node does come back anyway, delete it again via the API before attempting to re-add it.
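For anyone searching later, the manual cleanup linked above boils down to something like the following. This is an approximate sketch, not the authoritative list; check the cleanup documentation for your Rancher version, and note it assumes the host runs nothing but Rancher/Kubernetes workloads.

```shell
# Stop and remove ALL containers and volumes on the host
# (assumes nothing besides Rancher/Kubernetes runs here):
docker rm -f $(docker ps -qa)
docker volume rm $(docker volume ls -q)

# Remove Kubernetes/Rancher state directories (approximate list;
# verify against the cleanup doc for your version):
sudo rm -rf /etc/kubernetes /etc/cni /opt/cni /opt/rke \
  /var/lib/etcd /var/lib/cni /var/lib/kubelet /var/lib/rancher/rke \
  /var/run/calico /var/log/containers /var/log/pods

# Reboot so stale mounts and network interfaces are fully cleared:
sudo reboot
```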