Node Stuck at "Waiting for node to register..." When Trying to Rejoin Cluster

As described in this other thread, I am trying to re-add a node to my cluster after removing it.

The process I followed so far is this:

  1. Drain the node via Rancher UI.
  2. Delete the node via Rancher UI.
  3. Wait for cluster to stabilize.
  4. Reboot the node.
  5. Run through the manual cleanup steps documented here.
  6. Reboot the node again (the last step of the manual cleanup process).
  7. Attempt to re-add the node (`docker run ...`).
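For context, the UI-driven steps above have rough CLI equivalents. This is only a sketch: the node name is a placeholder, and the actual `docker run ...` registration command comes from the Rancher UI, not from here.

```shell
# Approximate CLI equivalents of the drain/delete steps (node name is a placeholder).
# Drain the node, ignoring DaemonSet-managed pods and discarding emptyDir data:
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data

# Remove the node object from the cluster:
kubectl delete node worker-1

# Confirm the node is gone before cleaning up and re-adding:
kubectl get nodes
```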

The node shows as Unavailable in the Rancher UI with the error message “Kubelet stopped posting node status.” The Roles shown in the Rancher UI match what I specified when attempting to re-add the node.

From the command line, `kubectl get nodes` also lists the node, but its ROLES column shows `<none>` and its AGE is about 15 minutes greater than it should be based on when I ran the `docker run ...` command.
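One way to pin down the AGE discrepancy is to compare the node object's creation timestamp against the time the registration command was run (node name is a placeholder):

```shell
# Print the node object's creation time; if it predates the `docker run`
# command, the object was created by something else (e.g. an automatic
# re-registration after the reboot):
kubectl get node worker-1 -o jsonpath='{.metadata.creationTimestamp}'

# The same information, less precisely, is the AGE column of:
kubectl get nodes
```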

`docker ps` run on the node shows that the only container running is `rancher/rancher-agent`. `docker logs` shows “Waiting for node to register” over and over again.

I have tried searching the forums but did not find anything that seemed relevant to my situation.

The unusual AGE shown in the kubectl output makes me wonder whether rebooting the node before purging the Docker state caused it to re-register itself automatically. Is that a possibility?

Any insight into what might have caused the issue or how I can resolve it would be greatly appreciated.

So, based on these two GitHub issues, it seems this is a known problem.

I was able to delete the node (again) using the workaround described there: viewing the node in the API and deleting it from that view.
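For anyone who prefers the command line over the API browser, the same workaround can be done with curl against the Rancher v3 API. The server URL, API token, and node ID below are all placeholders; the node ID is what you find by browsing the node in the API view.

```shell
# List node objects in the Rancher API to find the stale node's ID
# (server URL and API token are placeholders):
curl -s -u "token-xxxxx:secret" https://rancher.example.com/v3/nodes

# Delete the stale node object by its ID:
curl -s -X DELETE -u "token-xxxxx:secret" \
  https://rancher.example.com/v3/nodes/<node-id>
```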

For future readers, the right thing to do seems to be to run through the manual cleanup on the node before rebooting the first time. And if the node does come back anyway, delete it again via the API before attempting to re-add it.
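For anyone searching later, the manual cleanup linked above boils down to something like the following. This is an approximate sketch, not the authoritative list; check the cleanup documentation for your Rancher version, and note it assumes the host runs nothing but Rancher/Kubernetes workloads.

```shell
# Stop and remove ALL containers and volumes on the host
# (assumes nothing besides Rancher/Kubernetes runs here):
docker rm -f $(docker ps -qa)
docker volume rm $(docker volume ls -q)

# Remove Kubernetes/Rancher state directories (approximate list;
# verify against the cleanup doc for your version):
sudo rm -rf /etc/kubernetes /etc/cni /opt/cni /opt/rke \
  /var/lib/etcd /var/lib/cni /var/lib/kubelet /var/lib/rancher/rke \
  /var/run/calico /var/log/containers /var/log/pods

# Reboot so stale mounts and network interfaces are fully cleared:
sudo reboot
```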