New nodes are endlessly "Waiting to register with Kubernetes"

We have a Rancher-provisioned cluster (VMware driver), and when we scale up, the new worker nodes sit forever at "Waiting to register with Kubernetes". This is Rancher 2.5.6, and the cluster itself is v1.19.7-rancher1-1.

I SSHed into the newly created node and looked at the rancher-agent container logs. There are no errors:

[root@mynewnode1 ~]# docker ps
CONTAINER ID   IMAGE                          COMMAND                  CREATED         STATUS         PORTS   NAMES
06f23a8797e5   rancher/rancher-agent:v2.5.6   "run.sh --server htt…"   2 minutes ago   Up 2 minutes           quirky_robinson
[root@mynewnode1 ~]# docker logs 06f23a8797e5
INFO: Arguments: --server https://rancher.acme.com --token REDACTED --ca-checksum 8xxxxxxxxxxxxxxxxe1076 -r -n m-skl8s
INFO: Environment: CATTLE_ADDRESS=10.5.34.204 CATTLE_AGENT_CONNECT=true CATTLE_INTERNAL_ADDRESS= CATTLE_NODE_NAME=m-skl8s CATTLE_SERVER=https://rancher.acme.com CATTLE_TOKEN=REDACTED
INFO: Using resolv.conf: search acme.com nameserver 10.6.1.5 nameserver 10.6.1.6
INFO: https://rancher.acme.com/ping is accessible
INFO: rancher.acme.com resolves to 10.6.215.162
INFO: Value from https://rancher.acme.com/v3/settings/cacerts is an x509 certificate
time="2021-05-05T18:04:55Z" level=info msg="Listening on /tmp/log.sock"
time="2021-05-05T18:04:55Z" level=info msg="Rancher agent version v2.5.6 is starting"
time="2021-05-05T18:04:55Z" level=info msg="Option customConfig=map[address:10.5.34.204 internalAddress: label:map[] roles:[] taints:[]]"
time="2021-05-05T18:04:55Z" level=info msg="Option etcd=false"
time="2021-05-05T18:04:55Z" level=info msg="Option controlPlane=false"
time="2021-05-05T18:04:55Z" level=info msg="Option worker=false"
time="2021-05-05T18:04:55Z" level=info msg="Option requestedHostname=m-skl8s"
time="2021-05-05T18:04:55Z" level=info msg="Connecting to wss://rancher.acme.com/v3/connect with token wxxxxxxxxxxxxxxxxxxxd"
time="2021-05-05T18:04:55Z" level=info msg="Connecting to proxy" url="wss://rancher.acme.com/v3/connect"
time="2021-05-05T18:04:55Z" level=info msg="Starting plan monitor, checking every 15 seconds"

I enabled debug logging, but it just shows the rancher-agent pinging. Is there a way to set the log level to debug permanently, not just at runtime, so that I can restart the container and see what might be wrong?

[root@mynewnode1 ~]# docker exec -it 06f23a8797e5 loglevel --set debug
OK
[root@mynewnode1 ~]# docker logs -f 06f23a8797e5
time="2021-05-05T18:14:20Z" level=debug msg="Wrote ping"
time="2021-05-05T18:14:25Z" level=debug msg="Wrote ping"
time="2021-05-05T18:14:25Z" level=debug msg="Get agent config: &rkeworker.NodeConfig{ClusterName:"c-np6ds", Certs:"", Processes:map[string]types.Process(nil), Files:[]types.File(nil), NodeVersion:0, AgentCheckInterval:15}"
time="2021-05-05T18:14:30Z" level=debug msg="Wrote ping"
time="2021-05-05T18:14:35Z" level=debug msg="Wrote ping"
time="2021-05-05T18:14:40Z" level=debug msg="Wrote ping"
time="2021-05-05T18:14:40Z" level=debug msg="Get agent config: &rkeworker.NodeConfig{ClusterName:"c-np6ds", Certs:"", Processes:map[string]types.Process(nil), Files:[]types.File(nil), NodeVersion:0, AgentCheckInterval:15}"
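
For anyone who wants to watch for a change without reading the debug output by eye, here is a minimal sketch that counts the "Get agent config" lines in the agent log and how many of them look empty. The function name summarize_agent_configs is made up for illustration. Treating "no processes and NodeVersion:0" as "Rancher has not handed this node a plan yet" is my assumption based on the output above, not documented behavior.

```python
import re

# Matches the "Get agent config" debug line as it appears in the log above.
CONFIG_RE = re.compile(r"Get agent config: &rkeworker\.NodeConfig\{(?P<body>.*)\}")
VERSION_RE = re.compile(r"NodeVersion:(\d+)")
EMPTY_PROCESSES = "Processes:map[string]types.Process(nil)"


def summarize_agent_configs(log_text):
    """Return (total, empty) counts of 'Get agent config' lines.

    A config counts as empty when it lists no processes and NodeVersion is 0.
    Reading that as "no plan delivered yet" is an assumption.
    """
    total = 0
    empty = 0
    for line in log_text.splitlines():
        match = CONFIG_RE.search(line)
        if match is None:
            continue
        total += 1
        body = match.group("body")
        version = VERSION_RE.search(body)
        if EMPTY_PROCESSES in body and version and version.group(1) == "0":
            empty += 1
    return total, empty
```

Save the output of "docker logs 06f23a8797e5 2>&1" to a file and pass its contents to the function. If every poll is still empty, the agent is connected but has been given nothing to run.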

Does it eventually work if you leave it running longer than 2 minutes? Is the cluster you are adding the worker to in the Ready state? How many nodes are in the cluster, and with what roles? You could also be running into rancher/rancher issue #31999 on GitHub, "After Rancher upgrade new clusters are up but the nodes will be stuck at 'waiting to register'".

I came back today and the nodes had finally been created successfully. So it does work, it just takes hours.

How fortuitous: I see that Rancher v2.5.8 officially came out 7 hours ago, and #31999 is included in it. Let me try upgrading to see if that fixes it.

Thanks!