Looking for some guidance on diagnosing a control plane provisioning issue
Rancher version 2.6.5, k8s version v1.23.10-rancher1-1
I'm deploying a cluster through Cluster Management using the vSphere node driver and RKE1, with node pools for worker and control plane nodes.
The control plane node is stuck at “Waiting to register with Kubernetes”; this is the first control plane node in the cluster.
The control plane VM provisions fine and starts, but the Rancher agent on it is logging:
INFO: Arguments: --server https://XXX.XXX.209.125 --token REDACTED --ca-checksum be3ae53c5b65e299b9b21ae7d757c97547d7357d438e585dfd426e2d493d3519 -r -n m-sv82n
INFO: Environment: CATTLE_ADDRESS=XXX.XXX.210.7 CATTLE_AGENT_CONNECT=true CATTLE_INTERNAL_ADDRESS= CATTLE_NODE_NAME=m-sv82n CATTLE_SERVER=https://XXX.XXX.209.125 CATTLE_TOKEN=REDACTED
INFO: Using resolv.conf: nameserver 127.0.0.53 options edns0 trust-ad search __REDACTED__
WARN: Loopback address found in /etc/resolv.conf, please refer to the documentation how to configure your cluster to resolve DNS properly
INFO: https://XXX.XXX.209.125/ping is accessible
INFO: Value from https://XXX.XXX.209.125/v3/settings/cacerts is an x509 certificate
time="2022-09-30T19:24:16Z" level=info msg="Listening on /tmp/log.sock"
time="2022-09-30T19:24:16Z" level=info msg="Rancher agent version v2.6.5 is starting"
time="2022-09-30T19:24:16Z" level=info msg="Option customConfig=map[address:XXX.XXX.210.7 internalAddress: label:map[] roles:[] taints:[]]"
time="2022-09-30T19:24:16Z" level=info msg="Option etcd=false"
time="2022-09-30T19:24:16Z" level=info msg="Option controlPlane=false"
time="2022-09-30T19:24:16Z" level=info msg="Option worker=false"
time="2022-09-30T19:24:16Z" level=info msg="Option requestedHostname=m-sv82n"
time="2022-09-30T19:24:16Z" level=info msg="Option dockerInfo={I2BX:HFAV:BKIO:TL7N:H6YI:7Q3G:5LHW:H4YJ:PXE3:KWBN:OXDN:NOEL 1 1 0 0 1 overlay2 [[Backing Filesystem extfs] [Supports d_type true] [Native Overlay Diff true] [userxattr false]] [] {[local] [bridge host ipvlan macvlan null overlay] [] [awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} true false true true true true true true true true true true false 32 true 40 2022-09-30T19:24:16.896364664Z json-file cgroupfs 1 0 5.4.0-105-generic Ubuntu 20.04.4 LTS 20.04 linux x86_64 https://index.docker.io/v1/ 0xc0011340e0 4 16786882560 [] /var/lib/docker eta2-k8s-m1 [provider=vmwarevsphere] false 20.10.18 map[io.containerd.runc.v2:{runc [] <nil>} io.containerd.runtime.v1.linux:{runc [] <nil>} runc:{runc [] <nil>}] runc { inactive false [] 0 0 <nil> []} false docker-init {9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6} {v1.1.4-0-g5fd4c4d v1.1. 4-0-g5fd4c4d} {de40ad0 de40ad0} [name=apparmor name=seccomp,profile=default] [] [WARNING: No swap limit support]}"
time="2022-09-30T19:24:16Z" level=info msg="Connecting to wss://XXX.XXX.209.125/v3/connect with token starting with 2lxpm6568f4bmmjhc9nh5lf6svk"
time="2022-09-30T19:24:16Z" level=info msg="Connecting to proxy" url="wss://XXX.XXX.209.125/v3/connect"
time="2022-09-30T19:24:16Z" level=info msg="Waiting for node to register. Either cluster is not ready for registering, cluster is currently provisioning, or etcd, controlplane and worker node have to be registered"
... the last message just repeats.
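Regarding the loopback warning: the node is Ubuntu 20.04 with systemd-resolved, so the 127.0.0.53 in /etc/resolv.conf is the local stub resolver. In case it matters, this is how I'm checking the real upstream DNS servers (stock systemd-resolved paths; may differ on other setups):

# the non-stub resolv.conf with the actual upstream nameservers
cat /run/systemd/resolve/resolv.conf
# per-link DNS configuration
resolvectl status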
I don’t understand why the agent is logging Option etcd=false and Option controlPlane=false, or whether this is normal.
If I look at m-sv82n’s YAML in the Rancher UI, it has spec.controlPlane: true and spec.etcd: true.
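In case it's useful, I can also read the same node object from the Rancher local (management) cluster, where node CRs are namespaced by cluster ID (c-gpp9t and m-sv82n are taken from the server logs below):

# roles as Rancher has recorded them for this node
kubectl -n c-gpp9t get nodes.management.cattle.io m-sv82n -o yaml | grep -iE 'etcd|controlplane|worker'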
On the Rancher UI cluster page it shows “Provisioning” and “Waiting for etcd, controlplane, and worker nodes to be registered”
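The cluster object's conditions might say more than the UI banner; again assuming kubectl access to the local cluster (Cluster CRs are cluster-scoped):

# look at status.conditions for the stuck condition and its message
kubectl get clusters.management.cattle.io c-gpp9t -o yaml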
In the Rancher server logs I can see all of the provisioning output:
…
2022/09/30 19:24:22 [INFO] Provisioning node eta2-k8s-m1 done
2022/09/30 19:24:22 [INFO] Generating and uploading node config eta2-k8s-m1
2022/09/30 19:24:22 [INFO] Handling backend connection request [c-gpp9t:m-sv82n]
2022/09/30 19:24:22 [DEBUG] could not get kubeconfig for cluster c-gpp9t
2022/09/30 19:24:22 [DEBUG] Found [m-sv82n] as node name in cluster [c-gpp9t], error: <nil>
I don’t know whether the “could not get kubeconfig” message is indicative of a problem or not.
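Since RKE1 runs the Kubernetes components as Docker containers on the node, I also checked whether any of them had ever been created on the control plane VM (if I understand correctly, none should exist until the node registers and the node plan runs):

# any RKE-managed control plane containers on the node?
docker ps -a | grep -E 'etcd|kube-apiserver|kube-controller|kubelet'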
Any direction on how to diagnose this further would be much appreciated.
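For completeness, this is how I'm pulling the server-side logs quoted above (assuming an HA install with Rancher in cattle-system; a single-Docker install would use docker logs on the rancher container instead):

# recent Rancher server logs filtered to this cluster/node
kubectl -n cattle-system logs -l app=rancher --tail=500 | grep -E 'c-gpp9t|m-sv82n'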