Unable to add node to vsphere cluster

Hi,

I am trying to add a node to our vsphere-based cluster (built using the vsphere infrastructure provider).

The added node is stuck in “Waiting to register with Kubernetes” and Rancher is showing an error message (“c-n5fxz” is the cluster ID of the vSphere cluster):

I don’t know why there is an additional “c-” in the secret name. Is that on purpose?

How can I fix this?

Best,
Philipp

Yes, the secret name is prefixed with “c-”. We need some more info on this: What Rancher setup (single/HA) and which version are you running? How many clusters and how many nodes? Did you check whether the secret actually exists in the Rancher setup (the local cluster if HA, the Docker container if single)? When did it last work? What does the rancher/rancher log show? What about the debug/trace log?

Currently we are running Rancher 2.5.7 in a 2-master + MySQL setup.
The cluster was created by some 2.4.x version a year ago. I don’t know exactly when we last scaled it successfully.

The vSphere cluster has 7 nodes: 3 CP + 4 workers.

Obviously, there is no secret by that name stored in the control cluster. I don’t know if it ever existed or how it might have been lost.
When I create a new cluster, a secret for that cluster is created.

Here’s the output from the rancher log:

2021/06/10 17:16:59 [INFO] Provisioning cluster [c-n5fxz]
2021/06/10 17:16:59 [INFO] Updating cluster [c-n5fxz]
2021/06/10 17:17:04 [INFO] kontainerdriver rancherkubernetesengine listening on address 127.0.0.1:45595
2021/06/10 17:17:04 [INFO] kontainerdriver rancherkubernetesengine stopped
2021/06/10 17:17:05 [ERROR] error syncing 'c-n5fxz': handler cluster-provisioner-controller: secrets "cattle-system/c-c-n5fxz" not found, handler certificate-expiration: secrets "cattle-system/c-c-n5fxz" not found, requeuing
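
The error line above names the missing secret twice (once per handler). As a minimal sketch, the namespace and secret name can be pulled out of such a line and compared with the “c-” + cluster ID naming mentioned earlier in the thread. The function names `missing_secrets` and `expected_secret_name` are made up for illustration and are not part of Rancher.

```python
import re

# Matches the 'secrets "<namespace>/<name>" not found' fragment of the error line.
MISSING_SECRET = re.compile(r'secrets "([^"/]+)/([^"]+)" not found')


def missing_secrets(log_line):
    """Return the distinct (namespace, name) pairs reported missing, in order."""
    seen = []
    for pair in MISSING_SECRET.findall(log_line):
        if pair not in seen:
            seen.append(pair)
    return seen


def expected_secret_name(cluster_id):
    # Naming as described in this thread: the cluster ID prefixed with "c-".
    return "c-" + cluster_id


line = (
    "2021/06/10 17:17:05 [ERROR] error syncing 'c-n5fxz': handler "
    'cluster-provisioner-controller: secrets "cattle-system/c-c-n5fxz" not found, '
    'handler certificate-expiration: secrets "cattle-system/c-c-n5fxz" not found, requeuing'
)

found = missing_secrets(line)
print(found)
print(found[0][1] == expected_secret_name("c-n5fxz"))
```

For the log above, this reports a single missing secret, `c-c-n5fxz` in the `cattle-system` namespace, which matches the expected name for cluster `c-n5fxz`.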

So, is there a way to recreate it?

I have to look into recreating the entire secret. Do you have a snapshot from the local cluster?

I already tried recreating the cluster from the oldest snapshot we have :frowning:

As a fallback, I thought about re-importing the cluster, though this will probably result in a new cluster-id which I’d like to avoid.

In your case it would be MySQL backups (see Rancher Docs: Backing up Rancher Installed on a K3s Kubernetes Cluster). Do you have those, and did you find the secret in one of them?

Actually, I found a very old MySQL backup containing the key. I was able to add it as a secret, and the problem was resolved.
Still no idea why the secret vanished in the first place.
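
For anyone landing here with the same problem, re-adding a recovered value as a secret could look roughly like the sketch below. It only builds a generic Kubernetes Secret manifest. The data key `state`, the `Opaque` type, and the placeholder payload are assumptions for illustration; the thread does not say what the secret actually contains, so the real keys, type, and values should be taken from the backup.

```python
import base64
import json


def secret_manifest(name, namespace, data):
    """Build a Kubernetes v1 Secret manifest; values are base64-encoded under 'data'."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",  # assumption: use the type found in the backup instead
        "data": {
            key: base64.b64encode(value).decode("ascii")
            for key, value in data.items()
        },
    }


# Hypothetical key and placeholder bytes; replace with what the backup holds.
manifest = secret_manifest(
    "c-c-n5fxz",
    "cattle-system",
    {"state": b"<value recovered from backup>"},
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied against the local (Rancher) cluster with `kubectl apply -f`, ideally after taking a fresh backup first.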

I can look into it but without any timelines or events (with supporting logs) that help narrow it down, it’s going to be tough.