No such container kubelet

I just renewed the certificate for the Rancher web UI, as we couldn't get onto it. The renewal of the Rancher web UI certificate was successful.

As part of the initial diagnosis, however, we tried to restore from an etcd snapshot, because as I recall I could not see the cluster at the time. We also renewed the Kubernetes certificates of the downstream cluster.
After that, we got the following message.

Looking at the docker logs of the rancher-agent on the master (controlplane/etcd) node, we are getting this error:

Found container ID: da9ea06c33d5524c6549722d3642f63c87d8bbbb8a65cf376caca4955df2bd41
Checking root: /host/run/runc
Checking file: 10c21c001787a05721086a9d74d37d47fa4a2b3af2d1652d349b632c22b2ed4f
Checking file: 25e12f5aac51ea7dd3267551dff3564edd7826b7cf0b9f695ae9c6ec1ffa5401
Checking file: 32edcd822506c94878fe7f1447a0b15252aa20ed9510d02260bdca98391eee0c
Checking file: 3561246e5027ce43f3c1ffe68f28bf105b24de8a4faf7130cfe8b00c7b08c50a
Checking file: 45e15c3a24c13b70a3411fbcd03fc533fe172683a4901d3637264ba254e3257d
Checking file: 4a3262ee02cc4830950d35460ebb537a98f55045a2d2ab11bfdb9b470920ee49
Checking file: 69d93f3f751640247620d829068cee6fd65cf4f04d4da921096804d8ee000a65
Checking file: 6e72931e9570fa4b6c7b45ea95b2041ff583e0a01339347c85920ab4e9255b48
Checking file: 7655e4dddbf787686fcf50658700024ab69703735d890e52d3121119c0a6245f
Checking file: 897d61f703addb0ff846ca06da94c466a7112e6e6f23291863fc0f756a45225b
Checking file: 9fedfe5c4435b8465d37e2a83e8a396eabfbbfdd371087231b28b23ae6e7023a
Checking file: a3c87f735626c7263983cd5da17b661ddf377cb403d89003f1168dd039de467f
Checking file: aa51f130d391ea1ce89f0e8eca0b457ddba2f768009b6138d2127706ad186162
Checking file: b7b155142d41cb21b9c4e806a861ce615ec891b546ec3e543718319408150a46
Checking file: c8d8954264273c069e008225e0c5d64e7dc2f29f5183096ef9a8df3fc972e807
Checking file: cd9e7d5c717a34516f863c6c019dfa29deac6455cfacfc7e661ac148efa6911e
Checking file: da9ea06c33d5524c6549722d3642f63c87d8bbbb8a65cf376caca4955df2bd41
Found state.json: da9ea06c33d5524c6549722d3642f63c87d8bbbb8a65cf376caca4955df2bd41
time="2021-11-01T22:21:32Z" level=info msg="Execing [/usr/bin/nsenter --mount=/proc/56751/ns/mnt -F -- /data/var.lib.docker/overlay2/918631e37526c96f1583a6c68b333dfdc5d43497ef746aa7d5fbd00e78d775ee/merged/usr/bin/share-mnt --stage2 /var/lib/kubelet /var/lib/rancher -- norun]"
time="2021-11-02T11:21:32+13:00" level=info msg="Root Directory is shared, skipping stage2"
+ docker start kubelet
Error response from daemon: {"message":"No such container: kubelet"}
Error: failed to start containers: kubelet
+ sleep 2
+ docker start kubelet
Error response from daemon: {"message":"No such container: kubelet"}
Error: failed to start containers: kubelet
+ sleep 2
+ docker start kubelet

I am not quite sure why it is not starting up.

I can still see the nodes' status in the Rancher UI.

Any help would be greatly appreciated. I hope this is sufficient; I am happy to provide more. Thank you.

Another key piece of information is that this is running Rancher 2.4.8. We will be upgrading it, and I have etcd snapshots.

We recently restarted all nodes in the downstream cluster. When we check the logs, we see that there are 3 rancher agents on the one node that is supposed to be the etcd/controlplane node.

Is there any documentation or guidance as to what I can do? Is there any extra information that is needed to get help? It is much appreciated.

The agent is waiting for the kubelet container to be created so that it can start it, but that won’t happen while the cluster is in the Updating state. The place to start is finding out why the cluster got into that state; once that is fixed, provisioning of nodes can resume.
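
For illustration, the retry behaviour visible in the log above (`docker start kubelet`, then `sleep 2`, repeated) can be sketched roughly as the loop below. This is not the actual rancher-agent script, only a minimal sketch of the pattern. The function name `wait_and_start`, the attempt limit and the `RETRY_DELAY` variable are my own assumptions; the real agent appears to keep retrying without a limit.

```shell
# Hypothetical helper (not part of Rancher): keep trying to start a container
# that something else (here, the cluster provisioner) is expected to create.
# $1 = container name, $2 = maximum attempts (assumed limit, default 5)
wait_and_start() {
    name="$1"
    max_attempts="${2:-5}"
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # "docker start" fails with "No such container" until the container exists
        if docker start "$name" >/dev/null 2>&1; then
            echo "started $name on attempt $attempt"
            return 0
        fi
        attempt=$((attempt + 1))
        # Same pause as in the log; RETRY_DELAY is an assumed override
        sleep "${RETRY_DELAY:-2}"
    done
    echo "gave up: $name was never created" >&2
    return 1
}
```

Calling `wait_and_start kubelet` on the node would loop in the same way as the log shows. The point is that no amount of retrying helps until provisioning creates the kubelet container in the first place.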

Please describe exactly what you executed on your setup and what error you got after renewing the certificate. Renewing the Rancher certificate does not renew the cluster’s certificates; there is a separate process for that (Rancher Docs: Certificate Rotation).
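
To see whether the downstream cluster's certificates have actually been renewed, it can help to check their expiry dates directly on a node. A rough sketch, assuming an RKE-provisioned node where the certificates are stored under `/etc/kubernetes/ssl` (adjust the directory if your setup differs); the function name `cert_expiry` is just for illustration:

```shell
# Hypothetical helper: print the expiry date of every PEM certificate
# found in a directory.
# $1 = directory (assumed default: /etc/kubernetes/ssl, as on RKE nodes)
cert_expiry() {
    dir="${1:-/etc/kubernetes/ssl}"
    for f in "$dir"/*.pem; do
        # Skip the unexpanded pattern when the directory has no .pem files
        [ -e "$f" ] || continue
        # Skip private keys; "openssl x509" only reads certificates
        case "$f" in *-key.pem) continue ;; esac
        printf '%s: ' "$f"
        openssl x509 -noout -enddate -in "$f"
    done
}
```

Running `cert_expiry` on the etcd/controlplane node prints one `notAfter=...` line per certificate, which should make it obvious if any of them are still expired.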

So I guess you tried restoring an etcd snapshot. How old was that snapshot? And what are the Rancher logs from the start of that attempt until the error?

Thank you for the reply.

We managed to get the docker containers up and running again. We had to renew the Rancher UI certificate and then restore etcd from backup, which got the containers running again. We still have some issues with the logs. I can raise this as another post if we have no joy with it. Thank you once again for the reply.