It seems that my automatic etcd snapshots have been failing for the past 4 days.
The error message I get is:
Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [192.168.36.75]
My Rancher master is deployed in HA mode with three nodes. If I SSH into each of the three nodes I can successfully ping 192.168.36.75.
192.168.36.75 is the Kubernetes master of the Rancher-managed cluster where I would like to take the etcd snapshots. It is a single-master cluster.
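Ping only proves basic ICMP connectivity, though, so one more thing I can check from each of the three Rancher nodes is whether the etcd client port itself is reachable (a minimal check, assuming the default etcd client port 2379):

# from each Rancher node, test TCP connectivity to etcd on the downstream master
nc -zv 192.168.36.75 2379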
From the Rancher dashboard I can see all of my clusters up and ready, and everything is green.
The thing is, I am not sure how to troubleshoot this further. Any help is appreciated.
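For reference, these are the kinds of checks I can run on the downstream master itself; this is plain Docker CLI, assuming the standard RKE setup where etcd runs as a container literally named etcd:

# confirm the etcd container exists and is running (-a also lists stopped containers)
docker ps -a --filter name=etcd

# look for recent errors in the etcd container logs
docker logs --tail 100 etcd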
Update.
It seems that for some reason the rancher-agent container (rancher/rancher-agent:v2.3.4) on the master was unable to start the kubelet container, as these logs show:
Error response from daemon: {"message":"No such container: kubelet"}
Error: failed to start containers: kubelet
+ sleep 2
+ docker start kubelet
Error response from daemon: {"message":"No such container: kubelet"}
Error: failed to start containers: kubelet
+ sleep 2
+ docker start kubelet
Error response from daemon: {"message":"No such container: kubelet"}
Error: failed to start containers: kubelet
+ sleep 2
+ docker start kubelet
Error response from daemon: {"message":"No such container: kubelet"}
Error: failed to start containers: kubelet
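For anyone hitting the same thing: whether the container actually exists on the host can be checked with the plain Docker CLI, nothing Rancher-specific (kubelet is the standard RKE container name):

# list the kubelet container, including it if it is stopped
docker ps -a --filter name=kubelet

# show its current state (running, exited, ...) if it exists
docker inspect --format '{{.State.Status}}' kubelet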
But as I said, the kubelet container was in fact present on the master, and the cluster was working almost normally; everything was green in the Rancher UI. Since I wasn't sure what else to do and this was a single-master cluster, I rebooted the master. Now etcd snapshots are working again.
If anyone more experienced sees this, please let me know if there is a gentler solution to this problem.
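For what it is worth, next time I would try something gentler before rebooting, along these lines (a sketch, assuming the agent container can be identified by its image name and that the node uses systemd):

# restart only the rancher-agent container on the affected node
docker restart $(docker ps -q --filter ancestor=rancher/rancher-agent:v2.3.4)

# or, if that does not help, restart the Docker daemon, which brings the
# RKE containers (restart policy: always) back up without a full host reboot
sudo systemctl restart docker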