It seems that my automatic etcd snapshots have been failing for the past 4 days.
The error message I get is:
Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) [192.168.36.75]
My Rancher master is deployed in HA mode with three nodes. If I SSH into each of the three nodes I can successfully ping 192.168.36.75.
192.168.36.75 is the Kubernetes master of the Rancher-managed cluster where I would like to take the etcd snapshots. It is a single-master cluster.
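Ping only proves basic ICMP connectivity, though, so one more thing I can check from each of the three Rancher nodes is whether the etcd client port itself is reachable (a minimal check, assuming the default etcd client port 2379):

# from each Rancher node, test TCP connectivity to etcd on the downstream master
nc -zv 192.168.36.75 2379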
From the Rancher dashboard I can see all of my clusters up and ready, and everything is green.
The thing is, I am not sure how to troubleshoot this further. Any help is appreciated.
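For reference, these are the kinds of checks I can run on the downstream master itself; this is plain Docker CLI, assuming the standard RKE setup where etcd runs as a container literally named etcd:

# confirm the etcd container exists and is running (-a also lists stopped containers)
docker ps -a --filter name=etcd

# look for recent errors in the etcd container logs
docker logs --tail 100 etcd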
Update.
It seems that for some reason the rancher-agent container (rancher/rancher-agent:v2.3.4) on the master was unable to start the kubelet container, as these logs show:
Error response from daemon: {"message":"No such container: kubelet"}
Error: failed to start containers: kubelet
+ sleep 2
+ docker start kubelet
Error response from daemon: {"message":"No such container: kubelet"}
Error: failed to start containers: kubelet
+ sleep 2
+ docker start kubelet
Error response from daemon: {"message":"No such container: kubelet"}
Error: failed to start containers: kubelet
+ sleep 2
+ docker start kubelet
Error response from daemon: {"message":"No such container: kubelet"}
Error: failed to start containers: kubelet
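For anyone hitting the same thing: whether the container actually exists on the host can be checked with the plain Docker CLI, nothing Rancher-specific (kubelet is the standard RKE container name):

# list the kubelet container, including it if it is stopped
docker ps -a --filter name=kubelet

# show its current state (running, exited, ...) if it exists
docker inspect --format '{{.State.Status}}' kubelet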
But as I said, the kubelet container was in fact present on the master, and the cluster was working almost normally; everything was green in the Rancher UI. Since I wasn't sure what else to do and this was a single-master cluster, I rebooted the master. Now etcd snapshots are working again.
If anyone more experienced sees this, please let me know if there is a gentler solution to this problem.
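For what it is worth, next time I would try something gentler before rebooting, along these lines (a sketch, assuming the agent container can be identified by its image name and that the node uses systemd):

# restart only the rancher-agent container on the affected node
docker restart $(docker ps -q --filter ancestor=rancher/rancher-agent:v2.3.4)

# or, if that does not help, restart the Docker daemon, which brings the
# RKE containers (restart policy: always) back up without a full host reboot
sudo systemctl restart docker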