Docker allows you to view logs for a stopped or exited container. Run docker ps -a | grep etcd to find the stopped etcd container, and then look at its logs with docker logs <container id>.
Also, for crashed containers, you may be able to find old logs under /var/lib/docker/containers/<container id>/<container id>-json.log. Look for similar files named *-json.log-1, *-json.log-2, etc.
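As a rough sketch of the log-file lookup described above, the helper below lists the current and rotated JSON log files for one container. The function name list_container_logs and its optional second argument are made up for illustration; the default path assumes Docker's usual data directory and the json-file logging driver, and reading it normally requires root.

```shell
# Hypothetical helper: list a container's current and rotated JSON log files.
# $1 = full container ID (the directory name under the log root)
# $2 = log root, assumed to be Docker's default data directory if omitted
list_container_logs() {
    container_id="$1"
    log_root="${2:-/var/lib/docker/containers}"
    # The glob matches <id>-json.log as well as rotated copies with any suffix.
    find "$log_root/$container_id" -maxdepth 1 -type f -name '*-json.log*' | sort
}
```

The directory name is the full container ID rather than the short one printed by docker ps -a; docker inspect --format '{{.Id}}' <container id> should print the full ID, which can then be passed to list_container_logs.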
It turns out all the containers have exited with no obvious cause, and there is no etcd container.
Thinking there must have been a setup problem with this node, I tried running rke up to set it up again, but it fails to SSH to the problem node, ranchm01, which is also the node I am running rke up from:
[root@ranchm01 ~]# rke up
INFO[0000] Running RKE version: v1.1.2
INFO[0000] Initiating Kubernetes cluster
INFO[0000] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates
INFO[0000] [certificates] Generating admin certificates and kubeconfig
INFO[0000] Successfully Deployed state file at [./cluster.rkestate]
INFO[0000] Building Kubernetes cluster
INFO[0000] [dialer] Setup tunnel for host [ranchm03]
INFO[0000] [dialer] Setup tunnel for host [ranchm02]
INFO[0000] [dialer] Setup tunnel for host [ranchm01]
WARN[0000] Failed to set up SSH tunneling for host [ranchm01]: Can't retrieve Docker Info: error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info: Unable to access node with address [ranchm01:22] using SSH. Please check if the configured key or specified key file is a valid SSH Private Key. Error: Error configuring SSH: ssh: no key found
WARN[0000] Removing host [ranchm01] from node lists
I have confirmed that passwordless login from ranchm01 to itself works for the rancher user with this command:
[root@ranchm01 ~]# ssh -i /home/rancher/.ssh/id_rsa rancher@ranchm01 docker ps