Restore Snapshot - etcd nodes are unavailable and calico-node not running

Steps to reproduce (as few steps as possible):
In Rancher UI, select Restore Snapshot

Result:

  1. Some or all of the 3 etcd nodes are in the UNAVAILABLE state with the error “Run-time network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized”. This does not happen after every snapshot restore.
  2. The etcd nodes are in the AVAILABLE state, but none of the calico-node pods are running, so the application is not running either.
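
To see which nodes are affected after a restore, something like the following could be run against the output of `kubectl get nodes -o json`. This is only a sketch: the function name `not_ready_nodes` is made up for illustration, and it assumes the standard Node object layout (`status.conditions` with a `Ready` entry), not anything Rancher-specific.

```python
def not_ready_nodes(node_list):
    """Return (name, reason, message) for each node whose Ready condition is not "True".

    `node_list` is the parsed JSON from `kubectl get nodes -o json`.
    """
    results = []
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        conditions = node.get("status", {}).get("conditions", [])
        # Each node is expected to carry one condition of type "Ready".
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None:
            # "NoReadyCondition" is a placeholder label, not a Kubernetes reason.
            results.append((name, "NoReadyCondition", ""))
        elif ready.get("status") != "True":
            results.append((name, ready.get("reason", ""), ready.get("message", "")))
    return results
```

For example, pipe `kubectl get nodes -o json` into a script that calls `not_ready_nodes(json.load(sys.stdin))` and prints each tuple. The message field should contain the "cni config uninitialized" text on the affected nodes.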

Other details that may be helpful:
When the calico-node pods are not Running, redeploying the calico-node workload seems to solve it. However, the pod(s) can still go in and out of the Running state afterwards. Ultimately, I have to deploy the calico-node workload and recreate the calico-node daemonset.

Sometimes, the cattle-cluster-agent, coredns, coredns-autoscaler, and metrics-server workloads show Active status but their pod(s) are not running. I’m not sure if they are impacted by calico-node not working.
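
One way to check whether these workloads fail on the same nodes as calico-node would be to group the pods that are not ready by node. The sketch below assumes the input is the parsed output of `kubectl get pods --all-namespaces -o json` and relies only on standard Pod fields (`spec.nodeName`, `status.phase`, `status.containerStatuses`). The helper name `unready_pods_by_node` and the `<unscheduled>` placeholder are made up for illustration.

```python
from collections import defaultdict


def unready_pods_by_node(pod_list):
    """Group pods that are not fully ready by the node they are scheduled on.

    Returns {node_name: [(namespace, pod_name, total_restarts), ...]}.
    """
    by_node = defaultdict(list)
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        phase = status.get("phase")
        containers = status.get("containerStatuses", [])
        if phase == "Succeeded":
            continue  # completed pods (e.g. jobs) are not a problem
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        if phase == "Running" and all_ready:
            continue  # healthy pod
        # Placeholder key for pods that have no node assigned yet.
        node = pod.get("spec", {}).get("nodeName", "<unscheduled>")
        # A growing restart count hints at pods flapping in and out of Running.
        restarts = sum(c.get("restartCount", 0) for c in containers)
        meta = pod["metadata"]
        by_node[node].append((meta["namespace"], meta["name"], restarts))
    return dict(by_node)
```

If the unready coredns or cattle-cluster-agent pods only show up on nodes where calico-node is also listed, that would suggest (but not prove) that they are affected by the calico-node failure.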

Environment information

  • Rancher version (rancher/rancher / rancher/server image tag or shown bottom left in the UI):
    v2.2.6
  • Installation option (single install/HA):
    HA AirGap

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported):
    Import (initializing local rancher cluster)
  • Machine type (cloud/VM/metal) and specifications (CPU/memory):
    t2.large AWS ec2-instance
  • Kubernetes version (use kubectl version):
    v1.14.3
  • Docker version (use docker version):
    1.13.1