How to cancel restore from snapshot?

bofh666 · February 12, 2020, 6:13pm

Hello community,

We have an RKE cluster of 3 nodes, each with all roles, provisioned via the vSphere cloud provider.

The cluster is for development, so all nodes live on a single physical host.

Today we decided to upgrade vSphere on the host, so we had to shut down all VMs, including the k8s cluster nodes.

We powered up Rancher and the cluster nodes, but the cluster didn’t come up successfully. After some investigation it turned out to be a network issue: the DHCP lease time was 2 minutes (for a specific reason that is out of scope for this topic), so the nodes' IP addresses got mixed up and connectivity was lost, because the SSL certificates of the services on each node don’t have SANs for the IPs of the other nodes (if we read the logs correctly).
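To illustrate the SAN mismatch we think we hit, here is a minimal sketch. It is not taken from RKE or Rancher: the helper name `ips_missing_from_sans` and all addresses are made up for illustration, and the SAN list is assumed to have been copied by hand from a certificate.

```python
import ipaddress


def ips_missing_from_sans(san_entries, node_ips):
    """Return the node IPs that are not listed among a certificate's IP SANs."""
    san_ips = set()
    for entry in san_entries:
        try:
            san_ips.add(ipaddress.ip_address(entry))
        except ValueError:
            # DNS-name SANs (e.g. "localhost") are not IP addresses; skip them.
            continue
    return [ip for ip in node_ips if ipaddress.ip_address(ip) not in san_ips]


# Hypothetical values: SANs issued for the original IPs,
# and the addresses the nodes received after the DHCP leases expired.
sans = ["127.0.0.1", "localhost", "10.0.0.11", "10.0.0.12", "10.0.0.13"]
current_ips = ["10.0.0.11", "10.0.0.27", "10.0.0.31"]

print(ips_missing_from_sans(sans, current_ips))  # ['10.0.0.27', '10.0.0.31']
```

Any address in the result would be a node that peers cannot verify against that certificate, which matches the kind of errors we believe we saw in the logs.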

But before we understood the cause of the cluster instability, we triggered a cluster restore from the latest snapshot. After we fixed the network issue the cluster became almost healthy, but it looks like the etcd containers are affected by the initiated restore: etcd-Serve-backup and etcd-download-backup containers keep spawning over and over again on the nodes, and in the web UI the cluster is stuck in Updating status with an alert saying that one of the etcd components is unhealthy.
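In case it helps anyone reproduce the observation, this is roughly how the respawning containers can be listed on a node. It is only a sketch: it assumes Docker is the container runtime and that the script runs on the node itself, and the function names are our own.

```python
import subprocess


def parse_docker_ps(output):
    """Turn 'name<TAB>status' lines into a list of (name, status) tuples."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        rows.append((name.strip(), status.strip()))
    return rows


def list_etcd_containers():
    """List all containers (running or exited) whose name contains 'etcd'."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--filter", "name=etcd",
         "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_docker_ps(result.stdout)
```

Calling `list_etcd_containers()` a few times in a row on a node shows whether the backup containers keep reappearing with a fresh status.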

Please suggest the correct way to stop the restore process, and advise whether that will help get all etcd instances healthy again. Or would it be easier/faster to just recreate the cluster?

TIA
