ETCD Restore from a snapshot

paulo.leal · September 3, 2020, 8:49pm

Hi,

I tried to restore etcd from a snapshot through Rancher’s UI but ran into the problem described here: https://github.com/rancher/rancher/issues/25410
The cluster is running, but I keep getting the message “Etcd snapshots are not consistent” and the snapshot was not restored (some deployments I made after taking the snapshot are still there).
So I decided to recover the snapshot “the hard way”:
1 - I removed the extra etcd nodes I had, keeping just the master
2 - Executed steps 2 and 3 from https://rancher.com/docs/rancher/v2.x/en/cluster-admin/restoring-etcd/#recovering-etcd-without-a-snapshot
3 - Checked here how to recover the snapshot: https://etcd.io/docs/v3.3.12/op-guide/recovery/
4 - I got the error: data-dir “/var/lib/rancher/etcd/” exists, so I changed the data-dir to “/var/lib/rancher/etcd/restore” so that I could later move it to the right place
5 - I still got an error: data-dir “/var/lib/rancher/etcd/restore” exists, but this time it created the “member” folder, so I moved it back to the “/var/lib/rancher/etcd/” folder
6 - When I start the etcd container I got the log: “2020-09-03 20:20:52.563645 E | rafthttp: request cluster ID mismatch (got 8d36ab15f1e62ff1 want 8d64bcd84117d00a)”
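To make the message in step 6 easier to read, here is a minimal sketch that pulls the two IDs out of that log line. The reading of the fields is an assumption on my part: as far as I understand etcd's peer handler, "got" is the cluster ID sent by the peer making the request and "want" is the cluster ID of the member writing the log, which would mean another etcd member that still belongs to the old cluster is contacting this node.

```shell
#!/bin/sh
# Log line copied from step 6 above.
line='2020-09-03 20:20:52.563645 E | rafthttp: request cluster ID mismatch (got 8d36ab15f1e62ff1 want 8d64bcd84117d00a)'

# Extract the two hexadecimal cluster IDs with a POSIX sed expression.
pattern='s/.*(got \([0-9a-f]*\) want \([0-9a-f]*\)).*'
got=$(printf '%s\n' "$line" | sed -n "$pattern/\1/p")
want=$(printf '%s\n' "$line" | sed -n "$pattern/\2/p")

# Assumption: "got" comes from the requesting peer, "want" from the local member.
if [ "$got" != "$want" ]; then
    echo "peer cluster ID $got differs from local cluster ID $want"
fi
```

If that reading is right, the IDs only match once every member that talks to this node was started from the same restored data.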

The command I used to restore the snapshot was:
docker run --name=etcd-restore --hostname=rch-dev-etcd-01 \
  --env=ETCDCTL_API=3 \
  --env=ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem \
  --env=ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-10-100-192-48.pem \
  --env=ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-10-100-192-48-key.pem \
  --env=ETCDCTL_ENDPOINTS=https://127.0.0.1:2379 \
  --env=ETCD_UNSUPPORTED_ARCH=x86_64 \
  --env=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  --volume=/var/lib/etcd:/var/lib/rancher/etcd/:z \
  --volume=/etc/kubernetes:/etc/kubernetes:z \
  --network=host \
  --volume=/opt/rke/etcd-snapshots/:/opt/rke/etcd-snapshots/:z \
  --expose=2379/tcp --expose=2380/tcp \
  --restart=always \
  --label='io.rancher.rke.container.name=etcd' \
  --log-opt max-size=50m --log-opt max-file=3 \
  --detach=true \
  rancher/coreos-etcd:v3.4.3-rancher1 \
  etcdctl snapshot restore /opt/rke/etcd-snapshots/c-dswpj-rl-ctvgk_2020-09-02T21:17:37Z \
    --initial-cluster=etcd-rch-dev-etcd-01=https://<HOST_LOCAL_IP>:2380 \
    --data-dir=/var/lib/rancher/etcd/restore \
    --initial-cluster-token=etcd-cluster-1 \
    --name=etcd-rch-dev-etcd-01 \
    --initial-advertise-peer-urls=https://<HOST_LOCAL_IP>:2380
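For comparison, here is a minimal sketch of the same restore written as a one-shot container. It only prints the command instead of running it. The idea behind it is an unconfirmed guess: with --restart=always, Docker would start the container again after the restore finishes, and the second run would then find the data-dir already present, which might explain the error in step 5 even though the “member” folder was created. The variable names and the restore_cmd function are made up for this example, and <HOST_LOCAL_IP> is left as a placeholder.

```shell
#!/bin/sh
# Values taken from the command above; <HOST_LOCAL_IP> stays a placeholder.
SNAPSHOT=/opt/rke/etcd-snapshots/c-dswpj-rl-ctvgk_2020-09-02T21:17:37Z
NAME=etcd-rch-dev-etcd-01
PEER_URL='https://<HOST_LOCAL_IP>:2380'
RESTORE_DIR=/var/lib/rancher/etcd/restore

# Hypothetical helper: prints a one-shot restore command (dry run only).
restore_cmd() {
    # --rm replaces --restart=always: the restore is a one-time job, so the
    # container should exit and be removed rather than be started again.
    printf '%s ' docker run --rm \
        --env=ETCDCTL_API=3 \
        --volume=/var/lib/etcd:/var/lib/rancher/etcd/:z \
        --volume=/opt/rke/etcd-snapshots/:/opt/rke/etcd-snapshots/:z \
        rancher/coreos-etcd:v3.4.3-rancher1 \
        etcdctl snapshot restore "$SNAPSHOT" \
        --name="$NAME" \
        --initial-cluster="$NAME=$PEER_URL" \
        --initial-cluster-token=etcd-cluster-1 \
        --initial-advertise-peer-urls="$PEER_URL" \
        --data-dir="$RESTORE_DIR"
    printf '\n'
}

restore_cmd
```

The printed command keeps the same snapshot path, member name and data-dir as the original and drops the flags that only matter for a long-running etcd server (certificates, exposed ports, labels, log options).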

Since I used step 2 above to build the restore command, it contains some unnecessary environment variables.

My question is: Is there a way to recover the etcd from this snapshot?

Best regards

Paulo Leal
