Hi,
I was trying to restore my etcd from a snapshot through Rancher’s UI, but I ran into the problem described here: https://github.com/rancher/rancher/issues/25410
The cluster is running, but I keep getting the message “Etcd snapshots are not consistent”, and the snapshot was not restored (some deployments I made after taking the snapshot are still there).
So I decided to restore the snapshot “the hard way”:
1 - I removed the extra etcd nodes I had, keeping just the master
2 - Executed steps 2 and 3 from https://rancher.com/docs/rancher/v2.x/en/cluster-admin/restoring-etcd/#recovering-etcd-without-a-snapshot
3 - Checked here how to restore the snapshot: https://etcd.io/docs/v3.3.12/op-guide/recovery/
4 - I got: Error: data-dir “/var/lib/rancher/etcd/” exists, so I changed the data-dir to “/var/lib/rancher/etcd/restore” so that I could later move it to the right place
5 - I still got an Error: data-dir “/var/lib/rancher/etcd/restore” exists, but this time it created the “member” folder, so I moved it back to the “/var/lib/rancher/etcd/” folder
6 - When I started the etcd container, I got this log: “2020-09-03 20:20:52.563645 E | rafthttp: request cluster ID mismatch (got 8d36ab15f1e62ff1 want 8d64bcd84117d00a)”
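Steps 4 and 5 above (restore into a separate directory, then move the “member” folder into place) can be sketched as a small shell helper. This is only a sketch: the function name promote_restored_member and the timestamped backup name are hypothetical, and it assumes the etcd container is stopped while the folders are moved.

```shell
#!/bin/sh
# Hypothetical helper: move a freshly restored "member" directory into the
# etcd data directory, keeping any existing one as a timestamped backup.
promote_restored_member() {
    restore_dir=$1   # directory passed to --data-dir during the restore
    data_dir=$2      # directory the etcd container actually reads from

    # The restore must have produced a member directory.
    if [ ! -d "$restore_dir/member" ]; then
        echo "no member directory in $restore_dir" >&2
        return 1
    fi

    # Keep the previous state instead of overwriting it.
    if [ -d "$data_dir/member" ]; then
        mv "$data_dir/member" "$data_dir/member.bak.$(date +%Y%m%d%H%M%S)" || return 1
    fi

    mv "$restore_dir/member" "$data_dir/member"
}
```

With the volume mapping from the restore command, the call on the host would presumably be promote_restored_member /var/lib/etcd/restore /var/lib/etcd, since /var/lib/etcd on the host is mounted as /var/lib/rancher/etcd/ inside the container.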
The command I used to restore the snapshot was:
docker run --name=etcd-restore --hostname=rch-dev-etcd-01 \
  --env=ETCDCTL_API=3 \
  --env=ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem \
  --env=ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-10-100-192-48.pem \
  --env=ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-10-100-192-48-key.pem \
  --env=ETCDCTL_ENDPOINTS=https://127.0.0.1:2379 \
  --env=ETCD_UNSUPPORTED_ARCH=x86_64 \
  --env=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  --volume=/var/lib/etcd:/var/lib/rancher/etcd/:z \
  --volume=/etc/kubernetes:/etc/kubernetes:z \
  --network=host \
  --volume=/opt/rke/etcd-snapshots/:/opt/rke/etcd-snapshots/:z \
  --expose=2379/tcp --expose=2380/tcp \
  --restart=always \
  --label='io.rancher.rke.container.name=etcd' \
  --log-opt max-size=50m --log-opt max-file=3 \
  --detach=true \
  rancher/coreos-etcd:v3.4.3-rancher1 \
  etcdctl snapshot restore /opt/rke/etcd-snapshots/c-dswpj-rl-ctvgk_2020-09-02T21:17:37Z \
  --initial-cluster=etcd-rch-dev-etcd-01=https://<HOST_LOCAL_IP>:2380 \
  --data-dir=/var/lib/rancher/etcd/restore \
  --initial-cluster-token=etcd-cluster-1 \
  --name=etcd-rch-dev-etcd-01 \
  --initial-advertise-peer-urls=https://<HOST_LOCAL_IP>:2380
Since I used step 2 above to build the restore command, it contains some unnecessary environment variables.
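For comparison, here is a reduced version of the restore command, as a sketch only. It rests on these assumptions:

- "etcdctl snapshot restore" works offline on the snapshot file, so the certificate and endpoint variables, the network settings and the exposed ports are not needed.
- "--rm" replaces "--restart=always" and "--detach=true". A container with "--restart=always" is started again after the restore exits, which may be why the second run in step 5 reported that the data-dir exists.
- The function name restore_snapshot and the DOCKER override are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the restore command from the post, reduced to
# the options an offline "etcdctl snapshot restore" appears to need.
# DOCKER can be overridden (e.g. DOCKER=echo) to print the command instead
# of running it.
restore_snapshot() {
    snapshot=$1    # snapshot file under /opt/rke/etcd-snapshots/
    name=$2        # etcd member name, e.g. etcd-rch-dev-etcd-01
    peer_url=$3    # peer URL, e.g. https://<HOST_LOCAL_IP>:2380

    "${DOCKER:-docker}" run --rm \
        --env=ETCDCTL_API=3 \
        --volume=/var/lib/etcd:/var/lib/rancher/etcd/:z \
        --volume=/opt/rke/etcd-snapshots/:/opt/rke/etcd-snapshots/:z \
        rancher/coreos-etcd:v3.4.3-rancher1 \
        etcdctl snapshot restore "$snapshot" \
        --name="$name" \
        --initial-cluster="$name=$peer_url" \
        --initial-cluster-token=etcd-cluster-1 \
        --initial-advertise-peer-urls="$peer_url" \
        --data-dir=/var/lib/rancher/etcd/restore
}
```

Calling it as DOCKER=echo restore_snapshot /opt/rke/etcd-snapshots/c-dswpj-rl-ctvgk_2020-09-02T21:17:37Z etcd-rch-dev-etcd-01 "https://<HOST_LOCAL_IP>:2380" prints the resulting docker command without running it, which makes it easy to review before a real restore.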
My question is: is there a way to restore etcd from this snapshot?
Best regards
Paulo Leal