Hello,
We are conducting disaster recovery tests. Right now we are testing the restoration of keys from an etcd snapshot.
When checking the snapshots on the master nodes, I was surprised by the file sizes; they seemed small considering the size of our cluster:
root@rke-node-1:/var/lib/rancher/rke2/server/db/snapshots# ls -la
total 158888
drwx------ 2 root root 4096 Nov 22 12:00 .
drwx------ 4 root root 4096 May 22 2024 ..
-rw------- 1 root root 32534560 Nov 20 12:00 etcd-snapshot-rke-node-1-1732104004
-rw------- 1 root root 32534560 Nov 21 00:00 etcd-snapshot-rke-node-1-1732147202
-rw------- 1 root root 32534560 Nov 21 12:00 etcd-snapshot-rke-node-1-1732190402
-rw------- 1 root root 32534560 Nov 22 00:00 etcd-snapshot-rke-node-1-1732233603
-rw------- 1 root root 32534560 Nov 22 12:00 etcd-snapshot-rke-node-1-1732276805
This listing is from node 1; the file sizes on the other nodes are even smaller.
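For reference, a quick way to sanity-check what a snapshot actually contains before restoring it is etcdctl snapshot status; a minimal sketch, using the newest file from the listing above:
# print snapshot metadata (hash, revision, total keys, total size)
etcdctl snapshot status etcd-snapshot-rke-node-1-1732276805 --write-out=table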
I tested restoring etcd in a Docker container, and everything seemed to go well. I used the following commands:
# create the etcd container, with the snapshot directory mounted read-only
docker run -d --name etcd-restoration \
  -v $PWD:/snapshot:ro \
  quay.io/coreos/etcd:v3.5.0 \
  /usr/local/bin/etcd \
    --data-dir=/etcd-data \
    --name etcd0 \
    --listen-peer-urls=http://0.0.0.0:2380 \
    --listen-client-urls=http://0.0.0.0:2379 \
    --advertise-client-urls=http://localhost:2379 \
    --initial-cluster=etcd0=http://172.17.0.2:2380 \
    --initial-cluster-token=etcd-cluster-1 \
    --initial-cluster-state=new \
    --enable-v2=true
# delete the existing data directory contents inside the container
rm -rf /etcd-data/*
# restore snapshot
etcdctl snapshot restore /snapshot/snapshot.db --data-dir /etcd-data
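For clarity, the rm and etcdctl commands above were run from a shell inside the container, opened with something along these lines (assuming the image ships a shell):
# open a shell in the restoration container
docker exec -it etcd-restoration sh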
All commands were executed without error, but when searching for any key, nothing is returned:
root@a408eadb7c14:/# etcdctl get --keys-only --prefix=true /
root@a408eadb7c14:/#
The snapshot.db file was copied from the latest snapshot available on node 1.
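In case the query itself is the problem, two other checks come to mind (a sketch; Kubernetes stores its objects under the /registry prefix, and the data dir is the one used in the restore above):
# list only Kubernetes keys, which normally live under /registry
etcdctl get /registry --prefix --keys-only | head
# confirm the restore actually wrote an etcd data dir
ls -la /etcd-data/member/snap /etcd-data/member/wal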
Am I doing something wrong when searching for the keys, or should I be concerned that the snapshot is not functional?
My environment:
Provider: RKE2
Kubernetes Version: v1.26.10+rke2r1
Thank you.