Hi Rancher Community,
I’m facing an issue with my Rancher setup where the etcd container on the Rancher master node is restarting frequently. Here are the details:
- Setup: I have a Rancher master node configured with RKE1. The cluster has two nodes: the master (control plane) and etcd run on one node, and a worker runs on the other.
- Issue: The etcd container restarts intermittently, causing downtime in the Rancher UI.
- Observations: The etcd container logs show the normal startup messages followed by a panic:
{"level":"info","ts":"2024-07-05T09:46:58.250044Z","caller":"embed/etcd.go:135","msg":"configuring client listeners","listen-client-urls":["https://0.0.0.0:2379"]}
{"level":"info","ts":"2024-07-05T09:46:58.250348Z","caller":"embed/etcd.go:309","msg":"starting an etcd server","etcd-version":"3.5.10","git-sha":"0223ca52b","go-version":"go1.20.10","go-os":"linux","go-arch":"amd64","max-cpu-set":6,"max-cpu-available":6,"member-initialized":true,"name":"etcd-172.27.16.68","data-dir":"/var/lib/rancher/etcd/","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/rancher/etcd/member","force-new-cluster":false,"heartbeat-interval":"500ms","election-timeout":"5s","initial-election-tick-advance":true,"snapshot-count":100000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["https://172.27.16.68:2380"],"listen-peer-urls":["https://0.0.0.0:2380"],"advertise-client-urls":["https://172.27.16.68:2379"],"listen-client-urls":["https://0.0.0.0:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
panic: freepages: failed to get all reachable pages (key[2]=(hex)00000000002d95f55f0000000000000000 on leaf page(6549) needs to be < than key of the next element in ancestor (hex)00000000002d95da5f0000000000000000. Pages stack: [4569 6313 6549])
goroutine 108 [running]:
go.etcd.io/bbolt.(*DB).freepages.func2()
go.etcd.io/bbolt@v1.3.8/db.go:1202 +0x99
created by go.etcd.io/bbolt.(*DB).freepages
go.etcd.io/bbolt@v1.3.8/db.go:1200 +0x1ea
- Actions Taken: I have confirmed that the node has sufficient free disk space.
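In case it helps with diagnosis, here is a rough sketch that decodes the two keys quoted in the panic message. It assumes they follow etcd's revision key layout (8-byte big-endian main revision, the byte 0x5f for "_", then an 8-byte big-endian sub revision). The function name decode_revision_key is my own and is not part of etcd or bbolt.

```python
# Sketch: decode the two bbolt keys quoted in the panic message.
# Assumption: both are etcd revision keys laid out as
#   8-byte big-endian main revision + b"_" (0x5f) + 8-byte big-endian sub revision.


def decode_revision_key(hex_key: str) -> tuple[int, int]:
    """Return (main_revision, sub_revision) for a 17-byte revision key."""
    raw = bytes.fromhex(hex_key)
    if len(raw) != 17 or raw[8:9] != b"_":
        raise ValueError(f"not a 17-byte revision key: {hex_key}")
    main = int.from_bytes(raw[:8], "big")
    sub = int.from_bytes(raw[9:], "big")
    return main, sub


# Keys copied verbatim from the panic line above.
leaf_key = decode_revision_key("00000000002d95f55f0000000000000000")
ancestor_key = decode_revision_key("00000000002d95da5f0000000000000000")

print("key[2] on leaf page 6549:", leaf_key)
print("next key in ancestor:    ", ancestor_key)
# The panic says the leaf key must be smaller than the ancestor's next key.
print("ordering holds:", leaf_key < ancestor_key)
```

If that assumption is right, the leaf key decodes to revision 2987509 and the ancestor key to revision 2987482, so the leaf key is not smaller than the ancestor key, which matches the ordering check that the panic reports as failing.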
Any insights or recommendations on how to stabilize etcd and prevent these frequent restarts would be greatly appreciated.
Thank you!