Rancher UI goes down intermittently; I'm following a single-node Rancher deployment

ajith_nirmal · July 5, 2024, 9:52am

Hi Rancher Community,

I’m facing an issue with my Rancher setup where the etcd container on the Rancher master node is restarting frequently. Here are the details:

Setup: I have a Rancher master node with RKE1 configured. The cluster runs the master and etcd on one node and a worker on another node.
Issue: The etcd container restarts intermittently, causing downtime in the Rancher UI.
Observations: The logs show the following error:

{"level":"info","ts":"2024-07-05T09:46:58.250044Z","caller":"embed/etcd.go:135","msg":"configuring client listeners","listen-client-urls":["https://0.0.0.0:2379"]}
{"level":"info","ts":"2024-07-05T09:46:58.250348Z","caller":"embed/etcd.go:309","msg":"starting an etcd server","etcd-version":"3.5.10","git-sha":"0223ca52b","go-version":"go1.20.10","go-os":"linux","go-arch":"amd64","max-cpu-set":6,"max-cpu-available":6,"member-initialized":true,"name":"etcd-172.27.16.68","data-dir":"/var/lib/rancher/etcd/","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/rancher/etcd/member","force-new-cluster":false,"heartbeat-interval":"500ms","election-timeout":"5s","initial-election-tick-advance":true,"snapshot-count":100000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["https://172.27.16.68:2380"],"listen-peer-urls":["https://0.0.0.0:2380"],"advertise-client-urls":["https://172.27.16.68:2379"],"listen-client-urls":["https://0.0.0.0:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
panic: freepages: failed to get all reachable pages (key[2]=(hex)00000000002d95f55f0000000000000000 on leaf page(6549) needs to be < than key of the next element in ancestor (hex)00000000002d95da5f0000000000000000. Pages stack: [4569 6313 6549])

goroutine 108 [running]:
go.etcd.io/bbolt.(*DB).freepages.func2()
go.etcd.io/bbolt@v1.3.8/db.go:1202 +0x99
created by go.etcd.io/bbolt.(*DB).freepages
go.etcd.io/bbolt@v1.3.8/db.go:1200 +0x1ea
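
To make the panic message easier to read, the two hex keys it prints can be decoded and compared. This is only a minimal sketch, assuming the keys follow etcd's revision-key layout (8-byte big-endian main revision, a `_` byte, then an 8-byte sub-revision); the helper name `decode_revision_key` is hypothetical and not part of etcd or bbolt.

```python
# Hex keys copied verbatim from the panic message above.
LEAF_KEY_HEX = "00000000002d95f55f0000000000000000"
ANCESTOR_NEXT_KEY_HEX = "00000000002d95da5f0000000000000000"


def decode_revision_key(hex_key: str) -> tuple[int, int]:
    """Split a 17-byte key into (main, sub) revisions.

    Assumed layout (not confirmed by the log itself): 8-byte big-endian
    main revision, one '_' byte (0x5f), 8-byte big-endian sub-revision.
    """
    raw = bytes.fromhex(hex_key)
    if len(raw) != 17 or raw[8:9] != b"_":
        raise ValueError(f"unexpected key layout: {hex_key}")
    main = int.from_bytes(raw[:8], "big")
    sub = int.from_bytes(raw[9:], "big")
    return main, sub


leaf_key = bytes.fromhex(LEAF_KEY_HEX)
ancestor_next_key = bytes.fromhex(ANCESTOR_NEXT_KEY_HEX)

# The panic says the leaf key "needs to be <" the ancestor's next key.
# Python compares bytes objects byte by byte, so the same check can be redone here.
ordering_ok = leaf_key < ancestor_next_key

print("leaf key revision:    ", decode_revision_key(LEAF_KEY_HEX))
print("ancestor key revision:", decode_revision_key(ANCESTOR_NEXT_KEY_HEX))
print("leaf key sorts before ancestor's next key:", ordering_ok)
```

If the last line prints `False`, the leaf key on page 6549 does not sort before the ancestor's next key, which is the ordering violation the panic reports. That suggests the problem lies in the on-disk bbolt database file itself rather than in the etcd configuration shown in the startup log.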

Actions Taken: I have confirmed that the node has sufficient disk space.
Any insights or recommendations on how to stabilize etcd and prevent these frequent restarts would be greatly appreciated.

Thank you!
