Restore etcd quorum after data center outage

Andrii · October 17, 2019, 10:18am

Hi,

Assume a 6-node k8s cluster spanning 2 data centers, 3 nodes in each. Both DCs have some number of etcd and control plane nodes. I would like to know how to restore cluster functionality if one of the DCs fails permanently. In this scenario, nodes in the remaining DC will continue functioning, but etcd will be in an unhealthy state because it won't be able to form a quorum. How can I restore quorum in the remaining DC and get a fully functional cluster? Rancher itself is deployed in a separate third DC (cloud) and is not affected by the outage.
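
To make the quorum problem concrete, here is a minimal sketch of the majority rule etcd uses (a cluster of n members needs n // 2 + 1 of them reachable). The member counts below are hypothetical, since the exact number of etcd nodes per DC is not stated above; the function names are made up for illustration.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def has_quorum(total: int, surviving: int) -> bool:
    """True if the surviving members can still form a majority."""
    return surviving >= quorum(total)


# Hypothetical (total etcd members, members left after one DC is lost).
scenarios = [(3, 1), (3, 2), (4, 2), (6, 3)]

for total, surviving in scenarios:
    state = "quorum kept" if has_quorum(total, surviving) else "quorum lost"
    print(f"{total} members, {surviving} surviving: "
          f"need {quorum(total)} -> {state}")
```

With only two DCs, an even split loses quorum whichever DC fails, and an uneven split loses it whenever the DC holding the majority fails.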

Thanks,
Andrii

Andrii · October 29, 2019, 12:41pm

I figured out how to do it: remove the existing etcd container and start a new one with the same configuration plus the --force-new-cluster flag. Afterwards I found this article, which explains the same approach: https://blog.devget.net/devops/rancher-etcd-disaster-recovery/
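
As a rough sketch of the "same configuration plus one flag" step, the helper below takes the argument list of the old etcd container and returns it with --force-new-cluster appended. The helper name and the sample arguments (member name, data directory, peer URL) are hypothetical placeholders, not values from a real cluster; in practice the original arguments would be read from the existing container before removing it.

```python
from typing import List

FORCE_FLAG = "--force-new-cluster"


def with_force_new_cluster(args: List[str]) -> List[str]:
    """Return a copy of the etcd arguments with the flag present exactly once."""
    # Accept both "--force-new-cluster" and "--force-new-cluster=..." as present.
    already_set = any(
        a == FORCE_FLAG or a.startswith(FORCE_FLAG + "=") for a in args
    )
    if already_set:
        return list(args)
    return list(args) + [FORCE_FLAG]


# Hypothetical arguments copied from the old container (placeholders only).
original_args = [
    "etcd",
    "--name=etcd-node1",
    "--data-dir=/var/lib/etcd",
    "--initial-cluster=etcd-node1=https://10.0.0.1:2380",
]

new_args = with_force_new_cluster(original_args)
print(" ".join(new_args))
```

The flag makes etcd start as a single-member cluster that keeps the existing data, so it is meant for this one recovery start only; any other surviving etcd nodes then have to be added back as new members.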
