It happened again! Our Rancher 2 deployment (the Local cluster and a user workload cluster) went down. The root cause was a network configuration issue that kept our three
etcd nodes from talking to each other. Unfortunately, with etcd down, Rancher itself was down, so the Rancher alerting system was not functional either.
What strategies have you employed to keep an eye on your etcd cluster when Rancher itself is down? Do you use