Hey community,
How does Longhorn deal with split-brain at the storage level?
The following scenario:
A 3-node K8s cluster (1x ControlPlane-only node and 2x combined ControlPlane + Worker nodes).
The ControlPlane-only node runs on a less powerful vServer. It is only there to form the quorum (and to host some system pods like storage-controller etc. in case of a failure).
The two combined nodes each run on a dedicated server with plenty of RAM, CPU and SSD storage. These should handle the workload.
With all 3 nodes running, Longhorn works without any problems with a replica count of “2”. If the ControlPlane-only node goes offline, there is no issue either.
But how does Longhorn handle the storage if one of the two storage nodes (dedicated server) is NotReady / crashed / offline?
Can Longhorn cope with only one replica left? And since the remaining nodes bring the storage-management pods back up, does everything keep working as usual?
When the Offline node comes back, will the replica be rebuilt?
Or is there a split brain here and the data is lost?
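To make the failure cases concrete, here is a minimal Python sketch that only counts what survives in each case. It assumes that all three nodes are etcd members (stacked control plane), that quorum follows the usual etcd majority rule (floor(n/2) + 1), and that each dedicated server holds one of the two replicas. The node names are hypothetical. It deliberately models nothing about how Longhorn reacts afterwards, since that is exactly the open question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    control_plane: bool  # assumed: runs an etcd/control-plane member
    holds_replica: bool  # assumed: stores one of the two Longhorn replicas


# Hypothetical node names for the setup described above.
NODES = [
    Node("vserver-cp", control_plane=True, holds_replica=False),
    Node("dedicated-1", control_plane=True, holds_replica=True),
    Node("dedicated-2", control_plane=True, holds_replica=True),
]


def quorum_size(members: int) -> int:
    """Majority of members a Raft-based store such as etcd needs."""
    return members // 2 + 1


def evaluate(offline: set) -> dict:
    """Count control-plane members and replicas still up when `offline` nodes are down."""
    cp_total = sum(n.control_plane for n in NODES)
    cp_up = sum(n.control_plane and n.name not in offline for n in NODES)
    replicas_up = sum(n.holds_replica and n.name not in offline for n in NODES)
    return {
        "offline": sorted(offline),
        "quorum_kept": cp_up >= quorum_size(cp_total),
        "replicas_left": replicas_up,
    }


if __name__ == "__main__":
    for offline in [set(), {"vserver-cp"}, {"dedicated-1"}]:
        print(evaluate(offline))
```

Losing the vServer keeps quorum and both replicas. Losing a dedicated server still keeps quorum but leaves a single replica, which is the case I am asking about.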
Unfortunately, I have not been able to find anything about handling such a split-brain situation at the storage level, neither in the Longhorn documentation nor in the forum.
Happy to hear your experiences with that kind of setup.
Thanks & Greets! frran24