Longhorn split-brain quorum

Hey community,

How does Longhorn deal with split-brain at the storage level?

The following scenario:
a 3-node K8s cluster (1x ControlPlane-only node and 2x combined ControlPlane + Worker nodes)

The ControlPlane-only node runs on a less powerful vServer and is only there to maintain quorum (and to host some system pods like the storage controller etc… in case of a failure).

The two combined nodes each run on a dedicated server with plenty of RAM, CPU and SSD storage. These two should handle the workload.

With all 3 nodes running, Longhorn operates with replica count “2” without any problems. When the ControlPlane-only node goes offline, there is no issue either.
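Roughly, the setup corresponds to a StorageClass like this (a minimal sketch; the name `longhorn-2r` and the `staleReplicaTimeout` value are placeholders for illustration):

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-2r              # hypothetical name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"          # one replica per dedicated server
  staleReplicaTimeout: "30"      # minutes; assumed value, tune as needed
```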

But how does Longhorn handle the storage if one of the two storage nodes (the dedicated servers) is NotReady / crashed / offline?

Can Longhorn cope with there being only one replica left, and once the remaining nodes bring the storage-management pods back up, does everything work as usual?
When the offline node comes back, will the replica be rebuilt?

Or is this a split-brain situation where the data is lost?

Unfortunately, I have not been able to find anything about handling such a split-brain situation (at the storage level), neither in the Longhorn documentation nor in this forum.

Happy to hear your experiences with that kind of setup. 🙂

Thanks & Greets! frran24

Yes, that is how Longhorn works. The volumes will be in a degraded state but still usable, and once the other node is back up, the replicas will gradually be resynced. There is no split-brain in this scenario: each volume is served by a single engine on the node where it is attached, so only one copy ever accepts writes; the replica on the failed node is marked failed and rebuilt once the node returns. In your case (limited storage nodes), make sure to set “Allow Volume Creation with Degraded Availability” to true in the Longhorn settings, otherwise new pods would not be able to start on the remaining node.
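If you want to manage that setting declaratively rather than through the UI, Longhorn exposes its settings as Kubernetes resources. A minimal sketch, assuming the default `longhorn-system` namespace and a Longhorn version that serves the `v1beta2` API:

```yaml
# Longhorn setting as a custom resource; the name must match
# the built-in setting key exactly.
apiVersion: longhorn.io/v1beta2
kind: Setting
metadata:
  name: allow-volume-creation-with-degraded-availability
  namespace: longhorn-system
value: "true"
```

While the failed node is away you will see the volume's robustness shown as degraded in the Longhorn UI, and after the node rejoins you can watch it go back to healthy as the replica rebuilds.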