Longhorn split-brain quorum

frran24 · July 28, 2024, 9:57am

Hey community,

How does Longhorn deal with split-brain at the storage level?

The following scenario:
A 3-node K8s cluster (1x ControlPlane-only node and 2x combined ControlPlane + Worker nodes)

The ControlPlane-only node runs on a less powerful vServer and is only there to form the quorum (and to run some system pods like the storage controller etc. in case of a failure).

The two combined nodes each run on a dedicated server with plenty of RAM, CPU and SSD storage. These should handle the workload.

If all 3 nodes are running, Longhorn can be operated with replica count “2” without any problems. If the ControlPlane-only node goes offline, there is no issue either.

But how does Longhorn handle the storage if one of the two storage nodes (dedicated server) is NotReady / crashed / offline?

Can Longhorn deal with the fact that there is only one replica left? Since the remaining nodes bring the storage management pods back up, does everything keep working as usual?
When the offline node comes back, will the replica be rebuilt?

Or is there a split brain here and the data is lost?

Unfortunately, I have not been able to find anything about handling such a split-brain situation (at storage level), neither in the Longhorn documentation nor in the forum.

Happy to hear your experiences with that kind of setup.

Thanks & Greets! frran24

bpedersen2 · July 29, 2024, 9:31am

Yes, that is how Longhorn works. The volumes will be in a degraded state but still usable, and once the other node is back up, the volumes will gradually get synced. In your case (limited storage nodes), make sure to set “Allow Volume Creation with Degraded Availability” to true in the Longhorn settings, otherwise the pods would not be able to start on the remaining node.
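
To make the degraded-but-usable behaviour described above a bit more concrete, here is a minimal Python sketch of the idea. It is only a toy model, not Longhorn's actual code: the `Volume` class and the `robustness` and `is_usable` helpers are made-up names for illustration, and the state names ("healthy", "degraded", "faulted") are assumed to match what the Longhorn UI reports.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Toy model of a replicated volume (hypothetical, not a Longhorn type)."""
    desired_replicas: int   # replica count configured for the volume, e.g. 2
    healthy_replicas: int   # replicas currently reachable and in sync


def robustness(volume: Volume) -> str:
    """Classify the volume by how many healthy replicas remain."""
    if volume.healthy_replicas <= 0:
        return "faulted"    # no usable copy of the data is left
    if volume.healthy_replicas < volume.desired_replicas:
        return "degraded"   # still served from the remaining replica(s)
    return "healthy"


def is_usable(volume: Volume) -> bool:
    """In this model a volume stays usable while at least one replica is healthy."""
    return robustness(volume) != "faulted"


# Scenario from the question: replica count 2, one storage node per replica.
vol = Volume(desired_replicas=2, healthy_replicas=2)
print(robustness(vol), is_usable(vol))

vol.healthy_replicas = 1    # one storage node goes NotReady / offline
print(robustness(vol), is_usable(vol))

vol.healthy_replicas = 2    # node returns and the replica has been rebuilt
print(robustness(vol), is_usable(vol))
```

In this model the volume only becomes unusable when the last healthy replica is gone, which is the situation to avoid with just two storage nodes.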
