We have a small cluster (running Longhorn 0.7.0) whose nodes have rather limited disk resources.
The default Longhorn setting for Storage Minimal Available Percentage is 10%.
It seems the default threshold that triggers the DiskPressure condition on a k8s node is 15% free disk space.
Thus, in the default setup, it is likely that Longhorn exhausts a node's disk space (because from Longhorn's point of view there is still more than 10% available) and thereby triggers DiskPressure, because the node now has less than 15% of its disk space available.
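If I understand correctly, that 15% comes from the kubelet's hard-eviction defaults, which could in principle be tuned via the kubelet configuration. A sketch (the percentages shown are, as far as I know, the documented kubelet defaults; imagefs.available defaults to 15%, nodefs.available to 10%):

```yaml
# KubeletConfiguration fragment (kubelet.config.k8s.io/v1beta1) -- sketch only.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  nodefs.available: "10%"
  imagefs.available: "15%"
```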
Shouldn't Longhorn take into account the free-disk-space level that would trigger DiskPressure, to avoid its replicas being evicted from the current node?
In such a scenario the replicated volumes get rescheduled to other nodes, which in turn may trigger DiskPressure there.
How can one avoid such a chain reaction?
With such limited resources I have already had two incidents where multiple volumes on nodes were lost because there was one node with DiskPressure in the cluster.
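For now my workaround idea is to make sure Longhorn's reserve is strictly larger than the kubelet's eviction threshold, so Longhorn stops allocating before eviction kicks in. A trivial sanity check of that invariant (the two values are assumptions for illustration, not defaults):

```shell
# Illustrative check: Longhorn's Storage Minimal Available Percentage
# should exceed the kubelet's DiskPressure eviction threshold.
LONGHORN_MIN_AVAILABLE=25   # assumed value, set via the Longhorn UI settings
KUBELET_EVICTION=15         # assumed kubelet hard-eviction threshold in percent
if [ "$LONGHORN_MIN_AVAILABLE" -gt "$KUBELET_EVICTION" ]; then
  echo "safe: Longhorn stops scheduling before the node hits DiskPressure"
else
  echo "risk: node can hit DiskPressure before Longhorn stops scheduling"
fi
```

Does that reasoning hold, or is there a recommended way to align these two thresholds?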