Dealing with DiskPressure

manuel-koch · February 24, 2020, 3:18pm

We have a small cluster with some nodes (running Longhorn 0.7.0) that have rather little disk space.
The default longhorn setting for Storage Minimal Available Percentage is 10%.
It seems the default trigger for a k8s node DiskPressure signal is 15%.

Thus, in the default setup it is likely that Longhorn keeps consuming a node's disk space (because more than 10% of it is still available) and triggers DiskPressure, because the node now has less than 15% of its disk space available.
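To make the gap concrete, here is a minimal sketch of the arithmetic. It assumes the two thresholds quoted above (10% for Longhorn, 15% for the DiskPressure signal); the function name and parameters are hypothetical and do not correspond to any Longhorn or kubelet API.

```python
def free_space_state(total_bytes, free_bytes,
                     longhorn_min_pct=10.0, eviction_pct=15.0):
    """Classify a disk by its free-space percentage.

    Assumed thresholds (taken from this thread, not read from a cluster):
      longhorn_min_pct: Storage Minimal Available Percentage
      eviction_pct:     free-space level below which DiskPressure is signalled
    """
    free_pct = 100.0 * free_bytes / total_bytes
    # Longhorn still considers the disk usable above its own minimum.
    longhorn_schedulable = free_pct > longhorn_min_pct
    # The node reports DiskPressure below the eviction threshold.
    disk_pressure = free_pct < eviction_pct
    return free_pct, longhorn_schedulable, disk_pressure


GIB = 1024 ** 3
for free_gib in (20, 12, 8):
    pct, schedulable, pressure = free_space_state(100 * GIB, free_gib * GIB)
    print(f"{pct:.0f}% free: schedulable={schedulable}, DiskPressure={pressure}")
```

Any disk between the two thresholds (12% free in the example) is still accepted by Longhorn while the node is already under DiskPressure, which is exactly the problematic window.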

Shouldn’t Longhorn also take into account the free-space level that would trigger DiskPressure, to avoid Longhorn getting evicted from the node?
In such a scenario the replicated volumes get rescheduled to other nodes, which in turn may trigger DiskPressure there.

How can one avoid such a chain reaction?

With such limited resources I have already had two incidents where multiple volumes on nodes were lost because one node in the cluster had DiskPressure.

yasker · February 24, 2020, 5:19pm

Yeah, currently we don’t take the DiskPressure threshold into consideration when looking at the available space (though we do take the available space into consideration when scheduling a replica). We should do that. Can you file a GitHub issue for it?

One thing that might help in the current release is to reduce the OverProvisioningPercentage to e.g. 100% or even less, to stop Longhorn from overprovisioning the volumes.

You can also add dedicated disks to Longhorn so that it does not compete with system and Kubernetes processes.

manuel-koch · February 25, 2020, 8:23am

I have already reduced OverProvisioningPercentage to 125% but soon afterwards the same DiskPressure issue happened again.
My current setup is StorageMinimalAvailablePercentage = 20% and OverProvisioningPercentage = 100%.
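As a rough illustration of how these two settings interact, the following sketch models the scheduling check as I understand it from the Longhorn documentation. It is a simplified model under that assumption, not Longhorn's actual code, and all function and parameter names are hypothetical.

```python
def can_schedule_replica(size, storage_maximum, storage_available,
                         storage_reserved, storage_scheduled,
                         min_available_pct=20, over_provisioning_pct=100):
    """Simplified, assumed model of the replica scheduling check.

    All sizes use the same unit (GiB in the example below).
    """
    # Rule 1: refuse disks whose actual free space is at or below the minimum.
    if storage_available <= storage_maximum * min_available_pct / 100:
        return False
    # Rule 2: the nominal sizes of all scheduled replicas must fit into the
    # (possibly over-provisioned) schedulable capacity.
    capacity = (storage_maximum - storage_reserved) * over_provisioning_pct / 100
    return storage_scheduled + size <= capacity


# Example disk: 100 GiB total, 50 GiB free, 10 GiB reserved, 80 GiB scheduled.
disk = dict(storage_maximum=100, storage_available=50,
            storage_reserved=10, storage_scheduled=80)
print(can_schedule_replica(20, **disk, over_provisioning_pct=125))
print(can_schedule_replica(20, **disk, over_provisioning_pct=100))
```

In this model, rule 2 only counts nominal volume sizes, so a lower OverProvisioningPercentage limits how much space replicas may eventually claim, while rule 1 is the only check against actual free space. Neither rule looks at the DiskPressure threshold of the node.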

I will try to attach an additional disk for longhorn to some of my nodes to improve stability.

I created ticket https://github.com/longhorn/longhorn/issues/1068

manuel-koch · February 25, 2020, 10:36am

I tried your suggestion of adding disks to the node(s) and stumbled over an issue (or maybe I just failed to set it up properly), see https://github.com/longhorn/longhorn/issues/1069.
