Disk Pressure breaks cluster

damlub · December 17, 2018, 3:12pm

Hi all!

I am having issues with my Kubernetes cluster, set up with RKE and managed by Rancher (2.1.3), when one node comes under disk pressure.

I am currently using 4 nodes (4 vCPUs, 120 GB disk and 12 GB RAM) for a PoC of a future production workload. The OS is Ubuntu 18.04.

I am hosting the block-storage Persistent Volumes as containers on the same hosts as the nodes, using the OpenEBS storage driver.

Once a node reaches 90% disk usage, it starts evicting all its pods, and they move to the remaining nodes. This leads to the problem that the other nodes now need to host the disk-pressured node’s disk images. Their file systems fill up in turn, they become disk-pressured too, and eventually the whole cluster breaks.

Have you experienced such problems, and how do you recover from them?
Kubernetes should also run some garbage collection on old Docker images, the kubelet file system, etc. How can I check this?
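
As a rough way to check how close a node is to the limit, here is a minimal Python sketch that compares free space against a minimum-free fraction. It assumes a 10% minimum, which matches the 90% usage described above and, as far as I know, the kubelet's default `nodefs.available<10%` hard eviction threshold. The paths `/var/lib/docker` and `/var/lib/kubelet` are assumed default locations and may differ on your hosts. The helper name `has_disk_pressure` is made up for illustration and is not part of any Kubernetes tooling.

```python
import shutil


def has_disk_pressure(total_bytes: int, free_bytes: int,
                      min_free_fraction: float = 0.10) -> bool:
    """Return True if the free fraction is below the assumed threshold.

    min_free_fraction=0.10 is an assumption chosen to mirror the
    "90% used" behaviour described in this thread.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction


if __name__ == "__main__":
    # Assumed default locations; adjust to your Docker/kubelet data dirs.
    for path in ("/", "/var/lib/docker", "/var/lib/kubelet"):
        try:
            usage = shutil.disk_usage(path)  # (total, used, free) in bytes
        except OSError:
            print(f"{path}: not found or not readable, skipping")
            continue
        used_pct = 100.0 * usage.used / usage.total
        flag = has_disk_pressure(usage.total, usage.free)
        print(f"{path}: {used_pct:.1f}% used, below free-space threshold: {flag}")
```

This only approximates what the kubelet measures. The authoritative view is the `DiskPressure` condition in the output of `kubectl describe node <node-name>`.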

Also, in the case of OpenEBS (it’s based on Longhorn), the (test) disk images are size-limited to, e.g., 10 GB, but in some cases they use 18 GB on disk due to snapshots.

I would appreciate any help.

Merry XMas
Damian

vincent · December 17, 2018, 4:32pm

You can adjust the threshold if you really want, but the cascading failure is ultimately because you have a workload that requires “n” nodes and fewer than “n+1” of them in the cluster.

If one node dies for any reason, you’re going to end up not being able to service all of the workload the cluster is supposed to be running. The disk pressure threshold is just causing it at 90%, when things can be shut down gracefully, instead of at 100% and crashing. Or when a Coke gets spilled into one node.
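
To make the "n+1" argument concrete, here is a small Python sketch of the arithmetic. It is a deliberate simplification: it assumes every node has the same disk capacity, that a failed node's data is spread evenly over the survivors, and that pressure starts at 90% usage. The real scheduler and storage replicas do not behave this neatly, and the function name `survives_node_loss` is hypothetical.

```python
from typing import Sequence


def survives_node_loss(used_gb: Sequence[float], capacity_gb: float,
                       threshold: float = 0.90) -> bool:
    """Check whether any single node can fail without pushing another
    node over the disk-pressure threshold.

    Assumptions (not from Kubernetes itself): equal capacity per node,
    and the failed node's data is split evenly across the survivors.
    """
    n = len(used_gb)
    if n < 2:
        return False  # nowhere to move the workload
    limit = threshold * capacity_gb
    for failed in range(n):
        extra = used_gb[failed] / (n - 1)  # even share per surviving node
        for i in range(n):
            if i != failed and used_gb[i] + extra > limit:
                return False
    return True


if __name__ == "__main__":
    # Four 120 GB nodes, as in the original post; usage figures are made up.
    print(survives_node_loss([80, 80, 80, 80], capacity_gb=120))
    print(survives_node_loss([90, 90, 90, 108], capacity_gb=120))
```

With four 120 GB nodes at 80 GB each, losing one adds about 27 GB to each survivor, which stays under the 108 GB limit. With three nodes at 90 GB and one at 108 GB, losing the fullest node pushes the others to 126 GB, and the cascade described above begins.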
