Hi all!
I am having issues with my Kubernetes cluster, set up with RKE and managed by Rancher (2.1.3), when one node comes under disk pressure.
I am currently using 4 nodes (4 vCPUs, 120 GB disk, and 12 GB RAM each) for a PoC of a future production workload. The OS is Ubuntu 18.04.
I am hosting the block-storage Persistent Volumes as containers on the same hosts as the nodes, using the OpenEBS storage driver.
Once a node reaches 90% disk usage, it starts evicting all its pods, which get rescheduled onto the remaining nodes. The problem is that those nodes then have to host the disk-pressured node's disk images on top of their own, so they run their file systems full in turn, come under disk pressure as well, and eventually the whole cluster breaks in a cascade.
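For reference, this is how I've been watching the cascade happen (the node name is a placeholder):

```sh
# Show the node's conditions (DiskPressure flips to True at the threshold)
kubectl describe node <node-name>

# List recent pod eviction events across all namespaces
kubectl get events --all-namespaces --field-selector reason=Evicted
```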
Have you experienced such problems, and how did you recover from them?
Kubernetes should also run some garbage collection on old Docker images, the kubelet filesystem, etc. How can I check whether this is happening and how it is configured?
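In case it helps anyone answer: as far as I understand, image GC and eviction thresholds are plain kubelet flags, and with RKE they can be passed via services.kubelet.extra_args in cluster.yml. Something like this (the threshold values below are only examples, not what I'm actually running):

```yaml
# cluster.yml (RKE) – illustrative kubelet settings
services:
  kubelet:
    extra_args:
      image-gc-high-threshold: "80"   # start image GC once disk usage exceeds 80%
      image-gc-low-threshold: "70"    # GC frees images until usage is back under 70%
      eviction-hard: "nodefs.available<10%,imagefs.available<10%"
```

I would then presumably verify with `docker system df` on the node and by grepping the kubelet logs for image GC messages, but I'm not sure that's the right approach.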
Also, in the case of OpenEBS (it's based on Longhorn), the (test) disk images are size-limited to, e.g., 10 GB, but in some cases they occupy 18 GB on disk due to snapshots.
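This is how I measured the 18 GB, assuming the default OpenEBS Jiva data directory /var/openebs (the actual path depends on the configured storage pool):

```sh
# Actual on-disk usage per volume replica, snapshots included
sudo du -sh /var/openebs/*

# Inside one replica directory: apparent vs. allocated size of each snapshot file
sudo ls -lsh /var/openebs/<replica-dir>/
```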
I would appreciate any help.
Merry XMas
Damian