du is causing RancherOS to hang

JD_Trout · December 13, 2016, 5:28am

I am not sure whether Rancher or RancherOS is the culprit, but the load average on my RancherOS hosts spikes when there is no external reason for it. I was able to do some basic investigating (busybox does not provide many tools), and top showed a handful of du -s /var/lib/docker/overlay/xxxxx processes with a STAT of D, which means uninterruptible sleep (usually IO). This is usually a bad sign. I killed the du processes, the load average dropped below one, and the host started to act as it should. What is triggering du against the container volumes, and more importantly, why is it hanging?
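
For context on why a few stuck du processes can drive the load up: on Linux, tasks in uninterruptible sleep (state D) are counted in the load average, and du -s does its work by stat-ing every entry under the directory it is given. The sketch below is a simplified Python approximation of that walk, assuming Linux st_blocks semantics (512-byte units). It is not busybox's implementation, and the function name disk_usage is hypothetical. It also counts the stat calls, which shows how much metadata IO each du run generates against an overlay directory with many files.

```python
import os
import stat


def disk_usage(path):
    """Approximate `du -s PATH` (hypothetical helper, not busybox's code).

    Walks every entry under PATH without following symlinks, sums the
    allocated bytes, and counts hard-linked files only once.
    Returns (total_bytes, stat_calls) so the metadata IO is visible.
    """
    seen = set()      # (st_dev, st_ino) pairs already counted
    total = 0
    stat_calls = 0
    stack = [path]
    while stack:
        current = stack.pop()
        st = os.lstat(current)  # one metadata lookup per entry
        stat_calls += 1
        key = (st.st_dev, st.st_ino)
        if key not in seen:
            seen.add(key)
            # st_blocks is in 512-byte units on Linux
            total += st.st_blocks * 512
        # lstat does not follow symlinks, so S_ISDIR is false for a link
        if stat.S_ISDIR(st.st_mode):
            with os.scandir(current) as entries:
                stack.extend(entry.path for entry in entries)
    return total, stat_calls
```

Every file and directory costs at least one lstat, so on a host with many containers each du run issues thousands of metadata lookups, and slow storage makes each one block in state D.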

As you can imagine, having this issue in a production environment is quite troubling, since hosts can become unresponsive for no apparent reason.

The RancherOS version is 0.7.1 and the Rancher version is 1.1.0.

Thanks for any insight and guidance.

vincent · December 13, 2016, 5:46am

cAdvisor provides the data for the graphs you see in the UI in Rancher versions before 1.2. It runs even when you’re not looking at the graphs, and it forks a du process every refresh interval.

In 1.2+ we use docker’s built-in stats, gather them only while you’re looking, and it doesn’t call du.

JD_Trout · December 13, 2016, 6:20am

Thanks @vincent! Sounds like this problem is solved in 1.2, then? Out of curiosity, any thoughts on what could be causing du to hang?

vincent · December 13, 2016, 6:34am

Probably some external event makes one du run take slightly longer than the refresh interval (5 sec) to complete, and then they start stacking up.
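
The stacking effect can be illustrated with a toy model. Assume (this is an assumption about the scheduling, not a description of cAdvisor's actual code) that a new du is launched every 5 seconds whether or not the previous one has finished, and count how many are running at once. The function name peak_concurrency is hypothetical.

```python
def peak_concurrency(durations, interval=5.0):
    """Job i starts at i * interval and runs for durations[i] seconds.

    Returns the largest number of jobs running at the same moment.
    """
    events = []
    for i, duration in enumerate(durations):
        start = i * interval
        events.append((start, 1))              # job starts
        events.append((start + duration, -1))  # job ends
    # At equal timestamps, -1 sorts before +1, so a job that ends exactly
    # when the next one starts is not counted as overlapping.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak
```

With runs shorter than the interval, peak_concurrency([1.0] * 10) stays at 1. With 12-second runs it settles at 3 overlapping processes, and if runs never finish (float("inf")), every tick adds another one, which matches a host whose load keeps climbing.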

JD_Trout · December 13, 2016, 6:19pm

The issue is that it never finishes and craters the host. I don’t suppose there is a way to configure cAdvisor not to run? System stability is more critical than the performance graphs, and I have other tools providing some of that information. I guess another option would be to write a script that looks for processes in uninterruptible sleep and kills them?
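
A watchdog along those lines could read /proc/<pid>/stat, where the third field is the process state. The sketch below is a minimal Python version; the names parse_stat, find_stuck, and kill_stuck are hypothetical, and it defaults to a dry run. Two caveats: a task that is genuinely blocked in uninterruptible sleep does not act on signals, SIGKILL included, until the blocking IO returns, so a kill may only take effect later; and killing du treats the symptom rather than whatever is making the IO stall. If Python is not available on the host, the same /proc parsing is simple enough to port to a shell script.

```python
import os
import signal


def parse_stat(line):
    """Return (pid, comm, state) from a /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the *last* ')' rather than on whitespace.
    """
    pid_part, _, rest = line.partition(" (")
    comm, _, after = rest.rpartition(")")
    state = after.split()[0]
    return int(pid_part), comm, state


def find_stuck(proc_root="/proc", name="du"):
    """Return sorted PIDs whose command is `name` and whose state is D."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # process exited while we were scanning
        if comm == name and state == "D":
            stuck.append(pid)
    return sorted(stuck)


def kill_stuck(pids, dry_run=True):
    """Send SIGKILL to each PID, or only report it when dry_run is set."""
    for pid in pids:
        if dry_run:
            print(f"would kill {pid}")
            continue
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # already gone
```

Run as root, kill_stuck(find_stuck()) only prints the matching PIDs. Passing dry_run=False actually sends the signals, and a sampling check (for example, only killing PIDs that are still in state D on a second pass) would reduce the chance of killing a du that was merely waiting on a slow disk for a moment.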

Related topics

Topic	Category	Replies	Views	Activity
User Docker Hangs on Boot	RancherOS	0	952	April 20, 2021
Auto restart sometimes not working	Rancher 1.x	3	2512	March 25, 2019
Rancher stops responding and restarts regularly	Rancher	13	7133	December 10, 2020
Why was cAdvisor replaced with Docker stats?	Rancher 1.x	3	1517	February 28, 2017
Stats for host/containers always show as "Connecting..."	Rancher 1.x	7	2863	April 27, 2018