Haproxy containers producing zombies + logging stops

We run a lot of haproxy containers in production, and on 3 separate hosts we have a bunch of zombie processes, all of them from logrotate. It took a little digging, but I found that the haproxy containers were producing these, all with monit as the parent PID.

— Rancher 0.56.1 here

root@a37178ad75ea:/# ps ax | grep -w Zs                                         
10401 ?        Zs     0:00 [logrotate] <defunct> 

On the docker host…

ubuntu@rancher51:~$ ps ax | grep -w Zs | grep -v grep
21972 ?        Zs     0:00 [logrotate] <defunct>    
21980 ?        Zs     0:00 [logrotate] <defunct>
21988 ?        Zs     0:00 [logrotate] <defunct>
22024 ?        Zs     0:00 [logrotate] <defunct>
22032 ?        Zs     0:00 [logrotate] <defunct>
22045 ?        Zs     0:00 [logrotate] <defunct>
22053 ?        Zs     0:00 [logrotate] <defunct>
22070 ?        Zs     0:00 [logrotate] <defunct>
22078 ?        Zs     0:00 [logrotate] <defunct>
22086 ?        Zs     0:00 [logrotate] <defunct>
22094 ?        Zs     0:00 [logrotate] <defunct>
22102 ?        Zs     0:00 [logrotate] <defunct>
22110 ?        Zs     0:00 [logrotate] <defunct>
22118 ?        Zs     0:00 [logrotate] <defunct>
22131 ?        Zs     0:00 [logrotate] <defunct>
22145 ?        Zs     0:00 [logrotate] <defunct>
22163 ?        Zs     0:00 [logrotate] <defunct>
22175 ?        Zs     0:00 [logrotate] <defunct>

And that’s just one of the hosts… they’re all like this.
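For anyone else chasing this down, a quick way to tie the zombies back to whatever parent isn't reaping them (monit, in our case) is to print the parent PID alongside each Z-state process. A sketch using standard procps-style ps flags:

```shell
# List zombie processes together with their PID, parent PID, and command,
# so the non-reaping parent is easy to spot
ps -eo stat,pid,ppid,comm | awk '$1 ~ /^Z/'
```

You can then feed the PPID column back into `ps -p <ppid>` to see what the parent actually is.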

Also, logging has stopped for both the haproxy and rancher-agent containers. I can see rotated logs such as rancher-dns.log.1.gz, but the “live” log is gone, and /proc/<pid>/fd shows “(deleted)” on the open log files the process is still trying to write to.

root@7a96e4f3c2a1:/# ps ax|grep dns                                             
864 ?        Sl    71:26 /var/lib/cattle/bin/rancher-dns -log /var/log/rancher...
27287 pts/2    S+     0:00 grep dns
                                         
root@7a96e4f3c2a1:/# ls -l /proc/864/fd                                         
total 0                                                                         
lrwx------ 1 root root 64 Mar 14 22:04 0 -> /dev/null                           
lrwx------ 1 root root 64 Mar 14 22:04 1 -> /dev/null                           
lrwx------ 1 root root 64 Mar 14 22:04 2 -> /dev/null                           
lrwx------ 1 root root 64 Mar 14 22:04 3 -> /var/log/rancher-dns.log.1 (deleted)
lrwx------ 1 root root 64 Mar 14 22:04 4 -> socket:[22766]                      
lrwx------ 1 root root 64 Mar 14 22:04 5 -> anon_inode:[eventpoll]              
lrwx------ 1 root root 64 Mar 14 22:04 6 -> socket:[20462]         

Notice the “(deleted)” above.
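This is the classic symptom of rename-style rotation when the writer is never told to reopen its log: the process keeps writing to the old inode, which no longer has a name on disk. A minimal reproduction (the path is arbitrary, Linux-only since it relies on /proc):

```shell
# Reproduce the "(deleted)" symptom: open a log, "rotate" it away,
# and the writer's fd now points at an unlinked inode
exec 3>>/tmp/demo.log     # writer opens its log on fd 3
rm /tmp/demo.log          # rotation removes the name, not the inode
readlink /proc/$$/fd/3    # -> /tmp/demo.log (deleted)
echo "lost line" >&3      # writes land in the unlinked inode, invisible on disk
exec 3>&-                 # closing the fd finally frees the inode
```

Until the writer reopens (or is restarted), everything it logs goes into that deleted file, which also quietly holds its disk space.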

Basically, logrotate is messed up in multiple rancher containers (haproxy, rancher-agent) and we’ve made no changes to these containers; they’re running stock versions.
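Until a fix lands, the usual logrotate-side workaround for writers that never reopen their log is `copytruncate`, which copies the live file and then truncates it in place so the writer’s open fd stays valid. A sketch of what such a stanza might look like (the path and schedule here are assumptions, not the stock Rancher config):

```
/var/log/rancher-*.log {
    daily
    rotate 7
    compress
    # copy the live log, then truncate it in place; the writer's open
    # file descriptor keeps pointing at the same (now empty) file
    copytruncate
}
```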

Oops… I meant to post this in the beta forum, but here it is anyway. ¯\_(ツ)_/¯

This seems like it’s related to

This was fixed in v0.59.0+

Awesome! We’re a couple of versions behind that, and we were putting off upgrading until your GA release comes out, but maybe we should go ahead and do it anyway. Especially if it includes that fix for the load balancer 503’ing when all the services are actually up!