We have lots of haproxy containers running in production, and on 3 separate hosts we have a bunch of zombie processes, all of them from logrotate. It took a little digging, but I found that the haproxy container was producing them, all with monit as the parent process.
Rancher 0.56.1 here.
root@a37178ad75ea:/# ps ax | grep -w Zs
10401 ? Zs 0:00 [logrotate] <defunct>
On the Docker host:
ubuntu@rancher51:~$ ps ax | grep -w Zs | grep -v grep
21972 ? Zs 0:00 [logrotate] <defunct>
21980 ? Zs 0:00 [logrotate] <defunct>
21988 ? Zs 0:00 [logrotate] <defunct>
22024 ? Zs 0:00 [logrotate] <defunct>
22032 ? Zs 0:00 [logrotate] <defunct>
22045 ? Zs 0:00 [logrotate] <defunct>
22053 ? Zs 0:00 [logrotate] <defunct>
22070 ? Zs 0:00 [logrotate] <defunct>
22078 ? Zs 0:00 [logrotate] <defunct>
22086 ? Zs 0:00 [logrotate] <defunct>
22094 ? Zs 0:00 [logrotate] <defunct>
22102 ? Zs 0:00 [logrotate] <defunct>
22110 ? Zs 0:00 [logrotate] <defunct>
22118 ? Zs 0:00 [logrotate] <defunct>
22131 ? Zs 0:00 [logrotate] <defunct>
22145 ? Zs 0:00 [logrotate] <defunct>
22163 ? Zs 0:00 [logrotate] <defunct>
22175 ? Zs 0:00 [logrotate] <defunct>
And that’s just one of the hosts; they are all like this.
Also, logging has stopped for both the haproxy and rancher-agent containers. I can see rotated logs such as rancher-dns.log.1.gz, but the “live” log is gone, and /proc/<pid>/fd shows “(deleted)” on the open files that the process is still trying to write to.
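For anyone who wants to confirm which parent is failing to reap these, here is a rough sketch that lists each zombie with its parent PID and the parent's command name. It assumes a procps-style `ps` that accepts `-eo` with header-suppressing `=` fields; the function names `filter_zombies` and `report_zombies` are just made up for this example.

```shell
#!/bin/sh
# Print "pid ppid name" for every defunct process found in ps output on stdin.
# Expects the columns: pid ppid stat comm (stat starts with Z for zombies).
filter_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# Report each zombie together with the command name of its parent process.
# Assumes a procps-style ps; other ps implementations may reject these options.
report_zombies() {
    ps -eo pid=,ppid=,stat=,comm= | filter_zombies |
    while read -r pid ppid name; do
        parent=$(ps -o comm= -p "$ppid" 2>/dev/null)
        echo "zombie pid=$pid name=$name ppid=$ppid parent=${parent:-unknown}"
    done
}

report_zombies
```

Run inside the haproxy container, this should print one line per defunct logrotate; if the parent column shows monit every time, that matches what I found by hand.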
root@7a96e4f3c2a1:/# ps ax|grep dns
864 ? Sl 71:26 /var/lib/cattle/bin/rancher-dns -log /var/log/rancher...
27287 pts/2 S+ 0:00 grep dns
root@7a96e4f3c2a1:/# ls -l /proc/864/fd
total 0
lrwx------ 1 root root 64 Mar 14 22:04 0 -> /dev/null
lrwx------ 1 root root 64 Mar 14 22:04 1 -> /dev/null
lrwx------ 1 root root 64 Mar 14 22:04 2 -> /dev/null
lrwx------ 1 root root 64 Mar 14 22:04 3 -> /var/log/rancher-dns.log.1 (deleted)
lrwx------ 1 root root 64 Mar 14 22:04 4 -> socket:[22766]
lrwx------ 1 root root 64 Mar 14 22:04 5 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Mar 14 22:04 6 -> socket:[20462]
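Notice the “(deleted)” on fd 3 above: rancher-dns still holds the old log file open, even though it has been rotated and removed, so whatever it writes never reaches a visible log.
Basically, logrotate is broken in multiple Rancher containers (haproxy, rancher-agent), and we’ve made no changes to them. They are running stock versions.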
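The same check can be scripted instead of eyeballing `ls -l`. This is a minimal sketch, assuming a Linux /proc and a `readlink` binary in the container; `deleted_fds` is a hypothetical helper name, not something shipped with Rancher.

```shell
#!/bin/sh
# Print every open file descriptor of the given PID whose target has been
# deleted. On Linux, readlink on /proc/<pid>/fd/<n> appends " (deleted)"
# to the path once the file has been unlinked.
deleted_fds() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        # Skip entries that vanish or cannot be read (or an unmatched glob).
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)') echo "${fd##*/} -> $target" ;;
        esac
    done
}

# Check the PID given as the first argument, or this shell if none is given.
deleted_fds "${1:-$$}"
```

Called with the rancher-dns PID from the listing above (864), it should report fd 3 and nothing else.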