Rancher server memory, crash after a day or two

ybizeul · October 15, 2017, 5:34pm

The Rancher server is unusable: it consistently crashes after a few days.

It could be the kernel randomly killing processes to free memory, or Java killing itself inside the Rancher server itself.
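One way to tell the two hypotheses apart is to look for the kernel's OOM-killer messages on the host and for Java's own out-of-memory errors in the server log. The following is a minimal sketch, not something taken from this setup: the function names `find_oom_kills` and `find_java_oom` are hypothetical, and the kernel patterns assume the usual message wording, which can vary between kernel versions.

```shell
#!/bin/sh
# Sketch: separate kernel OOM kills from Java-side out-of-memory errors.
# Both functions read log text on stdin and print only the matching lines.

# Kernel side: typical OOM-killer wording (assumed; may differ per kernel version).
find_oom_kills() {
    grep -i -E 'out of memory|oom-killer|killed process'
}

# Java side: the JVM reports heap exhaustion as java.lang.OutOfMemoryError.
find_java_oom() {
    grep -F 'java.lang.OutOfMemoryError'
}

# Typical use on the host (may need root; <container> is a placeholder):
#   dmesg | find_oom_kills
#   docker logs <container> 2>&1 | find_java_oom
```

If the first command shows the Java process being killed, the kernel is the one freeing memory; if only the second one matches, the JVM ran out of heap on its own.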

I can’t even stop the container:

rancher@container-01:~$ docker stop 10a91b7e2104
Error response from daemon: Cannot stop container 10a91b7e2104: Cannot kill container 10a91b7e21047913dbdafa6145a07c5f9def021f33afd56b1b2f5298be015405: rpc error: code = 2 desc = containerd: process not found for container

Posted in the Rancher forums, but it might be a RancherOS issue.

Rancher server 1.6.10
RancherOS 1.1.0
docker-17.03.2-ce

Rancher server trace:
https://pastebin.com/nTSpt5dS

ybizeul · October 15, 2017, 7:53pm

Investigating a ballooning issue, similar to this:

Will update accordingly

flantel · October 16, 2017, 1:54pm

Hi

You don’t say how much RAM the server instance has. I used to get this too when running without sufficient RAM. Currently I use a t2.medium (4GB RAM) for my rancher-server and it is solid.

ybizeul · October 16, 2017, 4:53pm

The server has 32G. With that said, it hosts many other VMs and is actually under resource constraints (it’s a home lab environment), so that might be the explanation right there.

ybizeul · October 17, 2017, 5:32am

No problems since I disabled the ballooning driver in the ros VMware tools. Time for a lab upgrade, I guess!

With that said, it’s interesting how conditions that look unrelated at first sight turn out to be the cause of the problem. In my naive mind, Java and ballooning would just play along like good kids, but they don’t. My lab does overallocate RAM to some extent, but that’s often the case with virtualization, and ballooning, though indicative of memory pressure, should not cause that kind of issue.

I wonder if the official VMware tools would have the same effect.

Reference: https://support.azul.com/hc/en-us/articles/115001559526-VMware-Balloon-Driver

Quick steps:

Find the module name: /sbin/lsmod | grep balloon
Remove the module: sudo modprobe -r vmw_balloon
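The two steps above can be combined into a small check, sketched here under assumptions: the helper name `balloon_loaded` is hypothetical, and it reads `/proc/modules` (the same list `lsmod` formats) rather than calling `lsmod` directly. It only reports whether `vmw_balloon` is loaded; the removal itself stays a separate, explicit command.

```shell
#!/bin/sh
# Sketch: report whether the vmw_balloon module appears in a module list.
# The module name is the first field on each line of /proc/modules.
balloon_loaded() {
    # $1: path to a module list; defaults to /proc/modules
    awk '$1 == "vmw_balloon" { found = 1 } END { exit !found }' "${1:-/proc/modules}"
}

# Typical use on the host:
#   if balloon_loaded; then sudo modprobe -r vmw_balloon; fi
```

Note that `modprobe -r` only unloads the module from the running kernel, so the driver will likely come back after a reboot unless it is also kept from loading at startup.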
