I’m running rancher/server on RancherOS v0.8.0 (I also tried v0.8.0-rc11). The server is started with the following command:
sudo docker run -d --restart=unless-stopped -p 8080:8080 rancher/server:stable
I noticed that the web UI was crashing quite a lot. Investigation led me to very high CPU and memory usage by the Java process: consistently ~40% RAM and ~15% CPU.
Eventually it eats all the memory, which leads to nonsense like load average: 18.76, 10.55, 4.73 and also the following, even though I have only 3 workers registered to it:
What are the minimum requirements for rancher/server? The current resource usage seems disproportionate.
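For what it’s worth, one way to keep the server from taking the whole box while this gets sorted out is to cap the container; the values below are only illustrative, and the rancher/server image passing JAVA_OPTS through to the Cattle JVM is an assumption on my part:

# Illustrative only: cap the container at 1 GiB and, assuming the image
# honors JAVA_OPTS, keep the Cattle JVM heap below that cap.
sudo docker run -d --restart=unless-stopped -p 8080:8080 \
  --memory=1g --memory-swap=1g \
  -e JAVA_OPTS="-Xms256m -Xmx768m" \
  rancher/server:stable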
My current setup:
rancher@ros-m01:~$ free -m
                    total       used       free     shared    buffers     cached
Mem:                  993        899         94          0         13        191
-/+ buffers/cache:               694        298
Swap:                   0          0          0
rancher@ros-m01:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 61
Model name: Intel Core Processor (Broadwell)
Stepping: 2
CPU MHz: 3399.836
BogoMIPS: 6799.67
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
L3 cache: 16384K
NUMA node0 CPU(s): 0
Please advise.
EDIT:
It seems that this behavior is mostly observed during startup; once all/most services are initialized it tends to stabilize, though it still uses a vast amount of RAM.
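To check whether it really is just a startup spike, something like the loop below samples the container every minute (rancher-server is a placeholder for whatever name or ID `docker ps` shows):

# Sample CPU/memory of the server container once a minute.
while true; do
  date >> rancher-stats.log
  docker stats --no-stream rancher-server >> rancher-stats.log
  sleep 60
done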
I’m having the same issue. I finally got Rancher Server going on my DO droplet, but I wanted to see if I could run my server and agents on the same box. Unfortunately, after 30+ minutes not all of the services have passed health checks, the dashboard UI is really slow, and the server load is similar to what you report. This doesn’t seem like a workable solution for small/hobby sites like mine. It would be nice if the docs were updated to say as much.
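One stopgap that may help on droplets this small, without fixing the underlying memory pressure, is adding a swap file so the Java process isn’t killed outright; the size below is just an example:

# Create and enable a 2 GiB swap file (size is illustrative).
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Optionally make it persistent across reboots.
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab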
I have a similar problem running Rancher Server on a 2 GB memory droplet. Normally the load average is hardly over 1; now it averages around 5, and the Rancher Java process is often using more than 100% CPU.
Running Rancher v.1.4.1
Processes: 0 running and none delayed. Not a single issue in the web UI.
/var/lib/cattle/logs/cattle-error.log is below, with only a few issues; the most recent errors are from 3 hours ago.
The same applies to the Docker log: last error 3 hours ago, see below.
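For reference, excerpts like these can be pulled straight from the server container; the container name is a placeholder:

# Container name/ID will differ; `docker ps` shows it.
docker exec rancher-server tail -n 100 /var/lib/cattle/logs/cattle-error.log
docker logs --tail 100 rancher-server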
cattle-error.log
2017-03-10 08:00:08,348 ERROR [4c26fb67-bfb2-4218-9ef8-4f13e422613f:133724] [instance:1685->instanceHostMap:1553] [instance.start->(InstanceStart)->instancehostmap.activate] [] [utorService-751] [c.p.e.p.i.DefaultProcessInstanceImpl] Agent error for [compute.instance.activate.reply;agent=20]: no such file or directory
2017-03-10 08:00:08,348 ERROR [4c26fb67-bfb2-4218-9ef8-4f13e422613f:133724] [instance:1685] [instance.start->(InstanceStart)] [] [utorService-751] [i.c.p.process.instance.InstanceStart] Failed [1/3] to Starting for instance [1685]
2017-03-10 09:11:45,862 ERROR [:] [] [] [] [TaskScheduler-1] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [129] count [3]
2017-03-10 09:11:45,862 ERROR [:] [] [] [] [TaskScheduler-1] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [138] count [3]
2017-03-10 09:11:45,863 ERROR [:] [] [] [] [TaskScheduler-1] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [164] count [3]
2017-03-10 09:11:45,863 ERROR [:] [] [] [] [TaskScheduler-1] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [185] count [3]
docker logs (rancher/server container)
2017-03-10 09:11:45,862 ERROR [:] [] [] [] [TaskScheduler-1] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [129] count [3]
2017-03-10 09:11:45,862 ERROR [:] [] [] [] [TaskScheduler-1] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [138] count [3]
2017-03-10 09:11:45,863 ERROR [:] [] [] [] [TaskScheduler-1] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [164] count [3]
2017-03-10 09:11:45,863 ERROR [:] [] [] [] [TaskScheduler-1] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [185] count [3]
time="2017-03-10T09:11:47Z" level=info msg="Installing builtin drivers"
time="2017-03-10T09:11:49Z" level=info msg="Downloading all drivers"
time="2017-03-10T09:11:51Z" level=info msg="Copying /var/lib/cattle/machine-drivers/1f7058341420e2f525168052818c3f819ff78e9ca5f57d5a650a049bcd5945e9-docker-machine-driver-packet => /usr/local/bin/docker-machine-driver-packet"
time="2017-03-10T09:11:52Z" level=info msg="Done downloading all drivers"
2017/03/10 09:33:35 http: TLS handshake error from XX.XXX.XXX.XXX.XXX:62503: tls: oversized record received with length 20624
2017/03/10 09:33:35 http: TLS handshake error from XX.XXX.XXX.XXX.XXX:31399: tls: oversized record received with length 20624
2017/03/10 09:33:35 http: TLS handshake error from XX.XXX.XXX.XXX.XXX:61967: tls: oversized record received with length 20624
2017/03/10 09:33:46 http: TLS handshake error from XX.XXX.XXX.XXX.XXX:45813: EOF
2017/03/10 09:33:57 http: TLS handshake error from XX.XXX.XXX.XXX.XXX:33611: EOF
2017/03/10 09:34:08 http: TLS handshake error from XX.XXX.XXX.XXX.XXX:38837: EOF
2017/03/10 09:34:08 http: TLS handshake error from XX.XXX.XXX.XXX.XXX:47253: tls: oversized record received with length 20480
2017/03/10 09:34:08 http: TLS handshake error from XX.XXX.XXX.XXX.XXX:46653: tls: oversized record received with length 20480
2017/03/10 09:34:08 http: TLS handshake error from XX.XXX.XXX.XXX.XXX:60657: tls: oversized record received with length 20480
2017/03/10 09:34:19 http: TLS handshake error from XX.XXX.XXX.XXX.XXX:46815: EOF
2017/03/10 09:34:30 http: TLS handshake error from XX.XXX.XXX.XXX.XXX:52511: EOF
2017/03/10 09:34:41 http: TLS handshake error from XX.XXX.XXX.XXX.XXX:27281: EOF
Hi, we’ve got the same problem: when Rancher starts on a 2 GB RAM droplet everything seems to be okay, but after several days of use the Java process’s memory/CPU usage almost freezes the VM.
It looks like some kind of memory leak.
Any ideas on how to get to the root cause, so we can help you guys fix it?
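If it helps with root-causing, the Cattle JVM can be inspected from inside the container. I’m assuming the JDK tools (jmap/jstack) and pidof are available in the rancher/server image, which I haven’t verified, and the container name is a placeholder:

# Class histogram (top allocators) and a thread dump from the Cattle JVM.
docker exec rancher-server sh -c 'jmap -histo $(pidof java) | head -n 30'
docker exec rancher-server sh -c 'jstack $(pidof java)' > cattle-threads.txt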
rancher v1.4.1
cattle v0.176.9
rancher compose v0.12.2
Update: not sure whether this htop output, filtered by java, is of much use:
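Since htop lists every JVM thread as its own row (all showing the same RSS), a per-process summary from ps may be easier to read; this is just a suggestion:

# One row per process: resident/virtual memory (KB), %MEM, %CPU, uptime, command.
ps -C java -o pid,rss,vsz,%mem,%cpu,etime,args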