Rancher creating countless Java processes

I’m running Rancher v1.1.2 on my server (Ubuntu 14.04 LTS), and I’m noticing countless java processes showing up, using an ever-increasing amount of memory. I asked a friend who also uses Rancher in a similar setup whether he had experienced this, but he said he doesn’t see it on his machine. As of this writing, there are 136 instances of java -Xms128m -Xmx2g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/cattle/logs -Dlogback.bootstrap.level=WARN -Xmx4096m -cp /usr/share/cattle/be21b2bf0c1a2d74b75c887ce9982c6e:/usr/share/cattle/be21b2bf0c1a2d74b75c887ce9982c6e/etc/cattle io.cattle.platform.launcher.Main using 6882M of virtual memory.
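One thing that may be worth ruling out (this is an assumption, not something confirmed anywhere in this thread): some process viewers, htop for example, list every thread of a JVM as its own row by default, so many identical java lines could be threads of a single process rather than separate processes. Below is a minimal Python sketch for checking that on Linux by reading /proc. The function names are made up for illustration, and it assumes the java binary appears as the first argument of the command line.

```python
import os


def parse_thread_count(status_text):
    """Return the 'Threads:' value from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    return None


def java_thread_counts(proc_root="/proc"):
    """Map each java PID to its thread count (Linux only, reads /proc)."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                argv = f.read().split(b"\0")
            with open(os.path.join(proc_root, entry, "status")) as f:
                status = f.read()
        except OSError:
            continue  # process exited or is not readable by this user
        if os.path.basename(argv[0]) == b"java":
            counts[int(entry)] = parse_thread_count(status)
    return counts


if __name__ == "__main__":
    for pid, threads in sorted(java_thread_counts().items()):
        print(pid, threads)
```

If this prints a single PID with a thread count close to the number of java lines seen, the rows are threads of one JVM; if it prints many PIDs, they really are separate processes.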

I tried searching through the logs in the rancher_server and rancher-agent containers, and found the following:

rancher_server: /var/lib/cattle/logs/cattle-error.log

[code]2016-07-27 00:04:41,794 ERROR [:] [] [] [] [cutorService-24] [i.c.p.e.e.i.ProcessEventListenerImpl] Unknown exception running process [instance.purge:127] on [2] org.jooq.exception.DataChangedException: Database record has been changed
at org.jooq.impl.UpdatableRecordImpl.checkIfChanged(UpdatableRecordImpl.java:550) ~[jooq-3.3.0.jar:na]
at org.jooq.impl.UpdatableRecordImpl.storeUpdate0(UpdatableRecordImpl.java:291) ~[jooq-3.3.0.jar:na]
at org.jooq.impl.UpdatableRecordImpl.access$200(UpdatableRecordImpl.java:90) ~[jooq-3.3.0.jar:na]
at org.jooq.impl.UpdatableRecordImpl$3.operate(UpdatableRecordImpl.java:260) ~[jooq-3.3.0.jar:na]
at org.jooq.impl.RecordDelegate.operate(RecordDelegate.java:123) ~[jooq-3.3.0.jar:na]
at org.jooq.impl.UpdatableRecordImpl.storeUpdate(UpdatableRecordImpl.java:255) ~[jooq-3.3.0.jar:na]
at org.jooq.impl.UpdatableRecordImpl.update(UpdatableRecordImpl.java:149) ~[jooq-3.3.0.jar:na]
at io.cattle.platform.object.impl.JooqObjectManager.persistRecord(JooqObjectManager.java:223) ~[cattle-framework-object-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.object.impl.JooqObjectManager.setFieldsInternal(JooqObjectManager.java:130) ~[cattle-framework-object-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.object.impl.JooqObjectManager$3.execute(JooqObjectManager.java:118) ~[cattle-framework-object-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.idempotent.Idempotent.change(Idempotent.java:88) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.object.impl.JooqObjectManager.setFields(JooqObjectManager.java:115) ~[cattle-framework-object-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.object.impl.JooqObjectManager.setFields(JooqObjectManager.java:110) ~[cattle-framework-object-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.process.common.generic.GenericResourceProcessState.applyData(GenericResourceProcessState.java:96) ~[cattle-iaas-logic-common-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runHandler(DefaultProcessInstanceImpl.java:440) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$4.execute(DefaultProcessInstanceImpl.java:375) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$4.execute(DefaultProcessInstanceImpl.java:369) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.idempotent.Idempotent.execute(Idempotent.java:42) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runHandlers(DefaultProcessInstanceImpl.java:369) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runLogic(DefaultProcessInstanceImpl.java:471) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runWithProcessLock(DefaultProcessInstanceImpl.java:305) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$2.doWithLockNoResult(DefaultProcessInstanceImpl.java:245) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.LockCallbackNoReturn.doWithLock(LockCallbackNoReturn.java:7) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.LockCallbackNoReturn.doWithLock(LockCallbackNoReturn.java:3) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl$3.doWithLock(AbstractLockManagerImpl.java:40) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.LockManagerImpl.doLock(LockManagerImpl.java:33) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:13) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:37) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.acquireLockAndRun(DefaultProcessInstanceImpl.java:242) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runDelegateLoop(DefaultProcessInstanceImpl.java:184) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.executeWithProcessInstanceLock(DefaultProcessInstanceImpl.java:157) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$1.doWithLock(DefaultProcessInstanceImpl.java:107) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$1.doWithLock(DefaultProcessInstanceImpl.java:104) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl$3.doWithLock(AbstractLockManagerImpl.java:40) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.LockManagerImpl.doLock(LockManagerImpl.java:33) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:13) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:37) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.execute(DefaultProcessInstanceImpl.java:104) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.eventing.impl.ProcessEventListenerImpl.processExecute(ProcessEventListenerImpl.java:74) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.eventing.impl.ProcessEventListenerImpl.processExecute(ProcessEventListenerImpl.java:56) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at sun.reflect.GeneratedMethodAccessor475.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_101]
at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_101]
at io.cattle.platform.eventing.annotation.MethodInvokingListener$1.doWithLockNoResult(MethodInvokingListener.java:76) [cattle-framework-eventing-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.LockCallbackNoReturn.doWithLock(LockCallbackNoReturn.java:7) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.LockCallbackNoReturn.doWithLock(LockCallbackNoReturn.java:3) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl$3.doWithLock(AbstractLockManagerImpl.java:40) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.LockManagerImpl.doLock(LockManagerImpl.java:33) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:13) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:37) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.eventing.annotation.MethodInvokingListener.onEvent(MethodInvokingListener.java:72) [cattle-framework-eventing-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.eventing.impl.AbstractThreadPoolingEventService$2.doRun(AbstractThreadPoolingEventService.java:135) [cattle-framework-eventing-0.5.0-SNAPSHOT.jar:na]
at org.apache.cloudstack.managed.context.NoExceptionRunnable.runInContext(NoExceptionRunnable.java:15) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]
at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:108) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]
at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_101]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_101]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]

2016-07-27 05:35:38,860 ERROR [:] [] [] [] [ecutorService-7] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [1] count [3][/code]

And on rancher-agent: /var/lib/rancher/agent.log

2016-07-27 21:28:34,225 ERROR agent [140398882495536] [event.py:112] Error in request : 90039c50-5a1b-4b6e-8409-78db2aa6d0ac
Traceback (most recent call last):
  File "/var/lib/cattle/pyagent/cattle/agent/event.py", line 95, in _worker_main
    resp = agent.execute(req)
  File "/var/lib/cattle/pyagent/cattle/agent/__init__.py", line 15, in execute
    return self._router.route(req)
  File "/var/lib/cattle/pyagent/cattle/plugins/core/event_router.py", line 13, in route
    resp = handler.execute(req)
  File "/var/lib/cattle/pyagent/cattle/plugins/core/event_handlers.py", line 32, in execute
    type.on_ping(event, resp)
  File "/var/lib/cattle/pyagent/cattle/plugins/docker/compute.py", line 126, in on_ping
    self._add_instances(ping, pong)
  File "/var/lib/cattle/pyagent/cattle/plugins/docker/compute.py", line 138, in _add_instances
    running, nonrunning = self._get_all_containers_by_state()
  File "/var/lib/cattle/pyagent/cattle/plugins/docker/compute.py", line 171, in _get_all_containers_by_state
    for c in client.containers(all=True):
  File "/var/lib/cattle/pyagent/dist/docker/api/container.py", line 69, in containers
    res = self._result(self._get(u, params=params), True)
  File "/var/lib/cattle/pyagent/dist/docker/utils/decorators.py", line 47, in inner
    return f(self, *args, **kwargs)
  File "/var/lib/cattle/pyagent/dist/docker/client.py", line 112, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
  File "/var/lib/cattle/pyagent/dist/requests/sessions.py", line 487, in get
    return self.request('GET', url, **kwargs)
  File "/var/lib/cattle/pyagent/dist/requests/sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "/var/lib/cattle/pyagent/dist/requests/sessions.py", line 585, in send
    r = adapter.send(request, **kwargs)
  File "/var/lib/cattle/pyagent/dist/requests/adapters.py", line 479, in send
    raise ReadTimeout(e, request=request)
ReadTimeout: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=2)
2016-07-27 21:29:32,246 INFO requests.packages.urllib3.connectionpool [140398882495536] [connectionpool.py:248] Resetting dropped connection: <hostname>
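The agent log above ends in a 2-second read timeout while listing containers through the Docker socket, which suggests the Docker daemon on that host was slow to answer. A rough way to check how long that listing takes is sketched below in Python, using only the standard library. It assumes the socket is at /var/run/docker.sock (the usual default, but not stated in this thread) and uses the Docker Engine API path /containers/json?all=1; the class and function names are made up for illustration.

```python
import http.client
import socket
import time


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=2.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        try:
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


def time_container_list(socket_path="/var/run/docker.sock", timeout=2.0):
    """Return (HTTP status, seconds elapsed) for listing all containers.

    Raises OSError (including a socket timeout) if the daemon cannot be
    reached or does not answer within `timeout` seconds.
    """
    conn = UnixHTTPConnection(socket_path, timeout)
    start = time.monotonic()
    try:
        conn.request("GET", "/containers/json?all=1")
        resp = conn.getresponse()
        resp.read()  # drain the body so the timing covers the full response
        return resp.status, time.monotonic() - start
    finally:
        conn.close()


if __name__ == "__main__":
    print(time_container_list())
```

If the elapsed time is regularly close to or above 2 seconds, the agent's pings would be expected to fail the same way as in the log; this says nothing yet about why the daemon is slow.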

I’ve searched for an explanation or solution, checked the GitHub bug reports, and looked through the forums, but to no avail. Can I get some help figuring out what’s going on? I’d truly appreciate it.

Hi there,

I also have exactly the same problem.

Additional info of my (test) setup:

1 physical host with 32GB ram, 8 Cores, 1 SSD
Running 3 VMs.

Rancher Server is installed directly on the host.

3 Agents on each VM.

1 VM has 1x Stack with 2 Containers
1 VM has 2x Stacks, 2 / 1 Containers
1 VM has 1x Stack with 2 Stopped Containers

OK, so something is strange here. On a completely fresh install on Ubuntu 16 in a VirtualBox VM, Rancher spawns countless threads, using 1.5 GB of RAM. There are no hosts, containers, or anything else running on the VM, and it is a completely fresh install of Rancher; I haven’t even logged in.

I’m getting this issue too: out of nowhere, Java processes just kept spinning up.

I am also seeing this. Does anyone have any insight?

I don’t have a solid fix, but I can say that switching to an external MySQL DB solved this problem for me.

Thanks, I’ll try that. I have a VM set aside for the DB but haven’t gotten around to setting it up yet.

Yeah, let me know. I had it happen a few times, and because I had forgotten to download the Rancher SSH keys, I had no way to reconnect my hosts.

I am setting up Rancher on a Proxmox server using Ansible, along with a few LXC containers for VPN, Vault, Consul, etc. that I want outside of Rancher. That way I can destroy the whole thing, start my playbook, and in a few minutes it’s all back. I actually do that quite often to test the playbook.