Post-Upgrade Issues v1.1.1 (UI Very Slow)

While trying to update a Rancher installation, I’ve become unable to use the UI because it keeps timing out. If logins are enabled, it just redirects back to the login page without an error. If authentication is disabled, requests frequently fail with timeout errors. When I started the migration process, the VM running Rancher only had 2 vCPUs and 4 GB of RAM. It was so unusable that I couldn’t even reach the login page without timeouts. I’ve since upgraded the VM to 8 vCPUs and 16 GB of RAM, and the UI actually loads now, but it still isn’t working properly.

I’m not currently certain whether this is related to all the failed containers it keeps trying (and, for various reasons, failing) to start, or to something unrelated to the containers.

Some sample Rancher logs:

2016-07-20 21:19:18,132 ERROR [:] [] [] [] [ecutorService-1] [i.c.p.e.e.i.ProcessEventListenerImpl] Unknown exception running process [instance.start:1483917] on [7509] io.cattle.platform.eventing.exception.EventExecutionException: 500 Server Error: Internal Server Error ("rpc error: code = 2 desc = "oci runtime error: could not synchronise with container process: not a directory"")
2016-07-20 21:19:19,881 ERROR [0e78991d-4a0f-4a7f-b7f9-6d16f0734402:1483872] [instance:7047->instanceHostMap:6574] [instance.start->(InstanceStart)->instancehostmap.activate] [] [cutorService-25] [c.p.e.p.i.DefaultProcessInstanceImpl] Unknown exception io.cattle.platform.eventing.exception.EventExecutionException: 500 Server Error: Internal Server Error ("rpc error: code = 2 desc = "oci runtime error: could not synchronise with container process: not a directory"")
2016-07-20 21:19:19,881 ERROR [0e78991d-4a0f-4a7f-b7f9-6d16f0734402:1483872] [instance:7047] [instance.start->(InstanceStart)] [] [cutorService-25] [i.c.p.process.instance.InstanceStart] Failed to Starting for instance [7047] 

I’m unable to log in when logins are enabled. With logins disabled, trying to view running processes gives me:

Timeout
API request timeout (30 sec)

GET https://mysite.url:8080/v1/processinstances?endTime_null=true&limit=100&sort=id&order=desc

Reload to try again or log out

Current orchestration is Cattle. Can I adjust the API timeout shown above? Does anyone have recommendations for getting around this so I can properly manage my environments again?

mysql> select * from setting;                                                
+----+------------------------------+-------------------------------------------------------------------------------------------------------------------+
| id | name                         | value                                                                                                             |
+----+------------------------------+-------------------------------------------------------------------------------------------------------------------+
|  1 | api.host                     | http://XXX.XXX.XXX.XXX:8080                                                                                       |
|  2 | api.security.enabled         | false                                                                                                             |
|  3 | api.auth.provider.configured |                                                                                                                   |
|  4 | api.auth.local.access.mode   | unrestricted                                                                                                      |
|  5 | api.auth.enabler             | rancher_id:16                                                                                                     |
|  6 | catalog.url                  | library=https://github.com/rancher/rancher-catalog.git,community=https://github.com/rancher/community-catalog.git |
|  7 | vm.enabled                   | false                                                                                                             |
+----+------------------------------+-------------------------------------------------------------------------------------------------------------------+
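The 30-second limit in the timeout message above appears to be applied by the UI rather than by the API itself. As a hedged diagnostic (it does not change the UI’s limit), the same request can be sent straight to the API with a longer client-side timeout, to see whether it eventually answers. This is a minimal sketch: `RANCHER_URL` is a hypothetical variable you would set to your `api.host` value, and it relies on `api.security.enabled` being `false` as in the settings table, so no credentials are sent.

```shell
#!/bin/sh
# Minimal sketch: fetch the unfinished process instances directly from the API,
# waiting longer than the UI's 30 seconds.
# RANCHER_URL is a hypothetical variable; set it to your api.host value.

process_instances_url() {
  # Same query string the UI used in the timed-out request.
  printf '%s/v1/processinstances?endTime_null=true&limit=100&sort=id&order=desc\n' "$1"
}

if [ -n "${RANCHER_URL:-}" ]; then
  # --max-time 300: give up after 5 minutes instead of 30 seconds.
  # No credentials are passed, assuming api.security.enabled is false.
  curl --silent --show-error --max-time 300 "$(process_instances_url "$RANCHER_URL")" \
    || echo "request failed or timed out" >&2
fi
```

For example, `RANCHER_URL=http://XXX.XXX.XXX.XXX:8080 sh query.sh` (with the real address filled in). If even this times out, the bottleneck is on the server side rather than in the UI.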

I’ve tried to supply JAVA_OPTS when starting the container, but it doesn’t seem to work. When I look at the processes on the host and in the container, -Xmx is still set to 8g even though I’ve raised it to 14G.

root 17037 191 49.1 22293856 8076856 ? Ssl 21:59 122:53 java -Xms128m -Xmx8g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/cattle/logs -Dlogback.bootstrap.level=WARN -Xmx14096m

What I specified at startup was:

docker run -d --volumes-from mad_brahmagupta_backupnew -p 8080:8080 --restart=always -e JAVA_OPTS="-Xmx14096m" rancher/server

If you look at the process above, you’ll see that -Xmx is specified twice, so is it really setting the maximum Java heap size?
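As far as I know, the HotSpot JVM reads its options left to right, and when -Xmx appears more than once the last occurrence takes effect, so the -Xmx14096m appended from JAVA_OPTS should override the image’s default -Xmx8g. A minimal sketch that applies that “last one wins” assumption to the command line from the ps output:

```shell
#!/bin/sh
# Minimal sketch: report which -Xmx value a JVM command line ends up with,
# assuming HotSpot's "last occurrence wins" rule for repeated -Xmx flags.
# The command line below is copied from the ps output in this post.
cmdline='java -Xms128m -Xmx8g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/cattle/logs -Dlogback.bootstrap.level=WARN -Xmx14096m'

effective_xmx() {
  last=""
  # Deliberate word splitting: none of these options contain spaces.
  for arg in $1; do
    case "$arg" in
      -Xmx*) last="${arg#-Xmx}" ;;  # a later match overwrites an earlier one
    esac
  done
  printf '%s\n' "$last"
}

effective_xmx "$cmdline"
```

On the host you could feed it the live process instead, e.g. `effective_xmx "$(ps -o args= -p 17037)"`. To confirm what the JVM itself resolves, `java -Xmx8g -Xmx14096m -XX:+PrintFlagsFinal -version | grep MaxHeapSize` prints the final MaxHeapSize in bytes.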

Hi,
I don’t know what is causing your problem, but I suggest not setting -Xmx higher than 4096m: the Rancher UI can sometimes hang when the Java garbage collector tries to scan such a large heap to free up space.

Best regards,

Charles.