How to use the Processes information for admin, documentation needed

demarant · October 2, 2016, 11:53am

As a Rancher admin we have access to the Processes information.

I can see a lot of processes (around 20-30) that show either timeouts or other exceptions.

Unfortunately, there is no documentation at all on docs.rancher explaining how to interpret the issues listed here.

Will Rancher at some point give us some documentation about it? Or maybe a meetup / live demo on how to debug and use this info? It is very cryptic right now.

Thanks

demarant · October 11, 2016, 4:33pm

@denise

Below is an example of our Processes…

We have a lot of TIMEOUT and RESOURCE_BUSY entries.
Any documentation to help understand these would be useful. We have had huge issues with Rancher stability recently and are not able to upgrade stacks via Rancher; they get stuck in one state or another. We are almost giving up on Rancher unless we find out what is causing these TIMEOUT and RESOURCE_BUSY issues.

demarant · October 12, 2016, 4:10pm

We found out that the TIMEOUT and RESOURCE_BUSY entries are mainly caused by an overloaded Rancher server. The VM where Rancher is running has 8 vCPUs, but the average load was between 20 and 40, so we will try a multi-node setup to offload Rancher. We have several hundred containers and around 80 hosts managed by Rancher.
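For readers who want to check the same thing on their own server, here is a minimal sketch of the load-versus-cores comparison described above. The function names and the threshold of 1.0 per core are assumptions for illustration (a common rule of thumb, not anything defined by Rancher), and `os.getloadavg()` is only available on Unix-like systems.

```python
import os


def load_per_core(load_average: float, cpu_count: int) -> float:
    """Return the load average divided by the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


def is_overloaded(load_average: float, cpu_count: int, threshold: float = 1.0) -> bool:
    """Flag a host whose per-core load exceeds the threshold.

    The default threshold of 1.0 is an assumed rule of thumb, not a Rancher setting.
    """
    return load_per_core(load_average, cpu_count) > threshold


if __name__ == "__main__":
    # Figures quoted in the post: 8 vCPUs, load average between 20 and 40.
    for load in (20.0, 40.0):
        print(f"load {load} on 8 vCPUs -> {load_per_core(load, 8)} per core")

    # Same check against the machine this script runs on (Unix only).
    if hasattr(os, "getloadavg"):
        one_minute, _, _ = os.getloadavg()
        cpus = os.cpu_count() or 1
        print(f"this host: {one_minute:.2f} over {cpus} CPUs, "
              f"overloaded={is_overloaded(one_minute, cpus)}")
```

With the numbers from the post, the per-core load works out to 2.5 and 5.0, well above what 8 vCPUs can service without queueing.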

aemneina · October 12, 2016, 5:51pm

@demarant do you have Rancher agents in a reconnecting state? That's usually a cause of thread exhaustion on the Rancher servers, resulting in sluggishness.

demarant · October 18, 2016, 7:58am

Thanks.
Yes, we did have a lot of agent.reconnect TIMEOUTs. We moved MySQL to another server and gave the Rancher Java process more memory (8GB), and we still see a lot of processes stuck…see the picture below. I think it is related to GitHub issue #5349 in rancher/rancher, "Agent.reconnect process stuck forever for the agents not linked to any host".

Restarting the server helps only temporarily; the stuck processes come back, or new ones appear.
I am trying to figure out how to get rid of all those stuck processes…any tips?

ddelnano · December 13, 2016, 9:26pm

@demarant did you get to see whether introducing an HA setup helped with the problem? I am going through the same issues as you. Also, have you tried upgrading to 1.2 to see if that helped?

demarant · December 27, 2016, 6:37am

Hi, in the end what helped most was giving the Rancher MySQL server much more RAM. In our setup (80 hosts, several hundred containers) we had to give it 32GB of RAM and 16 cores. Moving to Rancher 1.2 also helped a bit, but new issues came up, and we even had to move some stacks back to Rancher 1.1.4. Rancher, and Docker as such, are moving so fast that it is very complex and frustrating to keep up. In any case, overall it is OK, as we gain speed from consistent deployments.

ddelnano · December 27, 2016, 4:45pm

@demarant was your Rancher server actually resource constrained? I was running an 8GB server with 2 cores. It was not using up all of its resources, but I increased it to 16GB and 4 cores and did not notice any significant difference other than the UI being slightly more snappy. So I am guessing my issues are not necessarily related to yours. Yes, the instability is very frustrating. I was never able to get anything running on Rancher 1.2, even starting with a brand new setup, which is making us look at other orchestration solutions for our future.
