High CPU load on network-services/metadata after upgrade to 1.4.1

floncar · February 16, 2017, 2:35pm

Hi, I’m new to both Rancher and Docker, and I noticed a high load in Zabbix on my 3 nodes.

After debugging for two days I came across the network-services/metadata stack (new one in Infrastructure) and saw that the CPU load on all metadata services is around 200% or more.

This is the log:

16/02/2017 15:31:36 time="2017-02-16T14:31:36Z" level=info msg="Downloaded in 8.4477361s"
16/02/2017 15:31:38 time="2017-02-16T14:31:38Z" level=info msg="Loading answers"
16/02/2017 15:31:40 time="2017-02-16T14:31:40Z" level=info msg="Loaded answers"

And that happens every second.

The load is around 15-25 per server, and higher if something is upgraded or started in Rancher.

Anyone know where I should look next? Googling didn’t yield any answers.

Thanks in advance.

floncar · February 16, 2017, 2:35pm

Furthermore, if I stop rancher-server, the load drops to 1, maybe 2.

floncar · February 17, 2017, 8:39am

I noticed that the Instance table in my database is over 600MB, and more than 95% of it is in the purged state.

Could the problem be that the metadata service is trying to get info on instances that no longer exist?

vkruoso · September 1, 2017, 4:30pm

Hi!

This thread seems really old, but I’m facing a similar issue using Rancher v1.5.10. When I restart or upgrade the Route53 DNS service, the metadata service CPU usage goes up (the network-services-metadata-dns container).

This is the log I see during the restart process:

time="2017-09-01T16:23:54Z" level=info msg="Reloading answers"
time="2017-09-01T16:23:54Z" level=info msg="Reloaded answers"
time="2017-09-01T16:23:58Z" level=info msg="Reloading answers"
time="2017-09-01T16:23:58Z" level=info msg="Reloaded answers"
time="2017-09-01T16:24:09Z" level=info msg="Reloading answers"
time="2017-09-01T16:24:10Z" level=info msg="Reloaded answers"
time="2017-09-01T16:24:11Z" level=info msg="Reloading answers"
time="2017-09-01T16:24:11Z" level=info msg="Reloaded answers"
time="2017-09-01T16:24:12Z" level=info msg="Reloading answers"
time="2017-09-01T16:24:12Z" level=info msg="Reloaded answers"
time="2017-09-01T16:24:13Z" level=info msg="Reloading answers"
time="2017-09-01T16:24:13Z" level=info msg="Reloaded answers"
time="2017-09-01T16:24:13Z" level=info msg="Reloading answers"
time="2017-09-01T16:24:13Z" level=info msg="Reloaded answers"
time="2017-09-01T16:24:14Z" level=info msg="Reloading answers"
time="2017-09-01T16:24:14Z" level=info msg="Reloaded answers"
time="2017-09-01T16:24:16Z" level=info msg="Reloading answers"
time="2017-09-01T16:24:16Z" level=info msg="Reloaded answers"

Is there anything we can do to improve that behavior?

Thanks,
Vinicius
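
To put a number on how often the reload happens, the log lines can be counted. The following is a minimal sketch, not part of Rancher: it assumes the logrus-style `time="..." level=... msg="..."` format quoted above, and the function name `reload_stats` and the sample list are made up for illustration.

```python
import re
from datetime import datetime

# Matches the logrus-style lines quoted in this thread (assumed format).
LINE_RE = re.compile(r'time="([^"]+)" level=(\w+) msg="([^"]*)"')


def reload_stats(lines, marker="Reloading answers"):
    """Return (reload_count, seconds between first and last reload)."""
    stamps = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group(3) == marker:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ"))
    if not stamps:
        return 0, 0.0
    return len(stamps), (max(stamps) - min(stamps)).total_seconds()


# First six lines of the log excerpt above.
sample = [
    'time="2017-09-01T16:23:54Z" level=info msg="Reloading answers"',
    'time="2017-09-01T16:23:54Z" level=info msg="Reloaded answers"',
    'time="2017-09-01T16:23:58Z" level=info msg="Reloading answers"',
    'time="2017-09-01T16:23:58Z" level=info msg="Reloaded answers"',
    'time="2017-09-01T16:24:09Z" level=info msg="Reloading answers"',
    'time="2017-09-01T16:24:10Z" level=info msg="Reloaded answers"',
]

count, span = reload_stats(sample)
print(f"{count} reloads in {span:.0f}s")
```

Feeding it the full container log instead of `sample` gives the reload rate over a whole restart or upgrade.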

vincent · September 6, 2017, 9:40am

Short term, no: when things change, metadata is updated and the complete YAML file is sent to the host and parsed. Longer term (2.0), yes: diffs will be/are sent incrementally instead of the complete file.
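
The difference between the two approaches can be illustrated with a small sketch. This is not Rancher's implementation: the data layout, the `diff` and `apply_diff` helpers, and the use of JSON instead of YAML (to stay within the standard library) are all assumptions made for illustration.

```python
import json


def diff(old, new):
    """Return the keys to set and the keys to delete to turn old into new."""
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    removed = [k for k in old if k not in new]
    return {"set": changed, "delete": removed}


def apply_diff(state, delta):
    """Apply a delta produced by diff() to a copy of state."""
    result = dict(state)
    result.update(delta["set"])
    for key in delta["delete"]:
        result.pop(key, None)
    return result


# Hypothetical metadata: 70 services, one of which changes state.
old = {f"service-{i}": {"state": "active", "scale": 1} for i in range(70)}
new = dict(old)
new["service-3"] = {"state": "upgrading", "scale": 1}

delta = diff(old, new)
full_bytes = len(json.dumps(new))     # what a full resend would transfer
delta_bytes = len(json.dumps(delta))  # what an incremental update would transfer

# The receiver ends up with the same state either way.
assert apply_diff(old, delta) == new
print(f"full: {full_bytes} bytes, diff: {delta_bytes} bytes")
```

With a full resend, every host receives and re-parses the whole document on each change, so the cost grows with the total number of services rather than with the size of the change.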

vkruoso · September 6, 2017, 11:14am

Thanks @vincent for your answer.

It really seems some unnecessary work is being done. I imagine the complete YAML file is not large enough to justify network usage of 150-200Mbit/s for a few seconds. Also looking forward to news on 2.0.

We have 70 services right now, and we are growing every day. For the moment we need to change the architecture so this won't happen.
