Prometheus metrics vs Docker stats

jbartlett · January 28, 2020, 10:59pm

Today my Prometheus container (the one collecting cluster-level metrics) was OOMKilled on my small test cluster. I bumped the memory limit from 2000MiB to 4000MiB, and it ran again. But then I saw that the Workload Metrics for Prometheus itself showed 5.18GiB of memory used. Wondering why it hadn’t been OOMKilled again, I looked at the Docker stats for the container on the node in question. It reports 2.65GiB used out of the 3.906GiB limit (4000MiB = 3.906GiB). Why is there such a large discrepancy between the two sources of metrics?
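To make the size of the gap concrete, here is a minimal sketch of the unit arithmetic using only the figures quoted above. The helper name `mib_to_gib` is just illustrative.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def mib_to_gib(mib):
    """Convert mebibytes to gibibytes (binary units, as Docker reports them)."""
    return mib * MIB / GIB


# Figures taken from the post above.
limit_gib = mib_to_gib(4000)   # the 4000MiB limit expressed in GiB
docker_used_gib = 2.65         # usage shown by docker stats
workload_used_gib = 5.18       # usage shown by the Workload Metrics

print(f"limit: {limit_gib:.3f} GiB")
print(f"docker stats:     {docker_used_gib / limit_gib:.0%} of limit")
print(f"workload metrics: {workload_used_gib / limit_gib:.0%} of limit")
```

The Workload Metrics figure is above the limit itself, while the Docker stats figure is well below it, so the two sources cannot be measuring the same quantity.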

jbartlett · January 29, 2020, 8:30pm

I assume it has something to do with how cAdvisor reports memory compared with Docker. If that is the case, and the Prometheus metrics aren’t showing true usage, what are people using to make sure their pods are not getting dangerously close to OOM?
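One possible contributor, not confirmed in this thread, is that the tools derive different numbers from the same cgroup counters. cAdvisor's `container_memory_usage_bytes` includes page cache, its `container_memory_working_set_bytes` subtracts inactive file cache, and the `docker stats` CLI subtracts cache (or inactive file cache, depending on the Docker version) before displaying usage. The sketch below assumes cgroup v1 field names from `memory.stat`; the sample values are hypothetical, and which metric the Workload Metrics graph actually uses is an assumption.

```python
def parse_memory_stat(text):
    """Parse the 'key value' lines of a cgroup v1 memory.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def memory_views(usage_in_bytes, stat):
    """Return three common ways of reporting memory for one cgroup."""
    inactive_file = stat.get("total_inactive_file", 0)
    cache = stat.get("total_cache", stat.get("cache", 0))
    return {
        # Raw usage, page cache included (container_memory_usage_bytes).
        "usage_incl_cache": usage_in_bytes,
        # Usage minus inactive file cache (container_memory_working_set_bytes).
        "working_set": max(usage_in_bytes - inactive_file, 0),
        # Usage minus all page cache (roughly what older docker stats shows).
        "usage_minus_cache": max(usage_in_bytes - cache, 0),
    }


# Hypothetical sample values, NOT taken from the cluster in this thread.
sample_stat = parse_memory_stat(
    "total_cache 1200000000\n"
    "total_inactive_file 900000000\n"
)
views = memory_views(4000000000, sample_stat)
for name, value in views.items():
    print(f"{name}: {value / 1024 ** 3:.2f} GiB")
```

For judging how close a pod is to its limit, the working-set figure is usually the more useful one to graph and alert on, since inactive file cache can be reclaimed under memory pressure. Page cache alone would not explain a reading above the limit, so it is also worth checking whether the query sums both the pod-level and container-level cAdvisor series for the same pod.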
