I don't understand Prometheus in Rancher at all

On Rancher 2.2.4, things I don't understand:

  • With only cluster-level monitoring enabled, I could still see pod metrics for all pods in the cluster. What “extra” functionality does project-level monitoring provide?
  • I can't make Prometheus scrape the /metrics endpoint of my app. The documentation is very vague about what's required to set this up. I've added the following annotations to the deployment so they're set on all pods:
  annotations:
    prometheus.io/port: "5000"
    prometheus.io/scrape: "true"

But there are still no custom metrics available, either in the project-level Prometheus or the cluster-level one. I'm a bit stuck. Some more detailed documentation would be awesome.
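For reference, the conventional scrape annotations (a community convention, not a Kubernetes or Rancher standard) look like this; crucially, they only take effect if the Prometheus scrape config contains relabel rules that read them, which is exactly what's unclear from the Rancher docs:

  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "5000"
    prometheus.io/path: "/metrics"  # optional; /metrics is the usual default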

Okay, so at least I understand why cluster monitoring doesn't scrape my app. It's configured to be limited to the following namespaces:

  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names:
      - ingress-nginx
      - ingress-controller
      - kube-system

Is there any non-intrusive way to override this? I don't want a Prometheus instance per project; I want one global instance for the cluster.
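For comparison, in a hand-rolled Prometheus config, pod discovery without a namespaces block is cluster-wide, and relabel rules turn the annotations into scrape targets. A minimal sketch (standard upstream Prometheus configuration, not something Rancher exposes as a supported override):

  - job_name: pods-all-namespaces
    kubernetes_sd_configs:
    - role: pod
      # no `namespaces` block => discover pods in every namespace
    relabel_configs:
    # keep only pods annotated prometheus.io/scrape: "true"
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: "true"
    # rewrite the scrape address to use the annotated port
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__

The catch is that Rancher manages this config itself, so any manual edit is likely to be reverted.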

So, researching a bit more:
I enabled the "project-level" Prometheus, but as far as I can see from the config in that Prometheus instance, it doesn't scrape all namespaces in the project, only the auto-generated Prometheus namespace:

- job_name: cattle-prometheus-p-n6xhc/prometheus-project-monitoring/0
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - cattle-prometheus-p-n6xhc

(Obviously I could be wrong. But without any proper documentation from Rancher on how this is supposed to work, I'm left guessing.)
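If this monitoring stack is built on prometheus-operator (which the `/0` suffix in the job name suggests, since that's how the operator names scrape jobs: namespace/servicemonitor/endpoint-index), then targets are driven by ServiceMonitor objects rather than annotations. A hedged sketch, assuming the project-level Prometheus's selectors can be made to match a ServiceMonitor in a workload namespace; the names `my-app`, `my-app-namespace`, and `metrics` are placeholders:

  apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    name: my-app
    namespace: my-app-namespace   # hypothetical project namespace
    labels:
      app: my-app
  spec:
    selector:
      matchLabels:
        app: my-app               # must match the labels on the Service
    endpoints:
    - port: metrics               # named port on the Service
      path: /metrics

Whether the Rancher-managed Prometheus is configured to watch ServiceMonitors outside its own namespace is the open question.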

OK, so I spun up a Redis instance using the GUI and enabled custom metrics.
When investigating the auto-created deployment, I noticed this annotation:

field.cattle.io/workloadMetrics: '[{"path":"","port":7070,"schema":"HTTP"}]'

So it seems that Rancher is using its own custom annotation for this. Which would be fine if it were documented somewhere.
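If you want to experiment, the same annotation can presumably be set on your own workload. This is purely inferred from the auto-generated Redis deployment, not documented behavior; the name is hypothetical and the port/path are the ones from my app:

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: my-app   # hypothetical workload
    annotations:
      # observed on the Rancher-generated Redis workload; format inferred
      field.cattle.io/workloadMetrics: '[{"path":"/metrics","port":5000,"schema":"HTTP"}]'

Note the key is `schema`, not `scheme`, at least in the generated example above.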

I'm still wondering why I HAVE to use project-level Prometheus instances. We simply have no need for that kind of separation. I could understand it in a hosting environment with multi-tenancy in a single cluster, but for us it's complete overkill and very much a waste of resources.

This is a great point, and one I fully share. Prometheus is a pretty large deployment, and adding one per project could be a valid option, but it's far too much for my situation. The documentation on how to use these tools feels incomplete, and I should have the option to combine the cluster- and project-level behaviors if that fits my needs.

Per the project monitoring page:

Project monitoring allows you to view the state of pods running in a given project. Prometheus collects metrics from the project’s deployed HTTP and TCP/UDP workloads.

I have had cluster monitoring enabled for a while, and I just enabled project monitoring for one of my projects. From what I can tell, all you get is the new Workload Metrics dropdown under the Deployment, which shows load statistics. Even though I only enabled the project monitor today, the graphs show data going back seven days, to when I enabled the cluster monitor. So it appears the data was already being collected for all of the deployments by the cluster monitor; the project monitor just surfaces the metrics in the Rancher UI.

I clicked the Grafana icon under my project's Metrics, and it took me to the Deployment page in Grafana. In another window, I opened Grafana from the cluster level and navigated to my Deployment from there; the data it shows is exactly the same.

It appears that the only thing project monitoring does is give you the metrics from within the Rancher UI; the data for all of your deployments is already being collected by cluster monitoring.

Project monitoring definitely sets up a new project-scoped namespace and (I'm pretty sure) a new Prometheus instance covering the project's namespaces. Maybe the cluster-scoped Prometheus scrapes cluster components such as metrics-server, while the project-scoped Prometheus scrapes workload scrape endpoints?

Again, I don't understand why it's built to be so complex.


I had exactly the same thought as you: why a separate Prometheus server per project?
I can guess at two possible reasons:
(1) lack of access control in the Prometheus server (but would it be possible to filter by the namespaces belonging to a project?)
(2) to physically isolate metrics between projects (but multi-tenancy doesn't always require physical isolation)

Our company has developed an open source Kubernetes monitoring solution (https://github.com/NexClipper/NexClipper), and after finding this issue I'm considering contributing to Rancher to support multi-tenant monitoring with a single monitoring server.
NexClipper will support multiple clusters soon (it's in testing now), and we'll start developing multi-tenancy support after that. If we support multi-tenancy in the same way as Rancher's projects, we could provide one global monitoring server covering multiple clusters and projects in Rancher.