How are you monitoring services in Rancher?

steve1 · October 23, 2017, 1:52am

Just curious what others are doing…
We still haven’t found the right tool for the job.

hugodopradofernandes · October 23, 2017, 3:28am

*Prometheus getting data from:

cAdvisor to collect container metrics
node exporter to collect host metrics
prometheus-rancher-exporter to export specific rancher metrics

*Elasticsearch getting data from:

logstash+logspout getting logs from docker (and all its containers)
metricbeat from non-rancher hosts (nfs storages)
graylog for logs from non-rancher nodes (syslogs from nfs storages)

Grafana WorldPing for url monitoring (response time, ping, etc)

All of that with dashboards on Grafana, with some alerts enabled to send webhooks and take some automated actions.
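
For anyone trying to picture the Prometheus side of a setup like this, here is a rough sketch of a scrape config. It is not the poster's actual configuration: the host names (host1, host2, rancher-exporter) are placeholders, and the ports are only the usual defaults for each exporter, so check them against your own deployment.

```yaml
# prometheus.yml (sketch; all target names below are hypothetical)
global:
  scrape_interval: 30s

scrape_configs:
  # Container metrics from cAdvisor (commonly exposed on 8080)
  - job_name: cadvisor
    static_configs:
      - targets: ['host1:8080', 'host2:8080']

  # Host metrics from node exporter (commonly exposed on 9100)
  - job_name: node
    static_configs:
      - targets: ['host1:9100', 'host2:9100']

  # Rancher-specific metrics from prometheus-rancher-exporter
  # (9173 is assumed here as its default port)
  - job_name: rancher
    static_configs:
      - targets: ['rancher-exporter:9173']
```

Static targets are used only to keep the sketch short; in a real Rancher environment the target list would more likely be generated than written by hand.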

tgjamin · October 23, 2017, 5:17pm

We use Sematext cloud metrics+logs for our Rancher clusters. They provide a sematext-agent that runs on every host. The agent collects all logs and metrics from running containers. Logs can be pre-processed by the agent to provide structured logging. The Sematext cloud UI for logging has Elasticsearch built in, which has been very handy for us.

ajromito · October 26, 2017, 8:34pm

We’ve been using New Relic Infrastructure to monitor host stats. So far it’s working well, and it’s super easy to set up:

version: '2'
services:
  NewRelicInfrastructure:
    image: newrelic/infrastructure
    environment:
      NRIA_CUSTOM_ATTRIBUTES: '{"environment":"dev","type": "rancher"}'
      NRIA_LICENSE_KEY: keyHere
    stdin_open: true
    network_mode: host
    volumes:
    - /:/host:ro
    - /var/run/docker.sock:/var/run/docker.sock
    tty: true
    labels:
      io.rancher.container.pull_image: always
      io.rancher.scheduler.global: 'true'

steve1 · October 27, 2017, 1:39am

Wow, that’s a lot to process. Can you tell me which parts of this system are involved when, say, an alert fires for a failed service running in a container? For example, a web server process crashing.

joe90 · October 31, 2017, 10:04am

Datadog works fine for us and pulls all the data we need.

Michael_Marchese · November 1, 2017, 8:54pm

I’m attaching a sidekick filebeat container to grab app logs. I find it gives me a bit more control than other solutions.

marcbachmann · November 12, 2017, 11:02am

I also suggest Prometheus & Grafana for monitoring and alerting, and syslog-ng (on the host) to forward logs.
I’m using one Prometheus per Rancher environment.
And then a Prometheus and Grafana outside of Rancher, which should be highly available.

To collect the metrics of Rancher services I have a confd setup to generate the Prometheus config: https://github.com/marcbachmann/rancher-prometheus-config

To monitor hosts you’ll want to have a node-exporter and cadvisor in your default setup. In my setup I installed them directly on the host, but you could run them as a Rancher service instead.
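
If you go the Rancher service route, a compose file along these lines might be a starting point. This is only a sketch, not what I run: the image names, mounts and ports are assumptions based on the usual upstream defaults, and the node-exporter flags differ between versions (older releases use a single dash), so verify them before use.

```yaml
version: '2'
services:
  node-exporter:
    image: prom/node-exporter          # assumed image name
    network_mode: host                 # exposes metrics on the host, usually port 9100
    volumes:
    - /proc:/host/proc:ro
    - /sys:/host/sys:ro
    command:
    - --path.procfs=/host/proc         # flag syntax depends on the node_exporter version
    - --path.sysfs=/host/sys
    labels:
      io.rancher.scheduler.global: 'true'   # run one instance on every host
  cadvisor:
    image: google/cadvisor             # assumed image name
    ports:
    - 8080:8080
    volumes:
    - /:/rootfs:ro
    - /var/run:/var/run:rw
    - /sys:/sys:ro
    - /var/lib/docker/:/var/lib/docker:ro
    labels:
      io.rancher.scheduler.global: 'true'
```

The global scheduler label is what makes Rancher place one container per host, which is what you want for host-level exporters.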
