Monitoring - how are other Rancher users doing it?

The next major step in my evaluation and setup of Rancher for my production environment is monitoring. I am interested in finding out how other Rancher users are handling it.

Particularly:

  • Log aggregation of all container logs
  • Container performance metrics (CPU, RAM, disk I/O, life cycle, etc.).
  • Dynamic monitoring of a service's performance metrics - e.g. if I run a web server service with 3 container instances, I want to gather the performance metrics of each web server instance and also of my website as a whole (requests per second, response time, etc.). This is different from the metrics of the containers themselves.
  • Service states / health.
  • Alerts on events (container failed and stopped, container being started, etc).

Any information on what you are doing will be very appreciated!

For log aggregation, I run a logspout container on each host, which forwards my logs to my ELK stack (right now hosted at LogIt).

Something like:

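logspout:
  image: raychaser/logspout-http:latest
  restart: always
  environment:
    # Tell logspout not to forward its own container's logs
    LOGSPOUT: ignore
  volumes:
  # logspout reads every container's log stream through the Docker socket
  - '/var/run/docker.sock:/var/run/docker.sock'
  labels:
    # Run one instance on every host; label values should be quoted strings
    io.rancher.scheduler.global: 'true'
  tty: true
  stdin_open: true
  # Replace API_TOKEN with your own token
  command: 'https://api.logit.io?http.path=/logspout/API_TOKEN/'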

There is also a Prometheus/Graphite stack in the community catalog that does a great job of host and container performance metrics and can be used for alerting.
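If you want to pull those container metrics out of Prometheus yourself (for a custom check, say), a minimal sketch against its HTTP query API could look like the following. The server address is a made-up placeholder, and the metric name assumes the stack exposes the usual cAdvisor metrics with a `name` label per container; adjust both to whatever your setup actually exports.

```python
import json
import urllib.parse
import urllib.request

# Assumption: hypothetical address, replace with your own Prometheus server.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"

# Assumption: cAdvisor-style metric; per-container CPU usage rate over 5 minutes.
CPU_QUERY = "sum(rate(container_cpu_usage_seconds_total[5m])) by (name)"


def build_query_url(base_url, promql):
    """Build the URL for an instant query against /api/v1/query."""
    query_string = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def parse_instant_vector(payload):
    """Convert an instant-query response into {container name: float value}."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % payload.get("error"))
    values = {}
    for sample in payload["data"]["result"]:
        name = sample["metric"].get("name", "<unnamed>")
        _timestamp, value = sample["value"]  # the value arrives as a string
        values[name] = float(value)
    return values


def query_cpu_by_container(base_url=PROMETHEUS_URL):
    """Fetch the current CPU usage rate per container (needs network access)."""
    url = build_query_url(base_url, CPU_QUERY)
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_instant_vector(json.load(response))
```

Calling `query_cpu_by_container()` on a schedule and comparing the values against a threshold is one simple way to drive alerts, although Prometheus' own alerting rules are probably the better fit once you have the stack running.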

@joshuakarjala - thanks. One question - when using that solution, are the logs still kept on the host’s drive too?

Yes - they are still accessible on the host with the normal docker logs command.
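To check this from a script on the host, here is a rough sketch that shells out to the docker CLI. It assumes docker is on the PATH, and the helper names and the container name in the usage note are made up for illustration.

```python
import subprocess


def docker_logs_command(container, tail=20):
    """Build the `docker logs` invocation for the last `tail` lines."""
    return ["docker", "logs", "--tail", str(tail), container]


def read_local_logs(container, tail=20):
    """Return the most recent log lines that Docker still holds on the host."""
    result = subprocess.run(
        docker_logs_command(container, tail),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge the container's stderr into the output
        check=True,  # raise CalledProcessError if the docker command fails
    )
    return result.stdout.decode("utf-8", errors="replace")
```

For example, `print(read_local_logs("web_1", tail=50))` would show the last 50 lines for a container named web_1, independently of whatever logspout has already forwarded.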