The next major step in my evaluation and setup of Rancher for my production environment is monitoring. I'd like to find out how other Rancher users are handling it.
In particular:
- Log aggregation of all container logs
- Container performance metrics (CPU, RAM, disk I/O, life cycle, etc.).
- Dynamic monitoring of each service's performance metrics. For example, if I run a web server service with 3 container instances, I want to gather the performance metrics of each web server instance and also of my website as a whole (requests per second, response time, etc.). This is different from the metrics of the container.
- Service states / health.
- Alerts on events (a container failing and stopping, a container starting, etc.).
Any information on what you are doing would be much appreciated!