Hi folks,
We’d like to use the Rancher Alert system to monitor specific Rancher pods that we’ve had problems with, such as cattle-cluster-agent and cattle-node-agent.
We would like to monitor for frequent restarts, or perhaps for frequent occurrences of something like CrashLoopBackOff. I cannot figure out how to monitor individual pods using Rancher Alerts.
I have tried a few things, such as kube_pod_container_status_restarts_total{namespace="cattle-system", pod=~"cattle-cluster-agent*}, as shown below, but without success:
Can anyone point me in the right direction?
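To make the goal concrete, what I am ultimately after is something along these lines. This is only a sketch: it assumes the kube-state-metrics series are being scraped by the cluster's Prometheus, the 10m window and the threshold of 3 are placeholder values I made up, and I have not confirmed that Rancher Alerts accepts expressions in this form. Note that PromQL label matchers take regular expressions rather than shell globs, so the pod pattern is written with `.*` and a closing quote.

```promql
# Fires when a cattle-cluster-agent container restarted more than 3 times
# in the last 10 minutes (window and threshold are illustrative assumptions).
increase(
  kube_pod_container_status_restarts_total{
    namespace="cattle-system",
    pod=~"cattle-cluster-agent.*"
  }[10m]
) > 3

# Separate expression: fires while any container in cattle-system is
# waiting with reason CrashLoopBackOff.
kube_pod_container_status_waiting_reason{
  namespace="cattle-system",
  reason="CrashLoopBackOff"
} == 1
```

The same first expression with pod=~"cattle-node-agent.*" would cover the node agent.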
-= Stefan