A few days ago some websites on my cluster were not available. I found out that pinging any of the 10.41.* IPs of the worker node did not work. Since I did not know what else to do, I restarted all servers, which solved the problem.
I am using flannel as the network layer.
How can I set up an alert that would warn me about such basic things? I have Prometheus running at the cluster level and get etcd warnings once in a while that resolve themselves.
I could see an increase in ingress response times in the Kubernetes Components Metrics at that very time. How can I get alerts about that?
Can I do this at the cluster level? I do not want to manage monitoring for every single project.
I found the alert/monitoring doc, but that's not very helpful.