Rancher health checks

We have a couple of containers that die sporadically (about 2-3 times per day). We have health checks associated with those and my gut is telling me that rancher is killing the container. Is there any log or UI that we can look at to see the history of the health checks against these services?
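
For context, here's roughly the kind of check we have on these services in rancher-compose.yml (a sketch; the service name, port, and timings are illustrative, not our exact values):

```yaml
# rancher-compose.yml (sketch; service name and values are illustrative)
myservice:
  health_check:
    port: 8080              # port the network agent's haproxy probes on each container
    interval: 2000          # ms between checks
    response_timeout: 2000  # ms before a single check counts as failed
    unhealthy_threshold: 3  # consecutive failures before the container is recycled
    healthy_threshold: 2    # consecutive successes before it's considered healthy again
    # no request_line, so haproxy does a plain TCP connect check
```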


FYI… removing the TCP health checks has resulted in our containers staying up. Appears there’s a problem with TCP health checks?
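
In case anyone wants to compare: the only difference between the TCP check we removed and an HTTP check is the request_line, e.g. (sketch; the /healthz path is hypothetical):

```yaml
health_check:
  port: 8080
  # with request_line set, haproxy issues an HTTP request instead of a bare TCP connect
  request_line: GET /healthz HTTP/1.0
  interval: 2000
  response_timeout: 2000
  unhealthy_threshold: 3
  healthy_threshold: 2
```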

Hard to tell why without logging, but as far as haproxy is concerned the container is probably genuinely not responding within the allowed timeout sometimes, or else it wouldn't get marked unhealthy. We might be the cause of that, though: there is a lag in setting up new IPsec tunnels, and I think existing ones may expire and get re-established after some period.
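
If tunnel re-establishment is the cause, one mitigation to try is loosening the check so a transient blip doesn't cross the failure threshold; a sketch (the numbers are guesses, all times in ms):

```yaml
health_check:
  port: 8080
  interval: 5000            # probe less often
  response_timeout: 5000    # give a momentarily slow tunnel more time to answer
  unhealthy_threshold: 5    # ~25s of consecutive failures before the container is recycled
  healthy_threshold: 2
```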

@alena logging failed checks? #2268

After troubleshooting with @tobowers on a different networking issue, we think this is related to the fact that certain AWS instance types do not work with IPsec.

Created this issue to track the AWS instance-type problem:
https://github.com/rancher/rancher/issues/2332

Hi, I had this same experience yesterday while trying to bring up a stack with just one container in it. With the health check on (1-minute intervals), it would usually be restarted after around 40-45 minutes of uptime; as soon as I took the health check off, it stayed up, and it has now been up for 15 hours. I'm using a separate host for Rancher management, and the stack was deployed on a separate, clean Ubuntu host (not sure this is relevant, but I thought I'd include it for reference).

Thanks,
John