We have a two node cluster with everything working correctly. I have tested our overlay network and containers can communicate with each other on different hosts. But health checks are failing.
I see other posts but they had issues with their overlay network.
What else can I check? I test the checks from a container on the other host with curl and the tests succeed. Is there a log with where the tests are run from and what the results are?
Did you specifically exec into the Network Agent and ping the IP of the other network agent on the other host?
I can do that but the overlay is working fine. I have dozens of containers
on the two hosts communicating across hosts.
I docker exec’ed to both network agents. They are behaving fine. I can ping all instances on each host, including the other network agent.
I tried running the health check using curl and it succeeds. But if I launch the same service it keeps restarting. I assume because the health check fails. Even though I say “Take no action” in the health check settings.
I have 2 identical services and they are being assigned to the same machine. Oddly one is working and one is not.
So 2 problems
- Health check fails even though I can run the same from curl on the network agent. I don’t know how to troubleshoot this. The url is a simple /ping
- Instances continuously restarting even though I saw “Take no action” on the health check.
It doesn’t sound like a health check failure based on what you’re describing.
For the failing service, can you look at the containers of the failing service? What shows up? Are they in
Error state? If so, can you click on one of them to look at the reason why they failed?
They start up fine. The logs are fine, and as I said they respond to health checks manually. I put a long wait time on the health check (the same as the identical service that works). So if I remove the health check they stay up and do not restart.
And they never go to the “Error” state, they stay in the yellow “Initializing”