I’m currently witnessing a problem where containers that get scheduled onto a single host seem to start and run just fine but cannot reach other containers on any other hosts in the environment (via the 10.42.x.x network).
If I open a shell to one of the containers, I am able to resolve the names of the other containers just fine and their IP addresses are correct. However I cannot open connections to ports that the other containers are exposing.
For the time being, I’ve ‘evacuated’ the host and disabled it so that my containers get scheduled onto others.
One other thing I noticed is that the healthcheck service shows stuck as initializing (on other hosts this comes up OK), but all other infrastructure services appeared to be working just fine.
Here’s a screenshot:
Any idea what could be causing this?