Container stuck Initializing despite passing health check

I’m using v1.0.1 and have a passing health check (verified in the container logs; see the screenshot below) on an nginx container, but it remains in an “Initializing” state. The container is also serving traffic. How can I determine why it’s stuck in this state? Here’s a screenshot:

I don’t know what the problem was, but rebooting the host appears to have resolved the issue.

Assuming you have at least 3 hosts, health checks are run from 3 separate hosts. The log shows only 2 IPs, so the third check was failing, presumably because of a cross-host communication issue on that host that the reboot fixed.
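
One way to repeat that reading of the log is to count the distinct source addresses that requested the health check URL. The following is only a rough sketch: it assumes an access log where the client address is the first field on each line (as in nginx's default layout), and the `/healthcheck` path and the sample 10.42.x.x addresses are made up for illustration.

```python
def checker_ips(log_lines, path="/healthcheck"):
    """Return the set of client IPs that requested `path`.

    Assumes each line starts with the client address and contains the
    request path surrounded by spaces, as in 'GET /healthcheck HTTP/1.0'.
    """
    ips = set()
    for line in log_lines:
        fields = line.split()
        if fields and f" {path} " in line:
            ips.add(fields[0])
    return ips


# Hypothetical log lines; the addresses and the path are invented.
sample = [
    '10.42.11.5 - - [01/Jun/2016:10:00:00 +0000] "GET /healthcheck HTTP/1.0" 200 2',
    '10.42.77.9 - - [01/Jun/2016:10:00:01 +0000] "GET /healthcheck HTTP/1.0" 200 2',
    '10.42.11.5 - - [01/Jun/2016:10:00:02 +0000] "GET /healthcheck HTTP/1.0" 200 2',
    '10.42.11.5 - - [01/Jun/2016:10:00:03 +0000] "GET /index.html HTTP/1.0" 200 612',
]

seen = checker_ips(sample)
print(sorted(seen))  # ['10.42.11.5', '10.42.77.9']
```

If the number of distinct addresses is lower than the number of checks you expect for your host count, one checker is probably not reaching the container.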

Oh… I didn’t realize that. I only have 2 nodes in a given environment. Is 3 the minimum?

No, it’s supposed to work with any number of hosts, but 4+ is optimal (the 1 the container is running on, plus 3 separate ones to check it). With <= 3 hosts, one of the checks will run on the same host as the container, which is less than ideal, and with 1 or 2 hosts there are only that many checks instead of 3.
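
The rule of thumb in this reply can be written down as a small function. This only restates the description above; it is not Rancher's actual scheduling code, and the name `expected_checks` is made up.

```python
def expected_checks(host_count, max_checkers=3):
    """Restate the rule of thumb from the reply above.

    Returns (number_of_checks, one_check_shares_host_with_container).
    """
    if host_count < 1:
        raise ValueError("need at least one host")
    # At most `max_checkers` hosts run the check, fewer if there are fewer hosts.
    checks = min(host_count, max_checkers)
    # With `max_checkers` hosts or fewer there are not enough *other* hosts,
    # so one of the checks runs on the container's own host.
    shares_host = host_count <= max_checkers
    return checks, shares_host


for n in range(1, 6):
    print(n, expected_checks(n))
```

For 2 hosts this gives 2 checks, one of them on the container's own host; from 4 hosts up it gives 3 checks, all on other hosts.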

I see… so it sounds like if I stick with 2 nodes then it’ll always say “Initializing” and never “Active”.

No. It’s possible you’re hitting a bug in the n=2 case, but it should still go active when all the checks are passing, regardless of how many there are.

Gotcha. Ironically, on a completely different Rancher instance at work that has 5 nodes in total, I’m seeing almost all my containers with health checks stuck at “Initializing” as well. Are there logs somewhere that show the results of the health checks? I don’t want to leave them Initializing (even if they’re correctly serving traffic) if that means something keeps writing to a log file and eventually runs out of disk space.

I am having the same problem: containers get stuck in Initializing if they have a health check. I have four “compute hosts” and Rancher server in HA mode (three hosts). Everything in the setup appears green, and nothing is reconnecting.

Rancher v1.1.0-dev3
Cattle v0.161.5
User Interface v1.1.3
Rancher Compose v0.8.2
Docker: 1.11.1
Debian Jessie

Best regards,

Alejandro

@alexR Can you check that network connectivity is still working? Log into one of the network agents on one host and ping the IPs of the network agents (10.42.x.x) on all the other hosts.
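
If there are many hosts, the ping round can be scripted. This is a minimal sketch, assuming Python 3 and a Linux `ping` (where `-c` is the packet count and `-W` the reply timeout in seconds) are available wherever you run it. That may not be the case inside the network agent container itself, in which case pinging by hand is the same check. The addresses are placeholders.

```python
import subprocess


def ping_once(ip, timeout_s=2):
    """Send one ICMP echo with the system `ping`; True if it got a reply."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable_peers(ips, probe=ping_once):
    """Return the addresses in `ips` that did not answer the probe."""
    return [ip for ip in ips if not probe(ip)]


# Demo with a stand-in probe so the example runs anywhere; drop the
# `probe=` argument to send real pings. The addresses are placeholders
# for the other hosts' network agent IPs.
peers = ["10.42.0.2", "10.42.0.3"]
fake_replies = {"10.42.0.2": True, "10.42.0.3": False}
print(unreachable_peers(peers, probe=fake_replies.get))  # ['10.42.0.3']
```

Any address that shows up in the result points at a host pair whose cross-host traffic is being blocked or dropped.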

Hi Denise,

Thanks for the follow-up. As I recall, the problem turned out to be a networking (firewall) problem. I had a few rules that seemed harmless (opening ports, not even closing them) that got set up via Puppet. I flushed the firewall rules, restarted docker-engine so the necessary rules were recreated, and re-installed the agents (just to be thorough).

Puppet: not even once! :grin:

Again thanks!

Alejandro