I haven’t been able to find a pattern for this one yet, but Rancher does not always seem to correctly identify a container as “running”. This might be a misunderstanding between me and Rancher, but I would define a container as running when Docker reports it as “up”. Rancher usually takes a minute or two to agree and move the container out of “starting”.
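To make “Docker says it is up” concrete, here is a minimal sketch of reading the state that Docker itself reports, so it can be compared against what Rancher shows. It parses the JSON printed by `docker inspect`; the function names `parse_state` and `docker_state` are made up for illustration, and this says nothing about how Rancher decides on its own state.

```python
import json
import subprocess


def parse_state(inspect_output: str) -> dict:
    """Pull the state fields out of `docker inspect` JSON output.

    `docker inspect` prints a JSON array with one object per container.
    The "Health" key is only present when the container defines a
    Docker health check, so it is read defensively.
    """
    containers = json.loads(inspect_output)
    state = containers[0]["State"]
    return {
        "status": state.get("Status"),        # e.g. "running", "exited"
        "running": state.get("Running"),      # boolean
        "started_at": state.get("StartedAt"),
        "health": state.get("Health", {}).get("Status"),  # None if no health check
    }


def docker_state(container: str) -> dict:
    """Run `docker inspect` for one container (name or ID) and parse it."""
    result = subprocess.run(
        ["docker", "inspect", container],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_state(result.stdout)
```

Calling `docker_state("<container name or ID>")` on the host where the container runs returns Docker's view at that moment; logging it alongside the state shown in the Rancher UI would at least show how long the two disagree.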
The problem comes when it doesn’t. The container will be happily executing and networking appears to be fully set up (for example, I can reach the container from another container on a different host), but Rancher will report “Waiting: container starting”. This is mostly just annoying, but sometimes Rancher suddenly decides that it has waited too long for “starting” to complete and kills the container. This appears to happen independently of health checks, and I can’t find any mention in the logs of why the action was taken.
I’d like to understand more about what Rancher is actually doing here. How does Rancher define “running”, and why does its definition differ from Docker’s? What reasons might Rancher have to kill a container independently of a health check?