Running Rancher v1.1.2 (Cattle) on AWS with Docker 1.10.3 hosts.
On occasion, after I run an upgrade, one or more containers become unreachable by their IP address from some subset of other containers. It doesn't seem to matter which node: other containers on the same host remain reachable. This results in downtime and/or slow responses through its LB (HAProxy times out a request after 5 seconds, so we notice a lot of ~5.1 s requests).
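To illustrate how this shows up, here's a rough latency probe of the kind that surfaces the pattern; a minimal sketch only, with the URL, loop count, and 4.5 s flag threshold as assumptions you'd adjust:

```python
# Probe a service URL and flag requests hovering near HAProxy's 5 s
# timeout; those are the ones landing on an unreachable backend.
import time
import urllib.request

URL = "http://service-b-lb/"  # assumption: replace with your LB address
CLIENT_TIMEOUT_S = 10         # longer than HAProxy's 5 s so we observe it

for i in range(20):
    start = time.monotonic()
    try:
        urllib.request.urlopen(URL, timeout=CLIENT_TIMEOUT_S).read()
        status = "ok"
    except Exception as exc:
        status = f"error: {exc}"
    elapsed = time.monotonic() - start
    flag = "  <-- ~5 s, likely HAProxy timeout" if elapsed > 4.5 else ""
    print(f"request {i}: {elapsed:.2f}s {status}{flag}")
    time.sleep(1)
```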
Example:
Service A upgrades and, when the upgrade completes, 2 of the 3 containers can no longer connect to service B's load balancer at http://service-b-lb. Destroying the affected containers (the scheduler re-creates them to keep the expected scale) restores the connection.
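If it helps anyone reproduce the check, something like this is a quick way to see which containers lost connectivity; a sketch only, assuming docker exec access and a python binary inside the containers, with the container names as placeholders:

```python
# Run a TCP connect test against the service B LB alias from inside each
# container of service A, to see which ones can still reach it.
import subprocess

CONTAINERS = ["service-a-1", "service-a-2", "service-a-3"]  # placeholders
TEST = (
    "import socket; s = socket.create_connection(('service-b-lb', 80), 2); "
    "s.close(); print('reachable')"
)

for name in CONTAINERS:
    result = subprocess.run(
        ["docker", "exec", name, "python", "-c", TEST],
        capture_output=True, text=True,
    )
    print(f"{name}: {result.stdout.strip() or result.stderr.strip()}")
```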
Is this a known issue that an upgrade of the Docker daemon or Rancher (1.1.3/1.2) would fix?
UPDATE:
Still not sure what the cause was, but it seems to be resolved by:
- Upgrading Rancher to 1.1.3
- Cycling in new hosts running the Docker 1.11 engine and cycling out the old hosts