Occasionally, a container is unreachable by IP?

Running Rancher w/Cattle v1.1.2 on AWS with Docker 1.10.3 hosts.

On occasion, I’ll run an upgrade, and one or more containers will no longer be accessible by their IP address from some other containers. It doesn’t seem to matter which node: other containers on the same host remain reachable. This results in downtime and/or slow response times from its LB (HAProxy times out a request after 5s, so we notice a lot of ~5.1s requests).

Service (A) upgrades, and when it completes, 2 of the 3 containers can no longer connect to service (B)'s load balancer at http://service-b-lb. Destroying the containers (the scheduler re-creates them to keep the expected scale) restores the connection.
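For anyone hitting this, a quick way to confirm which containers have lost connectivity is to curl the LB from inside each one. This is a minimal sketch; the container names (`service-a-1` etc.) are placeholders for whatever your scheduler named service (A)'s containers, and `http://service-b-lb` is the LB name from this thread:

```shell
# Probe service (B)'s LB from inside each service (A) container.
# Substitute your actual container names (check `docker ps`).
for c in service-a-1 service-a-2 service-a-3; do
  echo "--- $c ---"
  docker exec "$c" curl -s -o /dev/null \
    -w '%{http_code} in %{time_total}s\n' \
    --max-time 5 http://service-b-lb/ \
    || echo "timed out / unreachable"
done
```

Affected containers will hit the 5s `--max-time` and print the fallback message, matching the ~5.1s requests seen at the LB.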

Is this a known issue that an upgrade to Docker daemon or rancher 1.1.3/1.2 would fix?

Still not sure what the cause was, but it seems to be resolved by:

  • Upgrading Rancher to 1.1.3
  • Cycling in new hosts running the Docker 1.11 engine and cycling out the old hosts

I’m still seeing this in 1.1.4. I have to restart the networking agent to remedy the problem.
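For reference, here's roughly how I do the restart on the affected host. This is a sketch with one assumption baked in: that the network agent container on your hosts runs from the `rancher/agent-instance` image (that's what Rancher 1.1.x/Cattle used on mine; verify with `docker ps` before relying on the filter):

```shell
# Find the Rancher network agent container on this host and restart it.
# Assumes the agent runs from the rancher/agent-instance image; confirm
# the image name on your own hosts with `docker ps` first.
agent=$(docker ps --format '{{.ID}} {{.Image}}' \
  | awk '/agent-instance/ {print $1; exit}')
if [ -n "$agent" ]; then
  docker restart "$agent"
else
  echo "no network agent container found on this host" >&2
fi
```

After the restart, the IPSec tunnel re-establishes and cross-host traffic recovers within a few seconds in my experience.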

Yep, seeing the same occasionally on our stuff. The IPSec tunnel breaks for a host; restarting the network agent fixes it. Is something self-healing coming in Rancher 1.2?