Overlay network issue

I’ve been using Rancher for several months and I’m very happy with it. Until today I was using Rancher to deploy small containers, with all the services of each stack deployed on a single host.

Now I’d like to connect an LDAP stack to other containers located on different hosts. During my tests I discovered that the overlay network IP addresses are unreachable between hosts!

Here is my context:

  • Host (A) with 2 containers 1/ and 2/
  • Host (B) with 1 container (LDAP) 3/

Containers 1/, 2/ and 3/ each have an IP address assigned by the Rancher overlay network, in 10.42.x.x.

From 1/ I can ping 2/ (and the reverse works as well), but from 1/ or 2/ I get a timeout when I try to ping 3/.

This is a really simple case; in my production environment I have 5 active hosts with 15 containers, and I can reproduce the problem on all containers.

How can I troubleshoot a network issue with Rancher?

Thanks for your feedback,
Renaud

The most common problems are:

  • Each host needs to have a unique IP address (if 2 of them are the same, that’s wrong)
  • Each IP needs to be mutually reachable from every other host. The overlay network is direct peer-to-peer communication between the two involved hosts.
  • Each host needs 500/udp and 4500/udp open from every other host (as above, those are the ports used for IPSec)

http://docs.rancher.com/rancher/v1.2/en/faqs/troubleshooting/#cross-host-communication
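
The first item in that list is easy to check mechanically once you have written down the IP each host registered with. A minimal Python sketch, assuming a hand-collected inventory (the host names and addresses below are hypothetical, not taken from this thread, and `find_duplicate_ips` is just an illustrative helper):

```python
from collections import defaultdict


def find_duplicate_ips(host_ips):
    """Return {ip: [hosts...]} for every IP registered by more than one host."""
    hosts_by_ip = defaultdict(list)
    for host, ip in host_ips.items():
        hosts_by_ip[ip].append(host)
    # Keep only the IPs that are shared by at least two hosts.
    return {ip: sorted(hosts) for ip, hosts in hosts_by_ip.items() if len(hosts) > 1}


# Hypothetical inventory: host name -> IP shown for that host in the Rancher UI.
inventory = {
    "host-a": "192.0.2.10",
    "host-b": "192.0.2.11",
    "host-c": "192.0.2.10",  # same IP as host-a: the broken case
}

duplicates = find_duplicate_ips(inventory)
print(duplicates)
```

An empty result means every host has a unique address; it says nothing about reachability or the UDP ports, which still have to be checked separately.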

Hello Vincent,

I changed my firewall rules to permit these ports, and according to your link it seems that I now have another problem.

Here is an extract of my netfilter nat table:
> Chain CATTLE_PREROUTING (1 references)

num target prot opt source destination
1 MARK tcp -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL tcp dpt:1639 MARK set 0x668a0
2 DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL tcp dpt:1639 to:10.99.139.37:639
3 MARK tcp -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL tcp dpt:1389 MARK set 0x668a0
4 DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL tcp dpt:1389 to:10.99.139.37:389
5 DNAT tcp -- 10.99.0.0/16 10.99.0.1 tcp dpt:53 to:169.254.169.250
6 DNAT udp -- 10.99.0.0/16 10.99.0.1 udp dpt:53 to:169.254.169.250
7 MARK all -- !10.99.0.0/16 169.254.169.250 MAC 02:BD:E1:97:06:AD MARK set 0x272e
8 MARK all -- !10.99.0.0/16 169.254.169.250 MAC 02:BD:E1:6E:0D:43 MARK set 0x21f1d
9 MARK all -- !10.99.0.0/16 169.254.169.250 MAC 02:BD:E1:E5:E2:68 MARK set 0x1c1f8
10 MARK all -- !10.99.0.0/16 169.254.169.250 MAC 02:BD:E1:6A:BA:AB MARK set 0x23e79

So these rules are missing:

2 DNAT udp -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL udp dpt:4500 to:10.42.179.222:4500
3 DNAT udp -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL udp dpt:500 to:10.42.179.222:500
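
To repeat this check on several hosts, the comparison can be sketched in Python. This assumes the chain listing has been saved as text (for example from `iptables -t nat -L CATTLE_PREROUTING -n --line-numbers`) and that the IPsec rules look like the two lines above. The function name `missing_ipsec_dnat_ports` is hypothetical and not part of Rancher, and the sample listing is shortened from the extract in this post.

```python
import re

# UDP ports the overlay network needs for IPsec, per the troubleshooting list.
IPSEC_UDP_PORTS = (500, 4500)


def missing_ipsec_dnat_ports(listing, ports=IPSEC_UDP_PORTS):
    """Return the UDP ports that have no DNAT rule in the given chain listing."""
    found = set()
    for line in listing.splitlines():
        fields = line.split()
        # Only DNAT rules for UDP traffic are relevant here.
        if "DNAT" not in fields or "udp" not in fields:
            continue
        match = re.search(r"\bdpt:(\d+)\b", line)
        if match:
            found.add(int(match.group(1)))
    return [port for port in ports if port not in found]


# Shortened copy of the listing from this post (assumption: same column layout).
sample = """\
Chain CATTLE_PREROUTING (1 references)
num target prot opt source destination
1 MARK tcp -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL tcp dpt:1639 MARK set 0x668a0
2 DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL tcp dpt:1639 to:10.99.139.37:639
6 DNAT udp -- 10.99.0.0/16 10.99.0.1 udp dpt:53 to:169.254.169.250
"""

print(missing_ipsec_dnat_ports(sample))
```

For the listing above, both 500 and 4500 are reported as missing. The sketch only detects the absence of the rules; it does not recreate them.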

I tried restarting the containers “rancher/agent:v1.0.2” and “rancher/agent-instance:v0.8.3” (shown in docker ps as: “/etc/init.d/agent-in” 4 days ago Up 4 minutes 0.0.0.0:500->500/udp, 0.0.0.0:4500->4500/udp), but the rules are still missing…

Sniffing the network, and port 500 in particular, I can see requests from other hosts but no reply coming from my host, which confirms that UDP port 500 isn’t forwarded to the container.

How can I regenerate the missing rules without adding them by hand?

Cheers,
Renaud