Rancher/Cattle 1.6.12 VXLAN: containers on different hosts unable to communicate in a vSphere 6.7 environment

Hi everyone,

I have a problem with communication between containers running on different Docker hosts managed by Rancher (that is, on the internal network 10.42.0.0/16).

Network communication goes over VXLAN; port 4789 is open and reachable, and the healthcheck status is OK.

In detail, I see the traffic correctly reach the VXLAN router on the source host, and then it stops. It never reaches the VXLAN router on the destination host. In fact, if I run a traceroute I see the following behavior:

traceroute 10.42.169.228
traceroute to 10.42.169.228 (10.42.169.228), 30 hops max, 60 byte packets
1 10.42.17.98 (10.42.17.98) 0.297 ms 0.222 ms 0.216 ms <------ SOURCE IP VXLAN-ROUTER
2 * * *
3 * * *
4 * * *
5 * * *
6 * * *

Sniffing the traffic on the source VXLAN router, I see packets going to the destination VXLAN router, but if I sniff the traffic on the destination VXLAN router I don’t see any packets.
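
The same check could be repeated independently of Rancher with a small probe. The Python sketch below is an illustration only, not part of the setup described here: it builds the 8-byte VXLAN header defined in RFC 7348 and sends one UDP datagram to port 4789. The function names, the VNI value and the payload are assumptions made for the example. The payload is not a valid inner Ethernet frame, so the receiving VXLAN device would discard it; the only question is whether a capture on the destination host's physical interface shows the datagram arriving.

```python
import socket
import struct

VXLAN_PORT = 4789             # VXLAN UDP port mentioned in the post
VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag from RFC 7348


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + 24 reserved bits.
    # Word 2: VNI (24 bits) + 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vxlan_header(data: bytes) -> int:
    """Return the VNI carried in a VXLAN header, or raise ValueError."""
    if len(data) < 8:
        raise ValueError("datagram shorter than a VXLAN header")
    word1, word2 = struct.unpack("!II", data[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word2 >> 8


def send_probe(dest_ip: str, vni: int, port: int = VXLAN_PORT) -> int:
    """Send one VXLAN-shaped UDP datagram and return the number of bytes sent.

    The payload after the header is a dummy marker (hypothetical), not a
    real encapsulated frame.
    """
    payload = build_vxlan_header(vni) + b"vxlan-probe"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (dest_ip, port))
```

Run on the source Docker host as send_probe("<destination host IP>", vni=<VNI>) while capturing UDP port 4789 on the destination host; both the address and the VNI are placeholders to fill in for the environment being tested.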

Of course, communication between containers on the same host works correctly.

The Docker hosts are virtualized on VMware (vSphere 6.7).

I also have another environment managed with the same version of Rancher and with exactly the same Docker hosts (Ubuntu 16.04); the only difference is the hypervisor, which in that case is oVirt (4.2). In that environment, communication between the containers works correctly. The two environments share the same infrastructure at the level of physical hosts and network.

The tests I have already done are:

  • Disabling all firewalls (including the one on ESXi)

  • Managing the Docker hosts with Swarm instead of Rancher/Cattle: it works!

  • Using other operating systems for the Docker hosts

Here are the details of the software versions:

Rancher Version: 1.6.12

Rancher services details:

rancher/net:v0.11.9
rancher/healthcheck:v0.3.3
rancher/dns:v0.15.3
rancher/storage-nfs:v0.8.5
rancher/net:holder
rancher/net:v0.13.1
rancher/metadata:v0.9.5
rancher/network-manager:v0.7.18
rancher/agent:v1.2.7

oVirt version: 4.2.7.5-1.el7
VMware vSphere version: 6.7
Docker host OS: Ubuntu 16.04

Thank you in advance for any help or suggestions.

Best Regards