Rancher Not Using Internal Address for Container Traffic

I’m trying to figure out how to make Rancher use a specific subnet for all traffic that doesn’t need to be accessible outside of the K8s cluster. The servers my cluster runs on have three networks: 10.0.2.0/24 for management of the physical hosts, 10.0.3.0/24 for external Docker traffic, and 10.0.4.0/24 for internal Docker traffic. The internal network is 10 Gb and the external network is 1 Gb, so I want all of the internal traffic to use the faster network. This is especially important for things like Longhorn that need to move a lot of data.

My nodes are just Docker on bare metal, started with a command like the one below, and when I run a tcpdump against the 1 Gb interface I see all of the traffic there.

Note that rancher-private resolves to a 10.0.4.0/24 address, and I do see traffic on port 6443 between the Rancher agents on the 10 Gb network, but nothing else.

sudo docker run -d --privileged --restart=unless-stopped --net=host \
-v /etc/kubernetes:/etc/kubernetes \
-v /var/run:/var/run rancher/rancher-agent:v2.3.2 \
--server https://rancher-private.dev.example.com \
--token zf4bkvvjkn4q5547gkgc6x8bd5nnl47zthl6t5lmthv7gs4h5q6qzz \
--ca-checksum 75e28964c7f30bfbb2e3e30e458b557c3d6197664159767356b486a428893c00 \
--address 10.0.3.11 --internal-address 10.0.4.11 --worker


Pretty much same issue here.

When using --internal-address and --address, traffic should route properly. This sounds like a bug. If you can confirm the behavior, please file an issue at https://github.com/rancher/rancher/issues/new

@JasonvanBrackel

The first IP address is the one I want, and it is the one set in the command starting the worker node, as shown below.
The second IP address is one I do not want to be part of Rancher at all. The necessary setup for the network traffic is done outside of Rancher. All Rancher needs to do is use the first, desired IP address, which is 10.1.1.2 in the example below. It should ignore the other address entirely, and it should not show up in any YAMLs or anywhere within Rancher (which it currently does).

Starting worker node with:

sudo docker run -d --privileged --restart=unless-stopped --net=host \
-v /etc/kubernetes:/etc/kubernetes \
-v /var/run:/var/run rancher/rancher-agent:v2.3.3 \
--server https://10.1.1.1:8443 --address 10.1.1.2 --internal-address 10.1.1.2 \
--token theactualtoken --ca-checksum thcachecksum --worker

It’s been a while, but I’m finally back to testing this. My issue is where the traffic flows, not which IP Rancher is accessible on. When I install something like Longhorn, all of the Longhorn traffic flows over the 10.0.3.0 network instead of the 10.0.4.0 network. That’s a problem because 10.0.3.0 is on 1 Gb and 10.0.4.0 is on 10 Gb, making Longhorn perform really slowly. All of this behavior has been confirmed using tcpdump. I’ve submitted https://github.com/rancher/rancher/issues/27109 for the issue.

I also need to note that the 10 Gb network does not have internet access, so I can’t just run everything over that network.
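For anyone wanting to reproduce the tcpdump check: Canal/Flannel’s VXLAN backend encapsulates overlay traffic in UDP on port 8472, so you can watch each NIC for that port while generating Longhorn traffic. A rough sketch; the interface names eth0 and eth1 are assumptions for the 1 Gb and 10 Gb NICs:

```shell
# Watch the 1 Gb NIC (eth0 here, an assumption) for VXLAN-encapsulated
# overlay traffic. Flannel's VXLAN backend uses UDP port 8472.
sudo tcpdump -ni eth0 udp port 8472

# Then watch the 10 Gb NIC (eth1 here). Once the overlay is pinned to
# the right interface, the encapsulated traffic should appear here instead.
sudo tcpdump -ni eth1 udp port 8472
```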

For anyone else needing to control which adapter the overlay network runs on, here is the config for your RKE cluster file if you are using Canal, the default CNI.

network:
  plugin: canal
  options:
    canal_iface: eth1
    canal_flannel_backend_type: vxlan
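If you’re running plain Flannel rather than Canal, the analogous option in the same RKE network block is flannel_iface (a sketch; adjust eth1 to whatever your 10 Gb NIC is called):

```yaml
network:
  plugin: flannel
  options:
    flannel_iface: eth1
    flannel_backend_type: vxlan
```

In either case the setting goes in the cluster.yml that RKE provisions from, and you re-run rke up for it to take effect on an existing cluster.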