DNS round robin doesn't work

I’m using Rancher 1.1.4.

I have a stack with two services, A and B. A calls B over HTTP by its service name (http://B:).

If I scale B to 2 or more instances, I see that it’s always the same B instance that handles the requests. I first thought that A was calling B through a keep-alive connection, but that’s not the case. A and B are written in NodeJS (using the Express 4 and request npm modules). NodeJS does not cache DNS results.

Can you help me?

Thank you.

Down the rabbit hole… https://github.com/rancher/rancher/issues/3495

OK… So, I don’t really understand the conclusion of the linked issue: do we have to implement the load balancing logic in the code? Because today I have no load balancing between multiple instances of my services, and it seems a little overkill to insert a rancher-lb between each of our services.

Do you have an idea or a plan to fix this problem?

Thank you

AFAIK DNS round robin has always been a client implementation detail. If you don’t like the way your client handles it, you have to work out your own method. Realistically, this boils down to load balancers.

I could however be entirely wrong.
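
To illustrate what "working out your own method" could look like on the client side, here is a minimal NodeJS sketch. It assumes the service name resolves to one A record per instance; `pickAddress` and `createRoundRobinPicker` are hypothetical helper names, not part of Rancher or of any npm module mentioned in this thread.

```javascript
const dns = require('dns');

// Pure rotation step: pick the address for the n-th request.
// Sorting first gives a stable order even if the DNS server shuffles answers.
function pickAddress(addresses, requestNumber) {
  if (addresses.length === 0) {
    throw new Error('no addresses to choose from');
  }
  const sorted = [...addresses].sort();
  return sorted[requestNumber % sorted.length];
}

// Hypothetical helper: returns a function that resolves `hostname` on every
// call (so added or removed instances are noticed) and rotates through
// the A records it gets back.
function createRoundRobinPicker(hostname) {
  let requestNumber = 0;
  return async function next() {
    const addresses = await dns.promises.resolve4(hostname);
    const address = pickAddress(addresses, requestNumber);
    requestNumber += 1;
    return address;
  };
}
```

Service A would then call `next()` before each request and connect to the returned IP instead of the name. This puts balancing logic into the application code, which is exactly the trade-off discussed in this thread.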

The plan to fix it is in that issue: give each service a fixed virtual IP and let iptables do a round robin that actually works, connecting it to one of the backends. That is essentially a lightweight load balancer. But this is complicated by things like Windows.

The fact is that round-robin DNS load balancing, as explained in the documentation, was very attractive, but for now it isn’t functional. From our point of view this is quite disappointing, because it badly hurts the “full decoupling” aspect of our microservices approach.

Keeping the code agnostic of the load balancing is very valuable to us, and we would like to keep our code that way.

But it seems that for the moment we have no choice.

I haven’t tried it, but there’s a decent chance that switching to Alpine as a base image might “fix” it (and/or maybe cause other issues); Node presumably resolves names through libc (getaddrinfo()), and Alpine uses musl instead of glibc.

My Node services are based on the Alpine version of the NodeJS image, and it doesn’t work.

But I have some news for you: working with our IT network specialist (Erwan), we found a way to make DNS round-robin load balancing work.

First, we moved all our containers to the Ubuntu-based NodeJS image, then we disabled IPv6 in those containers with the docker flag:

--sysctl net.ipv6.conf.all.disable_ipv6=1

With that change, the load balancing seems to work fine. Here are some getaddrinfo samples:

   1702 IPv4 address: 10.42.101.15 (serviceto.fred.rancher.internal)
    561 IPv4 address: 10.42.145.142 (serviceto.fred.rancher.internal)
   2237 IPv4 address: 10.42.151.230 (serviceto.fred.rancher.internal)
   1732 IPv4 address: 10.42.183.170 (serviceto.fred.rancher.internal)
    231 IPv4 address: 10.42.52.232 (serviceto.fred.rancher.internal)
   1694 IPv4 address: 10.42.77.33 (serviceto.fred.rancher.internal)

We can see that the “activity” is balanced between my 6 instances.
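
For anyone who wants to collect similar counts from inside a NodeJS container, a rough sketch follows. It is not the tool used to produce the samples above; `tally` and `sampleLookups` are hypothetical names. It relies on `dns.lookup`, which goes through getaddrinfo, so it should reflect what the application itself sees.

```javascript
const dns = require('dns');

// Count how many times each address appears in a list of lookup results.
function tally(addresses) {
  const counts = new Map();
  for (const address of addresses) {
    counts.set(address, (counts.get(address) || 0) + 1);
  }
  return counts;
}

// Resolve `hostname` `times` times through getaddrinfo (dns.lookup) and
// return the per-address counts. family: 4 restricts results to IPv4.
async function sampleLookups(hostname, times) {
  const seen = [];
  for (let i = 0; i < times; i += 1) {
    const { address } = await dns.promises.lookup(hostname, { family: 4 });
    seen.push(address);
  }
  return tally(seen);
}
```

Calling `sampleLookups('B', 1000).then((counts) => console.log(counts))` from service A's container (with `B` standing in for the real service name) should print one entry per address returned. If only one address ever shows up, the resolver is always picking the same record first.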

We will continue our investigation, but is there a way to launch a container through Rancher with the --sysctl flag?