DNS round robin doesn't work

cabrinoob · January 26, 2017, 7:05am

I’m using Rancher 1.1.4.

I have a stack with two services, A and B. A calls B over HTTP by its service name (http://B:).

If I scale B to 2 or more instances, I see that it’s always the same B instance that does the work. I first thought that A was calling B through a keep-alive connection, but that’s not the case. A and B are written in Node.js (using the Express 4 and request npm modules). Node.js does not cache DNS results.

Can you help me?

Thank you.

vincent · January 26, 2017, 8:13am

Down the rabbit hole… https://github.com/rancher/rancher/issues/3495

cabrinoob · January 26, 2017, 10:13am

OK. So, I don’t really understand the conclusion of the linked issue: do we have to implement the load-balancing logic in our code? Today I have no load balancing between multiple instances of my services, and it seems a little overkill to insert a rancher-lb between each of our services.

Do you have an idea or a plan to fix this problem?

Thank you

cooperaj · January 26, 2017, 4:28pm

AFAIK DNS round robin has always been a client implementation detail. If you don’t like the way your client handles it you have to work out your own method. Realistically this boils down to loadbalancers.

I could however be entirely wrong.
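For readers wondering what "working out your own method" could look like in Node.js, here is a minimal sketch, not something anyone in this thread actually ran. It assumes the service name resolves to one A record per instance; `makeRoundRobin` and `resolveAll` are hypothetical helper names made up for illustration.

```javascript
// Sketch only: client-side round robin over every address a name resolves to.
// makeRoundRobin and resolveAll are hypothetical helpers, not part of Rancher or Node.
const dns = require("dns").promises;

// Returns a function that hands out the given addresses in rotation.
function makeRoundRobin(addresses) {
  if (addresses.length === 0) {
    throw new Error("no addresses to rotate through");
  }
  let next = 0;
  return function pick() {
    const address = addresses[next];
    next = (next + 1) % addresses.length;
    return address;
  };
}

// Asks the resolver for all IPv4 addresses behind a hostname.
// With { all: true }, dns.lookup returns an array of { address, family }.
async function resolveAll(hostname) {
  const records = await dns.lookup(hostname, { all: true, family: 4 });
  return records.map((record) => record.address);
}

// Usage (inside an async function), assuming "B" is the service name:
//   const pick = makeRoundRobin(await resolveAll("B"));
//   const target = pick(); // pass this as the host of the next request
```

The obvious cost is that the address list goes stale when B is rescaled, so the caller would have to re-resolve periodically. That is exactly the kind of logic the original poster wants to keep out of application code.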

vincent · January 26, 2017, 4:31pm

The plan to fix it is in that issue: give each service a fixed virtual IP and let iptables do round robin that actually works, connecting it to one of the backends. That is essentially a lightweight load balancer. But this is complicated by things like Windows.

cabrinoob · January 27, 2017, 7:40am

The fact is that the DNS round-robin load balancing explained in the documentation was very attractive, but for now it isn’t functional. From our point of view this is quite disappointing, because it badly hurts the “full decoupling” aspect of our microservices approach.

Keeping the code agnostic of the load-balancing aspect is very valuable to us, and we would like to keep our code that way.

But it seems that for the moment we have no choice.

vincent · January 27, 2017, 7:47am

I haven’t tried it but there’s a decent chance that switching to Alpine as a base image might “fix” it (and/or maybe cause other issues); Node presumably uses libc/gethostbyname() and Alpine uses µClibc/musl instead of glibc.

cabrinoob · January 27, 2017, 9:01am

My Node services are based on the Alpine version of the Node.js image, and it doesn’t work.

But I have some news for you: working with our IT network specialist (Erwan), we found a way to make DNS round-robin load balancing work.

First, we moved all our containers to the Ubuntu-based Node.js image, then we disabled IPv6 in those containers with the docker flag:

--sysctl net.ipv6.conf.all.disable_ipv6=1

With that, the load balancing seems to work fine. Here are some getaddrinfo samples:

   1702 IPv4 address: 10.42.101.15 (serviceto.fred.rancher.internal)
    561 IPv4 address: 10.42.145.142 (serviceto.fred.rancher.internal)
   2237 IPv4 address: 10.42.151.230 (serviceto.fred.rancher.internal)
   1732 IPv4 address: 10.42.183.170 (serviceto.fred.rancher.internal)
    231 IPv4 address: 10.42.52.232 (serviceto.fred.rancher.internal)
   1694 IPv4 address: 10.42.77.33 (serviceto.fred.rancher.internal)

We can see that the “activity” is balanced between my 6 instances.
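A tally like the one above can be approximated from inside a Node.js container. The following is a rough sketch and not the tool that produced those numbers; `tally` and `sampleLookups` are hypothetical names, and it assumes each `dns.lookup` call goes through getaddrinfo and returns the first address in the resolver's order.

```javascript
// Sketch only: count which address comes back first over many lookups.
// tally and sampleLookups are hypothetical helpers written for illustration.
const dns = require("dns").promises;

// Counts how many times each address appears in the list.
function tally(addresses) {
  const counts = new Map();
  for (const address of addresses) {
    counts.set(address, (counts.get(address) || 0) + 1);
  }
  return counts;
}

// Resolves the hostname `attempts` times, one after another, and tallies
// the single IPv4 address that each lookup returns.
async function sampleLookups(hostname, attempts) {
  const seen = [];
  for (let i = 0; i < attempts; i++) {
    const { address } = await dns.lookup(hostname, { family: 4 });
    seen.push(address);
  }
  return tally(seen);
}

// Usage (inside an async function), with the name from the samples above:
//   const counts = await sampleLookups("serviceto.fred.rancher.internal", 1000);
//   for (const [address, n] of counts) console.log(n, address);
```

If one address collects nearly every hit, the client is effectively pinned to a single instance. A spread across all instances, even an uneven one like the samples above, suggests the round robin is reaching the application.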

We will continue our investigation, but is there a way to launch a container through Rancher with the --sysctl flag?
