Managed Network and Haproxy starting too fast

The context:

I’m using an Haproxy container to serve various small websites. I tried the Rancher Load Balancer and it worked perfectly, but most of my websites need password protection (basic access authentication), and it’s currently not possible to set that up at the Rancher Load Balancer level. That’s why I’m keeping the Haproxy service that I was already using with Rancher 0.28.0.
The containers are configured to use the Managed Network.
The Haproxy service is linked to the different websites, so haproxy.cfg contains entries like these (where myapp1 and myapp2 are the link names):

backend myapp1
   balance roundrobin
   option httpclose
   option forwardfor
   server s1  myapp1

backend myapp2
   acl auth_ok http_auth_group(webaccess) myapp2
   http-request auth if !auth_ok
   balance roundrobin
   option httpclose
   option forwardfor
   server s1 myapp2

My environment:
Rancher v0.37.0
Cattle v0.91.0
UI v0.47.0
Docker 1.8.2

The issue:

It seems that Haproxy is starting too fast, before the links are resolvable, since the container fails with errors like these:

[ALERT] 258/132043 (7) : parsing [/usr/local/etc/haproxy/haproxy.cfg:110] : 'server s1' : invalid address: 'myapp1' in 'myapp1'
[ALERT] 258/132043 (7) : parsing [/usr/local/etc/haproxy/haproxy.cfg:110] : 'server s1' : invalid address: 'myapp2' in 'myapp2'
[ALERT] 258/132043 (7) : Error(s) found in configuration file : /usr/local/etc/haproxy/haproxy.cfg
[ALERT] 258/132043 (7) : Fatal errors found in configuration.

Temporary solution:

I could make it work by creating my own haproxy image with this Dockerfile:

 FROM haproxy:1.5
 CMD sleep 5 && haproxy -f /usr/local/etc/haproxy/haproxy.cfg

I’m just overriding the command. Sleeping for 5 seconds gives the network time to be set up (I suppose…), and Haproxy then starts successfully because it can resolve the hostnames (the links).
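A fixed sleep is a guess at how long the network needs. One alternative is to poll until the link names actually resolve and only then start Haproxy. Here is a minimal sketch of that idea in Python; the function names, the timeout values, and the assumption that Python is available inside the image (the stock haproxy image may not include it) are all mine and not part of the setup described above.

```python
import os
import socket
import time


def wait_for_hosts(hosts, timeout=30.0, interval=0.5,
                   resolve=socket.gethostbyname, sleep=time.sleep,
                   clock=time.monotonic):
    """Block until every name in `hosts` resolves, or raise TimeoutError.

    `resolve`, `sleep` and `clock` are injectable so the loop can be
    exercised without real DNS or real waiting.
    """
    deadline = clock() + timeout
    pending = list(hosts)
    while pending:
        still_pending = []
        for host in pending:
            try:
                resolve(host)
            except OSError:  # socket.gaierror is a subclass of OSError
                still_pending.append(host)
        pending = still_pending
        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError("unresolved names: " + ", ".join(pending))
        sleep(interval)


def wait_then_exec(hosts, argv):
    """Wait for the names, then replace this process with the real command."""
    wait_for_hosts(hosts)
    os.execvp(argv[0], argv)
```

As a usage example, an entrypoint script could call wait_then_exec(["myapp1", "myapp2"], ["haproxy", "-f", "/usr/local/etc/haproxy/haproxy.cfg"]) in place of the sleep. This only removes the guesswork about the startup delay; it does nothing for addresses that change later.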

This solution is not ideal and I wanted your view on it… Cheers

Additional notes:
I didn’t have this issue with Rancher 0.28.0, but a lot of changes must have been made since then.
The full story is that I tried to upgrade Rancher 0.28.0 because the disk was filling up (rancher/server was getting a really big ibdata1 file), but it didn’t work. Rancher wouldn’t start on port 8080. I didn’t see any errors in the logs, other than the service on 8080 not being accessible. Anyway, a lot of changes in MySQL must have been made between the 2 versions… I wasn’t too surprised by the failure.

I believe it’s a combination of two things: the /etc/resolv.conf file being written after the container starts (and potentially after you’ve already tried to look something up), and/or the DNS server not yet having received the new configuration with the new names.

But what you’re trying to do won’t really work, even if it appeared to… HAProxy will resolve that name to a list of IP addresses once on startup, and then never again for the life of that process. So if a container/host dies and gets rescheduled elsewhere, your HAProxy will not know about it and will continue sending requests to the dead IP and not the new one.
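To illustrate the resolve-once problem, here is a small Python sketch of what a process would have to do to notice a rescheduled container: look the names up again periodically and compare the result with what it saw last time. The helper names are hypothetical, and how the proxy is reloaded when a change is found is deliberately left out.

```python
import socket


def resolve_all(name, lookup=socket.getaddrinfo):
    """Return the sorted IPv4 addresses that `name` currently resolves to."""
    infos = lookup(name, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address.
    return sorted({info[4][0] for info in infos})


def changed_backends(names, known, resolver=resolve_all):
    """Re-resolve `names` and return those whose addresses differ from `known`.

    `known` maps name -> last seen address list and is updated in place.
    On the first call every name is reported, since nothing is known yet.
    A name that fails to resolve is recorded as an empty list.
    """
    changed = []
    for name in names:
        try:
            current = resolver(name)
        except OSError:
            current = []
        if known.get(name) != current:
            known[name] = current
            changed.append(name)
    return changed
```

A supervisor could call changed_backends(["myapp1", "myapp2"], known) every few seconds and restart or reload the proxy whenever the returned list is not empty. A process that only looks the names up at startup never performs this comparison, which is why it keeps sending traffic to the old address.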

You’re right, if a container changes IP, Haproxy can’t find it anymore and has to be restarted to pick up the new IP. I didn’t know that. We never noticed the issue because our setup is simple and we were already restarting Haproxy whenever we updated the websites.
Which solution would you recommend? I’m thinking of Nginx, which should be able to do the job in our case.

If it were available I’d suggest #1871, since we manage the reconfiguration of haproxy to handle that problem. For now, yes, I believe nginx has an option to respect TTLs when resolving names, and the TTL should be 0 seconds on any record it gets from rancher-dns. So that should work.
