Rancher + Keepalived

I was doing some research on options for using Rancher with a single (or multiple) external IPs in an HA fashion and found that some people have already spent the time to get keepalived + VRRP running in a container… As expected, this requires host networking and privileged container mode… I’m going to run a few tests to see if I can get it working as expected, but in the meantime could I suggest it as a possible addition, either as a catalog service or as a global option of some sort? It would allow you to set up a “cluster IP” and have a host/LB in the cluster answer on that IP at all times, effectively removing the need for ELBs, round-robin DNS (and the TTL problem it imposes), etc…


or

(there are a few more options too but these are good bases…)

@RVN_BR we use a solution like this already; we’ve in fact recently made some tweaks to it. I’ll see if I can get the guy who did it to upload something, if that would be of use?

Yes please! I am also very interested in this!

Hey all,

I’m ‘the guy’. For a number of reasons I use a script as the container entrypoint and handle SIGTERM within it for a clean shutdown. Not doing so seems to cause issues with the IPs not being removed on container stop/restart.
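
The shape of it is roughly as follows (a simplified sketch rather than the actual file, so treat the flags and paths as illustrative):

#!/bin/sh
# Sketch of a keepalived entrypoint: trap SIGTERM so keepalived shuts down
# cleanly and releases its VIPs before the container exits.

stop() {
  echo "SIGTERM received, stopping keepalived"
  kill -TERM "$KEEPALIVED_PID" 2>/dev/null
  wait "$KEEPALIVED_PID"
  exit 0
}

trap stop TERM INT

# Run keepalived in the foreground so this script can supervise it.
keepalived --dont-fork --log-console -f /etc/keepalived/keepalived.conf &
KEEPALIVED_PID=$!

wait "$KEEPALIVED_PID"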

I use Alpine Linux as a minimal base, which hosts Keepalived in its testing repo at the moment. I’ve uploaded the Dockerfile I use, the script I mentioned, and a basic version of the Keepalived configuration file here.

No issues running multiple containers, on the same host even, as long as the VRID and ‘floating’ IP are different for each instance.

The script in the keepalived.conf file checks for the existence of a port haproxy should be listening on in our specific environment. This can obviously be changed or removed as per your needs.

I always use state BACKUP on every host so I can use the same configuration everywhere, and to prevent disruption when a host configured as MASTER comes up while another host is already MASTER. An election only occurs if there is no current MASTER. I also don’t allow preemption, so when a failover occurs there’s no automatic failback (until the new MASTER fails, of course).

The capabilities required are: CAP_NET_ADMIN and CAP_NET_BROADCAST. Obviously remove the CAP_ if specifying these in a Docker compose file.
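
With docker run, for example, that works out to something like this (the image name and volume path are just placeholders):

docker run -d \
  --net=host \
  --cap-add=NET_ADMIN \
  --cap-add=NET_BROADCAST \
  -v /path/to/keepalived.conf:/etc/keepalived/keepalived.conf \
  keepalived-image   # placeholder image name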

Note that GARP is used to notify other hosts on the local network when the floating IP moves between hosts. Linux hosts may ignore these unless this is set:

net.ipv4.conf.all.arp_accept = 1

Firewalls may also refuse GARP updates. I believe a shared MAC address can be used to get around this issue but have not explored this as I’ve had no need.
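
For example, on each Linux host that needs to accept the GARP updates:

# Apply immediately.
sysctl -w net.ipv4.conf.all.arp_accept=1

# Persist across reboots (the file name is just a convention).
echo 'net.ipv4.conf.all.arp_accept = 1' > /etc/sysctl.d/99-arp-accept.conf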

Happy to take questions.

Hi @sjiveson

Thanks so much for the details and for the example.

I’m a bit unsure about your comments regarding:

The capabilities required are: CAP_NET_ADMIN and CAP_NET_BROADCAST. Obviously remove the CAP_ if specifying these in a Docker compose file.

Can you please explain further? I don’t see any of them in your Dockerfile or script to add/remove.

I’m also checking the values in the keepalived.conf file; just a few doubts…
In your example, x.x.x.x is the floating IP, correct? And it appears both in the haproxy check and in virtual_ipaddress, right?

The other “variable”, NNN, I’m a bit unsure about. Is the IP address in the first global block the IP range you are using? In the keepalived config documentation I see the parameter vrrp_mcast_group4 is optional… and has a default value of 224.0.0.18… is this required? I understand this is the multicast IP it would use, presumably to find the other nodes?

Basically my questions revolve around your setup and the following config entries:

global_defs {
   router_id blahblah
   vrrp_mcast_group4 224.0.1.NNN
}

(In this case, should blahblah just be any value, as long as it’s different from the other router_ids defined later?)

And in the other sections, just to make sure:

vrrp_script chk_haproxy {
  script       "ss -ltn 'src x.x.x.x' | grep 443"
}   

(In the above, x.x.x.x is the “virtual IP”, correct? So it will be the same as below?)

vrrp_instance lb-vips {
    virtual_router_id NNN
    priority NNN
    ...
    virtual_ipaddress {
        x.x.x.x/24 dev eth0
    }
} 

I assume that each “NNN” is not actually the same value in these, correct? (and again, x.x.x.x is the same as the above x.x.x.x, right?)

I’ll try to run some tests with this soon… have you been running this from within a stack? I assume as a global container? Have you had any issues with the LBs on the hosts listening on these VIPs? Or are they just binding to 0.0.0.0?

Thanks for your time!!

EDIT: sorry about that, I was a bit slow putting 2 and 2 together lol… I have added the capabilities in the docker-compose file… :wink:

Hey, you’re welcome.

I’ve updated the file with some additional comments that answer most of your questions. You may also find it useful to see my ‘beginners’ article on Keepalived here: https://packetpushers.net/vrrp-linux-using-keepalived-2/. Just to quickly cover this for the benefit of others:

  • The vrrp_mcast_group4 setting is optional; the default is 224.0.0.18.
  • Router ID (router_id in global_defs) can be anything you want; the default is the hostname.
  • The check script is very specific to my environment - I’d suggest you do some research to find what might be suitable for you.
  • The virtual router ID is a value between 1 and 255 and must not clash with any others in use on the network.
  • Priority is 1-255 - highest wins.

Yes, I’m running this in a stack. Will be running a pair (or more) per stack, across four hosts.

HAProxy is configured to bind to the VRRP address, which is made possible with the nonlocal_bind kernel setting. Details here: https://itsthe.network/post/TCP%20Nonlocal%20Bind%20Linux%20Kernel%20Settings/.
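
For reference, enabling it looks like this on each host:

# Allow HAProxy (or anything else) to bind to an address the host doesn't currently own.
sysctl -w net.ipv4.ip_nonlocal_bind=1

# Persist across reboots.
echo 'net.ipv4.ip_nonlocal_bind = 1' > /etc/sysctl.d/99-nonlocal-bind.conf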

Hi @sjiveson Thanks for your reply and your initial information… it really gave me a “kickoff”…

I think I got a pretty decent setup… I’ll look at your keepalived article as I quite like the topic :slightly_smiling:

I’ll post my final files here soon (I’m currently having some issues with my test install because I was running CoreOS and the current version stopped supporting CoreOS… oh my, oh my, the roller coaster of beta software :p)…

Anyway, I will try to put together a Catalog service (maybe with your help) so others can benefit. I think this is a much “cleaner” approach than having Rancher add/remove IPs from third-party DNS providers, or even using an ELB. If this works as it should, it would be great to cut out the ELB or whatever external LB tool I was expecting to use :). One thing I still want to explore is having different keepalived configs for different datacenters, for example. But I believe I was able to get a nice setup, with most of the useful values coming from environment variables, which can be set in docker-compose or through the catalog from what I gather. It’s going to come out nicely, I think, and solve one of the issues with Rancher that was bothering me (having either an additional off-Rancher load-balancing tier, or suffering DNS downtime whenever/if ever a host stopped responding).

Thanks again for your help. I’m going to put my final code in a GitHub repo and link it here. I’ll also try to make a catalog service once you and/or others comment/contribute to it… IMO this was a “missing link” in Rancher which can now be solved cleanly and administered through Rancher itself… win-win :smiley:

Hey,

You’re very welcome. Happy to help where I can.

We use confd (https://github.com/kelseyhightower/confd) to populate the IP addresses, VRID and interface name at runtime, via environment variables. This can be extended as necessary for any other variables (like the multicast address or password).

To avoid the risk of ‘clashes’ with other instances on the local network (trust me, it happens), it’s probably wise to use a different password and multicast address per service.

Note I improved the keepalived.sh script last night so that once the process has started successfully, the script no longer prevents the container from failing if keepalived dies after that point, thus allowing Rancher to restart it and, more importantly, be aware of the failure. Previously it silently restarted the process and the failure could only be seen in the logs.
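
In rough terms the change amounts to this (a sketch only, not the actual script):

# Start keepalived once, then block on it; if it dies, exit the container with
# its status instead of silently restarting it, so Rancher notices the failure.
keepalived --dont-fork --log-console -f /etc/keepalived/keepalived.conf &
KEEPALIVED_PID=$!

wait "$KEEPALIVED_PID"
STATUS=$?
echo "keepalived exited with status $STATUS" >&2
exit "$STATUS"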

Cheers

Cool @sjiveson, would you mind sharing the updated keepalived file? I’ll include that in my setup…

What IPs are you populating via confd? Do you have a script for that? I’m not sure I’d use it in our case… I assume you are adding IPs based on services or something? If it were possible to see how you are doing it, it could be interesting to support somehow in the catalog I’d like to submit…

Hey,

I’ve updated the Gist with an even more reliable script file and the confd information and files.

I’m looking into the catalog part of things now.

Cheers

Cool, you beat me to it then :stuck_out_tongue:
Thanks for sharing the update. I’ll try the new script and report if there are any issues or questions.

In your example, I understand you are using confd to update the file based on env variables… Unless you are somehow changing the container’s env variables, I think a simple sed on the config files at runtime would suffice?
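
Something along these lines in the entrypoint, for example (the placeholders and variable names here are made up):

# Render the config from a template at container start, using env vars set via
# docker-compose (KEEPALIVED_VIP, KEEPALIVED_VRID, KEEPALIVED_IFACE are illustrative names).
sed -e "s|__VIP__|${KEEPALIVED_VIP}|g" \
    -e "s|__VRID__|${KEEPALIVED_VRID}|g" \
    -e "s|__IFACE__|${KEEPALIVED_IFACE:-eth0}|g" \
    /etc/keepalived/keepalived.conf.tmpl > /etc/keepalived/keepalived.conf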

If we were to use confd to set up multiple virtual IPs or multiple frontends I’d understand the value, but from my reading that isn’t possible in your example, is it?

@sjiveson I seem to be having some issues with the new script…

http://pastebin.com/AnnNQqdu

That’s a container log… it keeps adding those signal messages forever, and it doesn’t bind to the IP (I’m checking whether the OS is at fault here, but IIRC the previous script was working…).

I’ll report back tomorrow if I find something…

Hey,

I remember suffering a similar issue when my check script wasn’t working (hence the increasing and different PID specified with each error). This also explains why you enter the FAULT state. Do you definitely have the ss command available?

BTW - sed - why didn’t I think of that? :slight_smile:

Hi @sjiveson do you mean ss command inside the container, or on the host?

I will check in a little…

Now that I stop to think about it, I changed from CoreOS to Ubuntu because the latest Rancher isn’t working on CoreOS anymore, so it may just be my fault with a bad OS… I’m having a rough time getting a dev environment consistently running. RancherOS doesn’t work at all on Vagrant (any container launch brings down the host’s networking, as per some already-reported bugs), and now with CoreOS not working it’s again a game of cat and mouse to get a reliable test env… but I hope I get there lol… one step forward, two back sometimes, it seems :frowning:

EDIT: apparently 56.1 is working on CoreOS, so I’m trying that again now…

Hey,

Inside the container.

Bummer. Good luck.

Weird… I’m going to add an open repo so we can better track this “project”, what do you think? I have it on GitHub right now, but in a personal repo which is private… I think I’ll just open it up so more people can use/contribute to it in the future… If you give me your GitHub username I can add you as a member on the repo so you can contribute, then we can look at releasing this in the catalog, etc.?

In case anyone is interested, it’s now reposted openly here… https://github.com/NeoAssist/docker-keepalived

I included an MIT license as I suppose it’s the most permissive while affording some protection from liability (lol?)

One thought has come to mind, @sjiveson: I’m going to think about a way of maybe adding a “pool” of HA virtual IPs. My reasoning is this: if we have 5 hosts and only 1 IP, we are effectively creating a potential bottleneck… Unless you have hosts dedicated to load balancing, for example, it may be more effective to have multiple virtual IPs and have them land on different hosts whenever possible, while still all being managed by Keepalived, so that if one or more hosts fail the IPs continue responding.

It would create a sort of pseudo-anycast availability… (an over-simplified comparison, but it’s roughly the behavior it would give…)

EDIT: @sjiveson I have closed the repo temporarily as I realized it is mostly your work, and I’d like your OK before putting it up there.

@sjiveson I have confirmed that the ss command works in both containers, but it never returns anything when called by keepalived…

Removing the chk_script works… with the chk_script in place, both instances are always in the FAULT state and the virtual IP is never brought up…

Any ideas what could be causing this? Stopping/restarting a container causes the IP to fail over as intended, but it isn’t running the chk_script in that scenario, which would be ideal…

Here is a copy of the command as output by the keepalived container, and a run of it in the container itself…

bash-4.3# ss -ltn 'src 172.17.8.150' | grep 80
bash-4.3# ss -ltn 'src 172.17.8.150'
State      Recv-Q Send-Q Local Address:Port               Peer Address:Port
bash-4.3# ss -ltn 'src 172.17.8.150'
State      Recv-Q Send-Q Local Address:Port               Peer Address:Port
bash-4.3#

(FYI, the IP is responding… the hosts have IPs ending in 101 and 102, the virtual IP ending in 150 is responding from the outside, and I can validate that it’s working: I installed 2 standalone containers answering on host port 80 with different pages so I can see it failing over.)

Hey,

RE: The git repo, that’s just fine with me.

Regarding the script, I suspect whatever you have listening on port 80 is bound to all interfaces and thus won’t be matched by the ss filter, as the listening address shows as something like 0.0.0.0:80 or :::80 rather than the VIP. Can you change this so it binds to the specific IP instead? This will require nonlocal_bind to be enabled.
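
A quick way to see the difference (IP and port just as examples):

ss -ltn 'sport = :80'        # shows the listener even when it's bound to 0.0.0.0 / ::
ss -ltn 'src 172.17.8.150'   # only matches once the service binds to the VIP itself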

Cheers

Hi @sjiveson, cool, the repo is up for you or anyone else to contribute. I added you as a contributor so you can commit any changes, etc.

You make a good point about the IP. I’m going to add the binding. Nonetheless, it will be a problem with the Rancher LB, as we need a way of manually binding it to a set of IPs… One thought is to integrate the LB with a Keepalived service. In practice, it makes sense for the keepalived and LB containers to be on the same hosts and listen on the same VIPs.

I’m going to think a little bit about this, and see if I can come up with something interesting…