Running Rancher Server in HA

I’ve come across a lot of topics on Rancher 1.x in HA, but it seems to me that Rancher 2.0 does not have the ability to use an external etcd cluster or attach any volumes to store management-level etcd data. If we go into production with Rancher 2.x and lose the one docker container running Rancher server, we don’t have much recourse to recover.

I’m assuming you’ve read the docs here?
https://rancher.com/docs/rancher/v2.x/en/installation/ha/
Those instructions use Rancher’s excellent rke tool to create a fairly vanilla HA Kubernetes cluster including three etcd nodes in three fault zones.
Once you have Rancher 2 up, there are settings to have the etcd nodes create snapshots which you can back up.
We experimented with all of this but decided to go a different route and just run a single Rancher node where we map /var/lib/rancher to the external host and back that up. DR is as simple as spinning your host back up, copying the data back over and running the docker image. We’ve done DR drills with it and it’s as easy as can be. Keep in mind, we have this running on a VMware node that will fail over to our other facility automatically, so that is our HA in most ways that really matter for this. If Rancher is down for a minute or two (usually much less) the controlled clusters are not affected; they continue to do their work.
For reference, here is the command we use to run Rancher 2 in production:
sudo docker run -d \
--restart=unless-stopped \
--name rancher \
--log-driver json-file \
--log-opt max-size=100m \
--log-opt max-file=3 \
-p 80:80 -p 443:443 \
-e HTTP_PROXY="http://wall.ad.selinc.com:8080/" \
-e HTTPS_PROXY="http://wall.ad.selinc.com:8080/" \
-e NO_PROXY="localhost,127.0.0.1,0.0.0.0,localaddress,selinc.com,10.0.0.0/8" \
-v /var/lib/rancher:/var/lib/rancher \
-v /etc/rancher/ssl/cert.pem:/etc/rancher/ssl/cert.pem \
-v /etc/rancher/ssl/key.pem:/etc/rancher/ssl/key.pem \
sel-docker.artifactory.metro.ad.selinc.com/rancher/rancher:v2.0.8-rc1 --no-cacerts

Note that we are also mapping in our SSL cert and key.
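
For anyone who wants to try the same back-up-and-restore approach, here is a rough sketch of what the two steps could look like. This is not our actual script, just an illustration: the function names and the backup location are made up, and stopping the container before archiving is an assumption on my part to get a consistent copy.

```shell
# Minimal sketch: archive a data directory and restore it later.
# Function names and the backup location are hypothetical.

# Archive data_dir into a timestamped tarball under backup_dir; print its path.
# Keep backup_dir outside data_dir so the archive does not include itself.
backup_rancher_data() {
  local data_dir="$1"
  local backup_dir="$2"
  local archive
  archive="${backup_dir}/rancher-data-$(date +%Y%m%d-%H%M%S).tar.gz"
  mkdir -p "$backup_dir"
  # -C stores paths relative to the data directory itself.
  tar -czf "$archive" -C "$data_dir" .
  echo "$archive"
}

# Unpack an archive made by backup_rancher_data into data_dir.
restore_rancher_data() {
  local archive="$1"
  local data_dir="$2"
  mkdir -p "$data_dir"
  tar -xzf "$archive" -C "$data_dir"
}

# Example use on the Rancher host (assumed workflow, adjust paths):
#   sudo docker stop rancher
#   backup_rancher_data /var/lib/rancher /backups
#   sudo docker start rancher
```

For a restore drill you would run restore_rancher_data against a fresh host's /var/lib/rancher and then start the container with the docker run command above.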


I have, but it just explains RKE HA, then a helm install of Rancher. Nowhere does it describe configuring Rancher server to talk to another, external etcd.

I’m sure RKE is great, but we’re not using it, for reasons, so Rancher server needs to be deployed onto our existing Kubernetes cluster. I have an etcd cluster running, but after breaking into the Rancher container and poking around, none of the configuration options from rancher --help show what args need to be passed.

OK, I think I’m starting to understand . . maybe.
So what you are saying is that you want to run an instance of Rancher inside of an existing Kubernetes cluster. Is that correct?
If so, just deploy it like you would any other workload. The docker run I posted above has most of the items you’d probably care about.
For reference, you could deploy a test version of Rancher HA, letting RKE do its thing, then take a look at what you end up with.
Note: Last I heard from Rancher support, the Rancher pod itself does not support running more than one copy at a time; it’s a known bug they are looking into. I don’t think that’s a big deal, as Rancher stores its config in the etcd db of the cluster it runs in, and that data is then replicated to all of your etcd nodes. Again, take some of this with a grain of salt, as it’s based on the playing we did when experimenting with Rancher HA and working with Rancher support.
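
To make "deploy it like any other workload" a bit more concrete, here is an untested sketch of a manifest for a single Rancher replica, wrapped in a small shell function so you can pass in your own image. Everything named here is an assumption for illustration: the cattle-system namespace, the rancher service account and its cluster-admin binding are my guesses at what an in-cluster Rancher needs, and there is no Service, Ingress or TLS setup, so compare it against what an RKE-based HA install produces before relying on it.

```shell
# Sketch only: print a minimal manifest for one Rancher replica.
# Namespace, names and the cluster-admin binding are assumptions.
render_rancher_manifest() {
  local image="$1"   # e.g. your registry's rancher/rancher image and tag
  cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: cattle-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rancher
  namespace: cattle-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: rancher
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: rancher
  namespace: cattle-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rancher
  namespace: cattle-system
spec:
  replicas: 1   # single copy, per the note from Rancher support
  selector:
    matchLabels:
      app: rancher
  template:
    metadata:
      labels:
        app: rancher
    spec:
      serviceAccountName: rancher
      containers:
      - name: rancher
        image: ${image}
        ports:
        - containerPort: 80
        - containerPort: 443
EOF
}

# Example (assumes kubectl points at the existing cluster):
#   render_rancher_manifest <your-registry>/rancher/rancher:<tag> | kubectl apply -f -
```

Keeping the image as a parameter means you can point it at a private registry, like the one in the docker run command I posted earlier.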

Once you have Rancher running in your existing cluster, I believe you then just import that cluster into Rancher. Because Rancher / RKE did not create the cluster, it won’t be able to upgrade / maintain it, but you will be able to do everything else with it.

Hope that helps.