Upgrading Rancher without clusters going off-line

perllaghu · September 7, 2018, 10:19am

I’m using Rancher 2.0.6 to manage K8s clusters for services.

Whilst our use is currently in its infancy, I am looking at the long-term management of our Rancher service, and that includes upgrades to Rancher itself.

If I upgrade a standard single-node instance of Rancher, all clusters go off-line [briefly] before coming back up, which means all services running on those clusters become unavailable.

This is a hefty down-time penalty to consider.

I’ve tried a small HA install, and the K8s clusters it controlled also became unavailable.

Are there any plans to look at some form of rolling update to clusters, so services can remain 24/7?

Fraser_Goffin · January 24, 2019, 12:54am

First, if you want to avoid an outage then you really have to go with an HA setup. For a variety of compliance reasons, we refresh our entire set of clusters regularly, so automating this process is absolutely critical.

We have found that we can cycle our worker nodes and the pods are successfully relocated to other healthy nodes as part of a rolling update. There is no loss of service and, depending on how aggressively you want to replace nodes and on your cost considerations, you can also mitigate performance degradation. The same is the case for Control Plane nodes. Etcd nodes are a slightly different proposition and we are working on that one right now.

Of course, the architecture for Rancher in this configuration means that your management plane could suffer a loss of service, but this shouldn’t impact your application workloads. One caveat: (until v2.2 … I think) RBAC has a dependency on your HA cluster, so that might be an issue depending on how you have that configured. Otherwise you might only lose the ability to manage your workloads via the UI. You can still do so via kubectl though, or via your CI/CD pipelines (preferred).
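
To make the worker-node cycling a bit more concrete, here is a minimal Python sketch of how the drain order could be planned. It is an illustration only, not the automation described in this reply. The function name `rolling_drain_plan`, the `max_unavailable` parameter and the node names are all hypothetical. It only builds `kubectl drain` command lines (drain marks the node unschedulable and evicts its pods) and does not run them; provisioning the replacement node and waiting for pods to become ready again is left out.

```python
from typing import List


def rolling_drain_plan(nodes: List[str], max_unavailable: int = 1) -> List[List[List[str]]]:
    """Split worker nodes into batches and build one drain command per node.

    Hypothetical helper: at most `max_unavailable` nodes are drained per batch,
    so the remaining nodes can absorb the relocated pods.
    """
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")

    plan = []
    for start in range(0, len(nodes), max_unavailable):
        batch = nodes[start:start + max_unavailable]
        # --ignore-daemonsets lets the drain proceed on nodes running DaemonSet pods.
        commands = [["kubectl", "drain", node, "--ignore-daemonsets"] for node in batch]
        plan.append(commands)
    return plan


if __name__ == "__main__":
    # Node names are placeholders for illustration.
    workers = ["worker-1", "worker-2", "worker-3"]
    for number, batch in enumerate(rolling_drain_plan(workers, max_unavailable=2), start=1):
        print(f"batch {number}:")
        for command in batch:
            print("  " + " ".join(command))
```

A smaller `max_unavailable` is slower but leaves more spare capacity while nodes are being replaced, which is the trade-off between aggressiveness, cost and performance mentioned above.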
