At my job we are evaluating several options for which K8s manager to use, and one of the important topics is how to keep the system up to date.
So could anyone explain to me how often a new (stable) release is created, and how painful it would be to upgrade (or whether it is even necessary)?
Hi, I have been using Rancher for about two years. I started with Rancher 1.6 and migrated to Rancher 2.x.
I update Rancher every 3 months without any big problems.
I really recommend using RKE to set up your Rancher cluster. With this CLI tool it's really easy to set up an HA cluster and update Kubernetes. With Helm you can easily update and roll back Rancher.
One time something went wrong with an update via Helm; a rollback via Helm was enough to return to the working environment from before the update, and I just skipped that version.
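For reference, the Helm update-and-rollback flow described above looks roughly like this (a sketch using Helm 3 syntax; the release name `rancher`, repo `rancher-stable`, and namespace `cattle-system` follow the defaults from Rancher's install docs, and the chart version is a placeholder, so adjust all of these for your setup):

```shell
# Fetch the latest chart versions from the configured repos
helm repo update

# Upgrade the Rancher release to a specific chart version (placeholder)
helm upgrade rancher rancher-stable/rancher \
  --namespace cattle-system \
  --version 2.3.2

# If something goes wrong, inspect the release history...
helm history rancher --namespace cattle-system

# ...and roll back to the last known-good revision number from that history
helm rollback rancher 1 --namespace cattle-system
```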
I hope this short message will help you. I am happy to answer any further questions you may have.
Thank you very much for your input in the topic.
I have a few more questions, though…
1- If I were to skip a version and then try to update to the version after that (imagine I am on a hypothetical v1, skip v2, and then want to upgrade to v3), would I have to install the skipped version (v2) before upgrading to the one I want (v3)?
2- Is there a way to update the cluster with no downtime, assuming an HA installation?
3- What about only upgrading specific nodes, is that feasible?
Sorry if that’s a lot of questions, I just happened to encounter new scenarios since the original question was posted. Thank you so much for your help!
I’m glad to help. I’ll try to answer your questions as well as possible.
No, it is not necessary to install the version you want to skip before installing the newer one. I have done that twice with patch releases (for example, 2.2.4 -> 2.2.6). Please also check the release notes of the versions involved; additional steps may be required for an update. However, I would be careful about skipping larger updates (for example, 2.1.x -> 2.3.x). This has not been necessary for my clusters, so I cannot speak from experience.
Updating Rancher via Helm is not done on all nodes at the same time, so there are always enough instances ready to receive traffic. The situation is similar for a cluster update via RKE. Updating Kubernetes usually does not require restarting all pods: most Kubernetes components are restarted one after the other, and the most important of them also run redundantly (e.g., etcd).
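For context, an RKE-driven Kubernetes upgrade amounts to bumping the version in `cluster.yml` and re-running `rke up`; RKE then reconciles the cluster component by component. This is a sketch, and the version string shown is just an example of the format — it must be one of the versions your RKE build actually supports:

```shell
# In cluster.yml, set the target version, e.g.:
#   kubernetes_version: "v1.15.5-rancher1-1"
# then reconcile the running cluster against the updated config:
rke up --config cluster.yml
```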
I assume you mean updating nodes at the system level, e.g. Docker and other components of the Linux host. Before you perform maintenance on a node, you should use the “drain” function for that node. Draining evicts the existing pods to other nodes and stops new pods from being scheduled on the node.
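The drain/maintenance cycle described above, in kubectl terms (the node name is a placeholder; note that `--delete-local-data` was renamed to `--delete-emptydir-data` in newer kubectl releases):

```shell
# Evict pods from the node and mark it unschedulable
kubectl drain worker-1 --ignore-daemonsets --delete-local-data

# ... perform OS / Docker maintenance on the node ...

# Allow pods to be scheduled on the node again
kubectl uncordon worker-1
```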
I hope these answers help you. My English is not the best.
We have been using Rancher for over two years now. We have gotten amazing levels of support from a team who knows what they are doing. I can’t recommend Rancher enough.
As for your other questions, here are my answers:
1 - Skipping versions is not just a Rancher question. As long as the underlying versions of your OS, Docker, and Kubernetes are supported, there should be no problems.
For minor versions (2.1 -> 2.4, for instance) you should have no problems.
Major versions are just that: major changes to the environment, and they will likely include breaking changes. For instance, the change from 1.x to 2.x was a move from the Cattle orchestrator to Kubernetes, so complete rewrites of your DevOps automation and configurations were needed.
Also, when you upgrade any of the management products you will normally also be updating the OS, Docker/containerd, and Kubernetes. OS and Docker/containerd upgrades are not part of the management platform, while Rancher and Kubernetes upgrades are, for both RKE-built and Rancher UI/CLI-built clusters. So far, as long as we have stayed on supported releases, we have had no issues with any of the upgrades.
2 - We update regularly in our HA environment with no outages of the applications running on the platform. We do schedule the upgrades and ask our developers and others not to push patches or updates during our window, just to make sure we don’t end up in a weird state because something was mid-update when another action was attempted.
3 - Upgrading specific nodes depends on what you mean. We do rolling updates for the OS and Docker, so yes, you can update at that level on specific nodes. Kubernetes is one system, though: while you are technically updating specific pieces on the nodes separately, you wouldn’t want to stay in that mixed state for long. If you have teams who need different versions of K8s or the OS, I would suggest setting up separate clusters for them. I was just at KubeCon last week, and this seemed to be the recommendation from most of the presenters who talked about this topic.
Let me know if you have other questions.
Thanks,
Brian
I agree with most of what has been said. Unless your business workloads have a high tolerance for potential outages (or you can arrange an upgrade window with all your business application owners), you need to exercise some caution with upgrades of all types. It’s quite easy to be seduced by statements like “K8s is completely self-healing”, “rolling upgrades are completely safe and fool-proof”, or “upgrading individual components should be ok”. IME all of those are highly situation-dependent.

Upgrading a Rancher minor version is mostly trouble-free, but there are always a few edge cases, many of which have nothing to do with Rancher itself. For example, upgrades can create incompatibilities with the external applications that integrate with Rancher and/or K8s: think logging/monitoring/alerting, security products (how do you manage your image and container run-time policies — OPA, Twistlock, Aqua?), operators or CRDs that you have created, your deployment software (we use Terraform and the Rancher2 provider and have noted several areas of incompatibility or upgrade lag), Helm… and what about your etcd backups, are they compatible? And so on.

On the whole Rancher provides some protection from many of these things, especially if you follow the recipes that they publish and (up until v2.3) upgraded Rancher server, K8s, and RKE as a unit. v2.3 allows independent upgrade of compatible K8s versions, and that is certainly welcome as a way of reducing risk from what is otherwise somewhat of an all-or-nothing approach. If you have decided to make use of non-Rancher-specific models (and many want to do that, or their company policy may mandate it), then again you have to reconcile and properly understand any ripple effects.
Don’t get me wrong, we license Rancher as an Enterprise support customer and naturally we want to leverage the platform as much as we can. But at the same time we are always cautious about patching and upgrades; it’s just inherently risky to do… and risky not to. Our expectations are grounded in the reality that no tech stack has been designed specifically for our environment, our business continuity model, or our internal CISO and engineering policies. As long as you are aware of where all those stand, you will know how much you can lean on a vendor-provided process and where you can’t.
We run regular ‘drills’ for DR and have a number of platform test clusters that we use to test changes such as upgrades before we decide to roll upgrades and patches out to higher environments. For some that will be completely OTT; for our internal change management process for Prod, it’s mandatory.
We recycle all of our clusters, including the Rancher management plane (in HA mode), daily. The more you practice the stuff that makes you anxious, the more comfortable you get with it.