Upgrade from 2.3.5 to 2.6.3 - Manual Steps?

Hi everyone,

I'm going to upgrade most of our clusters (Rancher/RKE/Kubernetes) in the next days/weeks.
Are there any known issues when upgrading directly from version 2.3.5 to 2.6.3?
This is our biggest version jump, but we also have jumps from 2.4.1 and 2.4.4.

Of course we are going to upgrade RKE and Kubernetes (1.17) to the latest version first.
(Additional question: is an upgrade from Kubernetes 1.17 to 1.23 possible without additional steps?)

I tried to extract the most important points from the release notes, but without success.

Thanks a lot
Gerrri

I would not recommend jumping directly from Rancher v2.3.x to v2.6.x, as this is an unsupported upgrade path. Officially, Rancher only supports skipping one minor version at a time, i.e. v2.3.x to v2.5.x, but as someone who is part of the Rancher Support team I would highly encourage you to upgrade Rancher slowly without skipping any minor versions: go from v2.3.x to v2.4.x, allow some time for burn-in, then upgrade from v2.4.x to v2.5.x, and repeat until you are at v2.6.3. The same goes for upgrading k8s versions: Rancher and upstream only test and support skipping at most one minor version, i.e. v1.17.x to v1.19.x.
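To make that concrete, one step of that path is roughly the following, assuming Rancher was installed from the rancher-stable Helm chart into cattle-system (the chart version and exported values file below are just placeholders for illustration, and you should always take a Rancher/etcd backup before each step):

```bash
# One minor version at a time: 2.3.x -> 2.4.x -> 2.5.x -> 2.6.x, with burn-in between steps.
# Assumes Rancher was installed from the rancher-stable Helm chart into cattle-system.
helm repo update

# Keep the values from the existing install and only bump the chart version.
helm get values rancher -n cattle-system -o yaml > rancher-values.yaml

helm upgrade rancher rancher-stable/rancher \
  --namespace cattle-system \
  -f rancher-values.yaml \
  --version 2.4.17   # placeholder: the next minor version only, not the final target
```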

This is all covered in my Master Class, which can be found at https://github.com/mattmattox/Kubernetes-Master-Class/tree/main/rancher-k8s-upgrades
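On the Kubernetes side, one RKE step looks roughly like this (the version string below is only an example; pick one that your RKE binary actually lists):

```bash
# Minimal sketch of one Kubernetes minor-version step with RKE.
# First list the versions your RKE binary supports, then pick the next minor only.
rke config --list-version --all

# In cluster.yml, set kubernetes_version to the chosen release, for example:
#   kubernetes_version: "v1.18.20-rancher1-3"   # example string, verify against the list above
# Then apply it:
rke up --config cluster.yml

# Make sure every node comes back Ready on the new version before planning the next step.
kubectl get nodes -o wide
```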


I was afraid you’d say that.

We are going to do the upgrades step by step now.

Thanks a lot for the Master Class link and the quick reply :)

I'm not 100% sure with Rancher, but for other systems I've seen recommendations that would lead you to go 2.3.x → 2.3.y, where 2.3.y is the latest in the 2.3 series, and then continue. If you've got the time, it might not hurt to do that if you aren't already there.
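If you do decide to do that, and Rancher was installed from Helm, I'd expect it to be just one more small patch-level step, something like this (2.3.11 being, I think, the last 2.3.x release, so confirm with the search first):

```bash
# Sketch: find the newest 2.3.x chart and move to it before touching 2.4.
helm repo update
helm search repo rancher-stable/rancher --versions | grep ' 2\.3\.'   # e.g. 2.3.11, 2.3.10, ...

helm upgrade rancher rancher-stable/rancher \
  --namespace cattle-system \
  --reuse-values \
  --version 2.3.11   # believed to be the latest 2.3.x patch; confirm with the search above
```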

A follow-up question: why wait 24 hours before doing the next Kubernetes upgrade? Isn't 10 minutes enough once the cluster is up and running again?

The 24-hour burn-in is a personal preference. The idea is that some issues might not pop up right away. For example, a memory leak in the kube-apiserver isn't going to show up immediately, and if the k8s upgrade breaks one of the Rancher controllers, the effect might not show until users start using it again.
But the most important reason is that application teams' smoke testing is not always the best. Let's say you upgrade from k8s v1.16 to v1.18 and then right away upgrade again to v1.20. The next morning an application team comes to you and says, "My app is broken, your upgrade broke it, roll it back." The question becomes: which upgrade broke it? Was it v1.18 or v1.20?

So my rule is a minimum of 24 hours for most environments; this gives the environment time to settle down and app teams time to notice problems.
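For what it's worth, by letting things settle I mean running generic sanity checks like these over that window, alongside whatever monitoring and app-level smoke tests you already have (this is not an official checklist, just examples):

```bash
# Generic post-upgrade sanity checks during the burn-in window.

# All nodes Ready and reporting the expected kubelet version?
kubectl get nodes -o wide

# Anything not Running/Completed anywhere in the cluster?
kubectl get pods --all-namespaces | grep -vE 'Running|Completed'

# Control-plane and Rancher components healthy and not throwing errors?
kubectl -n kube-system get pods
kubectl -n cattle-system get pods
kubectl -n cattle-system logs -l app=rancher --tail=100 | grep -i error
```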

Personally, for production and other mission-critical environments where uptime is very important, I recommend scheduling all upgrades a week apart. For example, one week I'll upgrade Rancher one version, then the next week I'll upgrade the upstream cluster, then the next week I'll do the downstream dev clusters, then the next week I'll do QA, staging, and non-mission-critical prod clusters. Finally, the next week I'll do the mission-critical prod clusters. Then six months later, the whole process repeats.

DevOps works best when you do lots of small steps instead of a few big jumps.


All right. Thanks again for the information :)