Node OS Maintenance

After provisioning a Kubernetes cluster, e.g. with RKE on any IaaS or with a managed Kubernetes offering such as EKS, one still needs to take care of the nodes. In particular, they need patching and, depending on which updates were installed, rebooting from time to time.

How do you do that in an automated fashion? I discovered three tools, but each of them has shortcomings:

How are others doing this maintenance work (day-2 operations) on their nodes?

Hi, does Kured work with RancherOS?

I found this issue:
https://github.com/weaveworks/kured/issues/65
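For context: by default Kured decides whether a node needs a reboot by checking for a sentinel file, /var/run/reboot-required, which Debian/Ubuntu package tooling creates after updates that require one. If an OS never creates that file, Kured with the default settings has nothing to react to; whether that is the RancherOS problem is not something this thread establishes. A minimal Python sketch of that check, assuming the stock sentinel path (the function name is hypothetical, not Kured's code):

```python
from pathlib import Path

# Assumption: stock Kured configuration, which watches this path.
# The OS update tooling (not Kured) is responsible for creating the file.
DEFAULT_SENTINEL = Path("/var/run/reboot-required")


def reboot_required(sentinel: Path = DEFAULT_SENTINEL) -> bool:
    """Return True if the reboot sentinel file exists on this host."""
    return sentinel.exists()
```

Kured lets you point at a different sentinel path, so on an OS without this convention you would need some other process to create the file when a reboot is due.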

I’ve done it both manually and with a script. The script runs kubectl get node to separate the control plane nodes from the workers. It then drains the control plane nodes one at a time and applies the updates and reboots. Once those are complete, it moves on to the worker nodes. I have noticed that with newer Rancher 2.4.4+ custom clusters (RKE) running Kubernetes 1.17.x and a newer kubectl, we no longer have to add the flags to ignore local data when draining.
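The ordering described above (control plane nodes first, one at a time, then workers) can be sketched in Python as a dry run that only builds the command list. This is not the poster's script. Assumptions: nodes are classified by the RKE role labels (node-role.kubernetes.io/controlplane, with etcd nodes grouped alongside) or the kubeadm-style labels, node names are reachable over SSH, and the update/reboot command is a hypothetical placeholder for an OL 7 host.

```python
import json
import subprocess

# Assumption: RKE role labels, plus kubeadm-style labels for other clusters.
# etcd nodes are treated like control plane nodes (updated first, serially).
CONTROL_PLANE_LABELS = (
    "node-role.kubernetes.io/controlplane",
    "node-role.kubernetes.io/etcd",
    "node-role.kubernetes.io/control-plane",
    "node-role.kubernetes.io/master",
)


def split_nodes(node_list: dict) -> tuple[list[str], list[str]]:
    """Split `kubectl get nodes -o json` output into (control_plane, workers)."""
    control_plane, workers = [], []
    for item in node_list["items"]:
        name = item["metadata"]["name"]
        labels = item["metadata"].get("labels", {})
        if any(key in labels for key in CONTROL_PLANE_LABELS):
            control_plane.append(name)
        else:
            workers.append(name)
    return sorted(control_plane), sorted(workers)


def maintenance_plan(node_list: dict) -> list[list[str]]:
    """Commands for one node at a time: control plane first, then workers."""
    control_plane, workers = split_nodes(node_list)
    plan = []
    for node in control_plane + workers:
        plan.append(["kubectl", "drain", node, "--ignore-daemonsets"])
        # Hypothetical placeholder: patch and reboot the host over SSH.
        plan.append(["ssh", node, "sudo yum -y update && sudo reboot"])
        plan.append(["kubectl", "uncordon", node])
    return plan


def fetch_nodes() -> dict:
    """Read the current node list from the cluster via kubectl."""
    out = subprocess.run(["kubectl", "get", "nodes", "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)
```

To review the plan against a real cluster, print " ".join(cmd) for each cmd in maintenance_plan(fetch_nodes()). Before actually executing it, you would also want to wait for each rebooted node to come back Ready before uncordoning it and moving on.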

I guess it also depends on how you provision your cluster. We build our hosts with Docker using Ansible and then add them to the cluster. We also use Oracle Linux (OL) 7; we are on 7 Update 8 at the moment, with the UEK R5 kernel. Since Oracle offers Ksplice, we don’t have to reboot nodes that often. In one recent update, Docker itself was updated; as part of the patching, the docker service goes down and comes back up. Thankfully, we have not seen many issues.

My best suggestion is to set up a test cluster and test automating the upgrades there. And even though we do use scripts to automate it, I still prefer updating serially and doing checks along the way.
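One check that fits a serial update is confirming that a rebooted node reports Ready before moving on to the next one. A minimal Python sketch, assuming kubectl is on the PATH and configured for the cluster; the function names are hypothetical, and the timeout and polling interval are arbitrary illustrative values:

```python
import json
import subprocess
import time


def node_is_ready(node: dict) -> bool:
    """True if the node's Ready condition has status "True"."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False


def get_node(name: str) -> dict:
    """Fetch a single node object via kubectl."""
    out = subprocess.run(["kubectl", "get", "node", name, "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def wait_until_ready(name: str, timeout: float = 600, interval: float = 10,
                     fetch=get_node) -> bool:
    """Poll until the node is Ready; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if node_is_ready(fetch(name)):
                return True
        except subprocess.CalledProcessError:
            # The API server can be briefly unreachable while a control
            # plane node reboots; treat that as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

kubectl also has a built-in equivalent, kubectl wait --for=condition=Ready node/NAME --timeout=600s; the Python version is mainly useful if the rest of the automation is already written in Python and you want to add further checks (pods rescheduled, application health endpoints) in the same place.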