On-Prem HA K3S Experience and Issues

I would like to use this thread to document and ask questions about High-Availability On-Prem with K3S.

By “HA” I am only referring to our own services/ingress, not the K8S API/control plane. Our real focus is 100% service availability when a node is rebooted. Actual node failure is almost non-existent for us and is out of scope here.

Plan A

Our initial plan was to use 2 nodes (master + worker) with SQLite and an external load balancer, and to achieve HA by running all our deployments as DaemonSets. This failed: when either the master or the worker goes down, the endpoints are not updated, so every second request is routed to a pod that no longer exists. Documented here.
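A minimal sketch of that layout, in case it helps: one pod per node via a DaemonSet, exposed on a hostPort so the external load balancer can target the node IPs directly. The name, image and ports below are placeholders, not our real manifests.

```
# Placeholder DaemonSet: one pod per node, reachable on <node-ip>:30080,
# which is what the external load balancer health-checks and targets.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-app                 # placeholder name
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: registry.example/my-app:latest   # placeholder image
        ports:
        - containerPort: 8080
          hostPort: 30080      # external LB target port
EOF
```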

Plan B

Here we tried to use 2 master nodes with embedded etcd and an external load balancer, assuming that when one master disappeared the other would remain fully functional. That dream failed in similar fashion: when one master dies, the other refuses to serve the K8S API. In other words, having 2 masters is the same as having one: when one fails, the other is incapacitated. At first this was a mystery to me, since I assumed that 1 live node out of 1 counts as a majority. It does not: Raft quorum is computed from the configured membership, not from the members currently alive, so a 2-member cluster needs floor(2/2) + 1 = 2 votes, and the lone survivor can never reach that on its own. It also has no way of knowing whether the other member is dead or merely on the far side of a network split, so it errs on the side of doing nothing. In short, a 2-member etcd cluster tolerates zero failures; you need 3 members to survive the loss of one.

Plan C

Here we just tried 2 separate Kubernetes clusters behind an external load balancer. Yes, you have to deploy your applications to both clusters, but you get HA with only 2 nodes.

Plan D

Here we plan to try 2 masters with an external database and an external load balancer.
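A rough sketch of the server config we expect to use for this, assuming a Postgres datastore; the connection string, token and hostnames are placeholders, not our real values.

```
# On both masters (placeholder values throughout)
cat > /etc/myb/k3s/config.yaml <<'EOF'
datastore-endpoint: "postgres://k3s:CHANGEME@db.example.internal:5432/k3s"
token: "CHANGEME"
tls-san:
  - "k3s.example.internal"   # name/IP of the external load balancer
EOF
```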

Plan E

Finally, we plan to try the recommended way of running 3 masters with embedded etcd.
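As we read the docs, the bootstrap for this would look roughly like the sketch below (hostnames and token are placeholders); whether cluster-init has to remain in the config afterwards is exactly what we ask about further down.

```
# First server: initialize the embedded etcd cluster
cat > /etc/myb/k3s/config.yaml <<'EOF'
cluster-init: true
token: "CHANGEME"
EOF

# Second and third servers: join the first one
cat > /etc/myb/k3s/config.yaml <<'EOF'
server: https://server-1.example.internal:6443   # placeholder address of the first server
token: "CHANGEME"
EOF
```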

So, the first question we have is about the embedded etcd option described here.

  1. We are seeing 2 nodes in the cluster, both with roles `control-plane,etcd,master`, but we cannot see any etcd* pods.
  • How can we verify etcd is running? (See the sketch after this list.)
  • Our k3s service is running with `/usr/local/bin/k3s server --config /etc/myb/k3s/config.yaml`
    • Should we have --cluster-init in there? On both nodes?
  2. Is cluster-init a one-time flag, or does it need to remain in the config/systemd unit on all nodes?
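For reference, these are the checks we are aware of for confirming embedded etcd is actually up; as far as we can tell, k3s runs etcd inside the k3s server process rather than as pods, which would explain why no etcd* pods show up. The paths and commands below assume a default k3s data directory and a reasonably recent k3s version.

```
# Ask the apiserver whether its etcd backend is healthy
kubectl get --raw /healthz/etcd; echo

# The embedded etcd data directory should exist on every etcd node
ls /var/lib/rancher/k3s/server/db/etcd

# On-demand snapshots only work when embedded etcd is the datastore
k3s etcd-snapshot save
```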

Interested to know the outcome of your investigation. What did you end up doing for HA?