Rancher server crashes as soon as the worker node goes down

Hi everyone!

I have followed the HA Installation tutorial step by step, to the letter, and successfully deployed Rancher 2.x in a bare-metal environment. I have set up 2 nodes:

  • The master, with only etcd and Control Plane
  • A worker node

Everything works fine and I can deploy workloads with no issues. However, when I restarted the worker node to see how well Rancher handled it, I was surprised to find that the web server was no longer accessible. I now know this is because many (if not all) of Rancher's components are deployed on the worker node.
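To make the failure concrete, it helps to count how many node failures each role can absorb in this topology. The sketch below is a hypothetical helper (not part of Rancher or RKE). It assumes etcd stays available only while a majority of its members is up, that one surviving control plane node is enough, and that Rancher's pods can only run on nodes with the worker role.

```python
from typing import Dict, List

ROLES = ("etcd", "controlplane", "worker")


def failure_tolerance(nodes: Dict[str, List[str]]) -> Dict[str, int]:
    """Return how many node failures each role can absorb.

    `nodes` maps a (hypothetical) node name to its RKE roles, which must be
    among ROLES. A result of -1 means the role is missing entirely.
    """
    counts = {role: 0 for role in ROLES}
    for roles in nodes.values():
        for role in roles:
            counts[role] += 1

    return {
        # etcd keeps quorum only while a majority (n // 2 + 1) of its
        # members is up, so it survives (n - 1) // 2 failures.
        "etcd": (counts["etcd"] - 1) // 2,
        # Assumption: one surviving control plane node can still serve the API.
        "controlplane": counts["controlplane"] - 1,
        # Rancher's pods need at least one worker node left to run on.
        "worker": counts["worker"] - 1,
    }


if __name__ == "__main__":
    original = {"master": ["etcd", "controlplane"], "worker": ["worker"]}
    recommended = {f"node{i}": list(ROLES) for i in range(1, 4)}
    print(failure_tolerance(original))     # {'etcd': 0, 'controlplane': 0, 'worker': 0}
    print(failure_tolerance(recommended))  # {'etcd': 1, 'controlplane': 2, 'worker': 2}
```

With the two-node layout every role tolerates zero failures, so restarting either node is enough to take Rancher down.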

So I have a few questions:

  • Is there a way for me to deploy the web server without any worker node and add one later in the UI?
  • Can I start a master node with all three roles (etcd, Control Plane and Worker) and later remove the worker role, once I have added a worker node to replace it?

If you have any other ideas, please let me know; I am very eager to find a solution to my problem.
Thanks in advance!

Throughout the HA installation section, the docs state to use 3 nodes with all roles. This is the recommended way of installing, to avoid surprises or odd behavior. Deviating from this model is not recommended.
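For reference, the recommended layout (three nodes, each with every role) corresponds to a `nodes:` section in RKE's `cluster.yml` along the lines below. This is a sketch that just renders that section as text; the IP addresses (from the 192.0.2.0/24 documentation range) and the `ubuntu` SSH user are placeholders, and the rest of the file should follow the linked production docs.

```python
from typing import List

# Placeholder values: replace with your own node addresses and SSH user.
NODE_ADDRESSES = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]
SSH_USER = "ubuntu"


def rke_nodes_yaml(addresses: List[str], user: str) -> str:
    """Render an RKE `nodes:` section where every node has all three roles."""
    lines = ["nodes:"]
    for address in addresses:
        lines.append(f"  - address: {address}")
        lines.append(f"    user: {user}")
        lines.append("    role:")
        for role in ("controlplane", "etcd", "worker"):
            lines.append(f"      - {role}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(rke_nodes_yaml(NODE_ADDRESSES, SSH_USER))
```

Giving every node all roles means etcd keeps quorum with one node down and Rancher's pods still have two workers to be scheduled on.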

Thank you so much for answering.
I’ve read, however (I don’t remember where, or when it was written), that master nodes should not have the worker role because it interferes with etcd… Is there any truth to this?
Thanks in advance!

The RKE cluster for HA is described here: https://rancher.com/docs/rancher/v2.x/en/cluster-provisioning/production/#rke-cluster-running-rancher-ha