Etcd failed with host label error

We are using the Kubernetes catalog to deploy Kubernetes in a Rancher environment. Our current setup is a single Rancher server with an external database (AWS RDS).

We use AWS auto scaling, so nodes are added to the Rancher environment automatically, and we don’t set any host labels manually. The following error brings down the Kubernetes dashboard.

Error -

(Failed to allocate instance container:1i199147: Bad instance container:1i199147 in state error: Allocation failed: host must not have a container with label

Right now we work around this error by manually terminating the node on which etcd is failing.

@denise

We highly recommend running plane isolation when running in production. This ensures that you have dedicated hosts for each plane, which greatly improves stability and performance. However, running without it should still work and not fail with these errors.

To troubleshoot, I’ll need more details: how many hosts are you running, and when is the error triggered (how many hosts are deleted or added)? Basically, a way to reproduce it from scratch would help.

The only known issue at the moment is when you spin up 3 hosts and go back to anything below that, 3 containers will continue to be scheduled: https://github.com/rancher/rancher/issues/10412
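To illustrate how this kind of allocation failure can arise, here is a minimal sketch of a hard "host must not have a container with label" rule. It is a toy model, not Rancher's scheduler: the `Host` class, the `schedule` function, and the label name `etcd` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    # Labels of the containers already running on this host (hypothetical model).
    container_labels: set = field(default_factory=set)


def schedule(hosts, label, replicas):
    """Try to place `replicas` containers, one per host at most.

    Each container carries `label` and may only land on a host that does
    not already run a container with that label (hard anti-affinity).
    Returns (names of hosts used, number of containers that could not be placed).
    """
    placed, failed = [], 0
    for _ in range(replicas):
        target = next((h for h in hosts if label not in h.container_labels), None)
        if target is None:
            # No eligible host left: this is the "allocation failed" case.
            failed += 1
            continue
        target.container_labels.add(label)
        placed.append(target.name)
    return placed, failed


# Three containers fit on three hosts, but not on two.
print(schedule([Host("a"), Host("b"), Host("c")], "etcd", 3))
print(schedule([Host("a"), Host("b")], "etcd", 3))
```

In this model, asking for three containers while only two eligible hosts exist leaves one container unplaced, which is the situation described in the linked issue.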

@superseb thanks for helping out on this.

We are running 3 nodes, which can scale up to 5 right now. Sometimes etcd fails on one of these three nodes with the error mentioned above. When that happens, I check the logs to see on which host etcd is failing to come up. I manually delete that node, a new one is brought up automatically by AWS auto scaling, and etcd then deploys successfully.

Let me know if that makes sense.

@superseb can you please help me understand what planes mean in Rancher? We are looking to set up HA in Rancher. How can that be achieved?

Probably the question is: why does it fail? What is the last log output from the container before it goes down and gets rescheduled?

Resiliency planes are recommended for production use and described here: http://rancher.com/docs/rancher/v1.6/en/kubernetes/resiliency-planes/

This will make sure you have dedicated hosts for etcd, orchestration, and compute, so you can tolerate one or more node failures.
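As a rough guide to how many node failures the etcd hosts can absorb, the sketch below applies the usual majority-quorum arithmetic for etcd. It is not Rancher-specific, and the function name `etcd_fault_tolerance` is made up for this example.

```python
def etcd_fault_tolerance(members: int) -> int:
    """Return how many etcd members can fail while a majority remains."""
    if members < 1:
        return 0
    quorum = members // 2 + 1  # smallest majority of the cluster
    return members - quorum


for n in (1, 2, 3, 5):
    print(n, "members tolerate", etcd_fault_tolerance(n), "failure(s)")
```

By this arithmetic, a 3-member cluster keeps quorum with one member down, while a 2-member cluster cannot lose any.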