Etcd failed with host label error

Tarun_Prakash · January 5, 2018, 6:15am

We are using the Kubernetes catalog to deploy Kubernetes in a Rancher environment. Our current setup is a single Rancher server with an external database (AWS RDS).

We use AWS auto scaling, so nodes are added to the Rancher environment automatically, and we don't set any host labels manually. The following error brings down the Kubernetes dashboard.

Error -

(Failed to allocate instance container:1i199147: Bad instance container:1i199147 in state error: Allocation failed: host must not have a container with label

Right now we work around this error by manually terminating the node on which etcd is failing.

@denise

superseb · January 29, 2018, 11:58am

We highly recommend using plane isolation in production. This ensures that you have dedicated hosts for each plane, which greatly improves stability and performance. However, running without it should still work and should not fail with these errors.

To troubleshoot, I'll need more details: how many hosts are you running, and when is the error triggered (how many hosts were deleted or added)? Basically, a way to reproduce it from scratch would help.

The only known issue at the moment is when you spin up 3 hosts and go back to anything below that, 3 containers will continue to be scheduled: https://github.com/rancher/rancher/issues/10412
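
To illustrate why that situation can end in an allocation failure, here is a toy model of a hard anti-affinity rule, where a host must not already have a container with the same label. It is only a sketch under that assumption and not Rancher's actual scheduler; the `schedule` function, the host names, and the `etcd` label value are hypothetical.

```python
def schedule(hosts, label, count):
    """Place `count` containers carrying `label`, at most one per host.

    Returns (placements, unplaced): placements maps host -> label, and
    unplaced counts the containers that found no eligible host.
    """
    placements = {}
    unplaced = 0
    for _ in range(count):
        # A host is eligible only if it has no container with this label yet.
        free = [h for h in hosts if h not in placements]
        if free:
            placements[free[0]] = label
        else:
            unplaced += 1
    return placements, unplaced


# Three etcd containers on three hosts: every container finds a host.
placed, missing = schedule(["host-a", "host-b", "host-c"], "etcd", 3)
print(len(placed), missing)

# The same three containers after scaling down to two hosts:
# the third container has no eligible host left.
placed, missing = schedule(["host-a", "host-b"], "etcd", 3)
print(len(placed), missing)
```

In this model the scale still asks for three containers, so dropping below three hosts always leaves one container that cannot be allocated.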

Tarun_Prakash · January 29, 2018, 12:14pm

@superseb thanks for helping out on this.

Three nodes are running, and they can scale up to five right now. Sometimes, on one of these three nodes, etcd fails with the error mentioned above. When that happens, I check the logs to see on which host etcd is failing to come up. I manually delete that node, a new one is brought up automatically by AWS auto scaling, and etcd then deploys successfully.

Let me know if that makes things clear.

Tarun_Prakash · February 2, 2018, 7:42am

@superseb can you please help me understand what planes mean in Rancher? We are looking to set up HA in Rancher. How can that be achieved?

superseb · February 2, 2018, 3:45pm

The question is probably: why does it fail? What is the last log output from the container before it goes down and gets rescheduled?

Resiliency planes are recommended for production use and described here: http://rancher.com/docs/rancher/v1.6/en/kubernetes/resiliency-planes/

This will make sure you have dedicated hosts for etcd, orchestration, and compute, so you can tolerate one or more node failures.
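
As a rough guide to how many node failures an etcd plane can absorb, the sketch below applies the standard majority-quorum rule that etcd uses: a cluster of n members needs n // 2 + 1 of them available. The function names are illustrative, and the cluster sizes are examples, not values taken from this thread.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can be lost while a majority is still available."""
    return members - quorum(members)


for n in (1, 3, 5):
    print(f"{n} etcd members: quorum {quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

By this rule, three etcd hosts tolerate one failure and five tolerate two, while an even-sized cluster tolerates no more failures than the odd size just below it.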
