Troubleshooting Controller Manager and Scheduler Unhealthy Issue

ralic · February 21, 2019, 8:34am

I’m new to Rancher and would really appreciate someone’s help troubleshooting and fixing the “Controller Manager Unhealthy” issue.

We have a problem with controller-manager and scheduler not responding that is not related to the GitHub issues we found (rancher#11496, azure#173, …).

Yesterday a single pod exhausted the memory on one node of our 3-node HA cluster. After that the Rancher web app was not accessible. We found the offending pod and scaled it to 0 with kubectl, but it took some time to figure everything out.
Since then the Rancher web app has been working properly, but there are continuous alerts about controller-manager and scheduler not working. The alerts are not consistent: sometimes both components are working, and sometimes their health check URLs refuse connections.

NAME                 STATUS      MESSAGE                                                                                     ERROR
controller-manager   Unhealthy   Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused
scheduler            Healthy     ok                                                                                     
etcd-0               Healthy     {"health": "true"}                                                                     
etcd-2               Healthy     {"health": "true"}                                                                     
etcd-1               Healthy     {"health": "true"}
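
A listing like the one above can be filtered so that only the failing components are shown, which is handy when the check is repeated many times. This is a minimal sketch that assumes the listing comes from `kubectl get componentstatuses` (the post does not name the command); the function name `unhealthy_components` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a component status listing on stdin and prints
# the name of every component whose STATUS column is not "Healthy".
# Assumes the first line is the header row and the columns are whitespace-separated.
unhealthy_components() {
  awk 'NR > 1 && $2 != "Healthy" { print $1 }'
}
```

Assuming kubectl is configured for the cluster, it could be used as `kubectl get componentstatuses | unhealthy_components`.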

Restarting controller-manager and scheduler on the affected node hasn’t helped. Even restarting all of the components with
docker restart kube-apiserver kubelet kube-controller-manager kube-scheduler kube-proxy
wasn’t effective.
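
Because the failures come and go, a single check can easily miss them. One way to narrow the problem down is to poll the health check URLs directly on the affected node and log each result with a timestamp, so the failures can be lined up against the container logs. The sketch below is only an illustration: the function names `probe_healthz` and `watch_healthz` are hypothetical, it assumes curl is installed on the node, and the scheduler URL in the usage note is an assumption (only the controller-manager URL appears in the output above).

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "ok" if the URL answers with a successful HTTP
# status within 2 seconds, and "fail" otherwise (connection refused, timeout,
# HTTP error, or curl not being available).
probe_healthz() {
  local url="$1"
  if curl --silent --fail --max-time 2 --output /dev/null "$url"; then
    echo "ok"
  else
    echo "fail"
  fi
}

# Hypothetical helper: probes every given URL COUNT times, INTERVAL seconds
# apart, and prints one "timestamp url result" line per probe.
watch_healthz() {
  local count="$1" interval="$2"
  shift 2
  local i url
  for ((i = 0; i < count; i++)); do
    for url in "$@"; do
      printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$url" "$(probe_healthz "$url")"
    done
    sleep "$interval"
  done
}
```

For example, `watch_healthz 60 5 http://127.0.0.1:10252/healthz http://127.0.0.1:10251/healthz` would probe both endpoints every 5 seconds for about 5 minutes. Port 10251 for the scheduler is an assumption based on the default insecure port in Kubernetes releases of that period; adjust it if the cluster is configured differently.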

Can someone please help me figure out the steps towards troubleshooting and fixing this issue without downtime on running containers?

The nodes are hosted on DigitalOcean, on servers with 4 cores and 8 GB of RAM each (Ubuntu 16, Docker 17.03.3).

Thanks in advance!

Herman_Milton_Maleia · June 12, 2023, 7:50am

Hi @ralic
Hope you are doing well.
Did you manage to fix this issue?
