Hi team,
After the RKE cluster was deployed using the YAML file, it was up and running for 3-4 days. Then we found that the scheduler and controller are restarting frequently because they fail to renew their lease: "failed to renew lease kube-system/kube-scheduler: failed to tryAcquireOrRenew context deadline exceeded".
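For reference, this is roughly how we see the restarts and the error on the affected node (a quick sketch, assuming RKE's default container names kube-scheduler and kube-controller-manager):

```
# list the control plane containers and how often they exited/restarted
docker ps -a --filter "name=kube-scheduler" --filter "name=kube-controller-manager"

# tail the logs to capture the lease renewal error
docker logs --tail 100 kube-scheduler
docker logs --tail 100 kube-controller-manager
```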
Troubleshooting done so far:
restarted the exited master components and the Docker service, and redeployed the RKE cluster
tried several other steps as well, but no change in the scheduler and controller behaviour
The nodes are healthy and have enough resources.
@rancher_admin @superseb @mathieu-gilloots
Please look into this bug and why the scheduler and controller are restarting so often, because this impacts the production business.
Please share more info about the setup; this is usually caused by the nodes running out of resources. Are the nodes from the screenshot the only nodes in the cluster? It is recommended to have at least 3 etcd nodes and 2 controlplane nodes to make sure the cluster remains available when one of them goes down. Please share the specifications of the nodes (host OS/Docker version/CPU/memory/disk type+IOPS) and the exact logging from when it happens. In this case, please also share the logging of the etcd container.
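Something along these lines, run on each node, should cover most of the requested info (a sketch, assuming RKE's default container names; adjust as needed):

```
# host OS and Docker version
cat /etc/os-release
docker version

# CPU, memory and disks
nproc
free -h
lsblk
df -h

# recent logging from the control plane and etcd containers
docker logs --tail 200 kube-scheduler
docker logs --tail 200 kube-controller-manager
docker logs --tail 200 etcd
```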
Added 2 more controlplane nodes, but the same issue occurs on the other nodes as well: the containers of the scheduler and controller restart every 1-5 minutes.
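For reference, the extra nodes were added the usual way (a sketch, actual hostnames omitted): the new hosts were appended to the nodes section of cluster.yml with the controlplane role and the change was applied with:

```
rke up --config cluster.yml
```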
Regarding resources, all nodes are healthy in terms of RAM, CPU cores and disk.
OS: Red Hat Enterprise Linux Server release 7.5 (Maipo)
disk: lvm
Please supply the requested info so we can diagnose the issue; there is something wrong, and saying everything is fine will not help in diagnosing it. "An LVM disk" also does not say anything about the requested info: etcd has an IO requirement, and issues there will show up in its logging. That's why we need the specifications of the machines and the logging, so they can be checked.
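For example, the etcd logging can be checked for slow-disk warnings and the disk itself can be benchmarked with fio (a sketch; point --directory at the disk that holds the etcd data, usually /var/lib/etcd on RKE etcd nodes):

```
# look for etcd complaining about slow fsync/apply, which points at disk IO problems
docker logs etcd 2>&1 | grep -iE "took too long|slow|sync duration"

# measure fdatasync latency on the etcd disk; a high 99th percentile latency
# here usually explains leases timing out
fio --name=etcd-bench --directory=/var/lib/etcd --rw=write --ioengine=sync \
    --fdatasync=1 --size=22m --bs=2300
```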
There also seems to be an EL8 node in the cluster, while EL8 is only supported starting with k8s 1.19.
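The OS image and Kubernetes version of each node can be verified quickly with:

```
# shows OS image, kernel and kubelet version per node
kubectl get nodes -o wide
```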