Kubernetes faster node failure detection

rgbj · January 18, 2017, 1:00pm

Hi,

I’d like Kubernetes to detect failed nodes and recover from them faster, using the parameters suggested in this article: Improving Kubernetes reliability: quicker detection of a Node down | Fatal failure. Namely, the article recommends:

kubelet: node-status-update-frequency=4s (from 10s)

controller-manager: node-monitor-period=2s (from 5s)
controller-manager: node-monitor-grace-period=16s (from 40s)
controller-manager: pod-eviction-timeout=30s (from 5m)
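As a rough worked estimate of what these values change, assume pods on a dead node are evicted once the node has been unresponsive for the grace period and then NotReady for the eviction timeout. The estimate also ignores the extra lag of up to one `node-monitor-period` and the time needed to reschedule the pods:

$$
T_{\text{evict}} \approx T_{\text{grace}} + T_{\text{eviction}}
$$

$$
\text{defaults: } 40\,\text{s} + 300\,\text{s} = 340\,\text{s} \approx 5\,\text{min}\;40\,\text{s}
\qquad
\text{tuned: } 16\,\text{s} + 30\,\text{s} = 46\,\text{s}
$$

Both sets of values keep `node-monitor-grace-period` at 4 × `node-status-update-frequency` (40 = 4 × 10, 16 = 4 × 4). This ratio matters because the kube-controller-manager documentation describes the grace period as N times the kubelet's status update frequency, where N is the number of retries the kubelet is allowed when posting node status. Lowering only one of the two values can get healthy nodes marked NotReady.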

Is there a way, in a Rancher-based Kubernetes installation, to pass these parameters to the relevant components?

TIA for any help

laurent · January 29, 2017, 10:12am

Hi,

You can do this by creating your own template or by modifying the existing one.

Have a look at the docker-compose.yaml file and you will see where the flags go. For example, here is the controller-manager service:

```yaml
controller-manager:
  command:
    - kube-controller-manager
    - --master=https://kubernetes.kubernetes.rancher.internal:6443
    - --cloud-provider=${CLOUD_PROVIDER}
    - --address=0.0.0.0
    - --kubeconfig=/etc/kubernetes/ssl/kubeconfig
    - --root-ca-file=/etc/kubernetes/ssl/ca.pem
    - --service-account-private-key-file=/etc/kubernetes/ssl/key.pem
  image: rancher/k8s:v1.5.1-rancher1-7
  labels:
    io.rancher.scheduler.affinity:host_label_soft: orchestration=true
    io.rancher.container.create_agent: "true"
    io.rancher.container.agent.role: environmentAdmin
  links:
    - kubernetes
```
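A minimal sketch of the edit, with two assumptions labeled here. First, the template's kubelet service is assumed to be named `kubelet` and to be launched with a `command:` list like the controller-manager above. Second, its existing flags are not shown here because they differ between template versions, so check your own copy. The suggested values are simply appended as extra command-line flags:

```yaml
controller-manager:
  command:
    - kube-controller-manager
    - --master=https://kubernetes.kubernetes.rancher.internal:6443
    - --cloud-provider=${CLOUD_PROVIDER}
    - --address=0.0.0.0
    - --kubeconfig=/etc/kubernetes/ssl/kubeconfig
    - --root-ca-file=/etc/kubernetes/ssl/ca.pem
    - --service-account-private-key-file=/etc/kubernetes/ssl/key.pem
    # Faster node failure handling (upstream defaults: 5s, 40s, 5m0s)
    - --node-monitor-period=2s
    - --node-monitor-grace-period=16s
    - --pod-eviction-timeout=30s
  # image, labels and links stay exactly as in the template

kubelet:  # ASSUMED service name; keep all of its existing settings and flags
  command:
    - kubelet
    # existing kubelet flags from the template remain here unchanged
    - --node-status-update-frequency=4s  # upstream default 10s
```

The kubelet flag and the controller-manager flags belong together: changing only the grace period without also shortening the kubelet's update frequency breaks the roughly 4× ratio between them. The modified template then has to be deployed in place of the stock one for the new flags to take effect.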

Laurent

rgbj · February 13, 2017, 4:46pm

Thanks for your answer; I now see where I should look.
