Kubernetes faster node failure detection

Hi,

I’d like to get Kubernetes to detect failed nodes and recover faster by using the parameters outlined in the article "Improving Kubernetes reliability: quicker detection of a Node down" (Fatal failure). Namely, the article suggests:

kubelet: node-status-update-frequency=4s (from 10s)

controller-manager: node-monitor-period=2s (from 5s)
controller-manager: node-monitor-grace-period=16s (from 40s)
controller-manager: pod-eviction-timeout=30s (from 5m)

Is there a way to pass these parameters to the relevant components in a Rancher-based Kubernetes installation?

TIA for any help


Hi,

You can create your own template to do this, or modify the existing one.

Have a look at the docker-compose.yaml file and you will see where the flags go :wink:

controller-manager:
  command:
  - kube-controller-manager
  - --master=https://kubernetes.kubernetes.rancher.internal:6443
  - --cloud-provider=${CLOUD_PROVIDER}
  - --address=0.0.0.0
  - --kubeconfig=/etc/kubernetes/ssl/kubeconfig
  - --root-ca-file=/etc/kubernetes/ssl/ca.pem
  - --service-account-private-key-file=/etc/kubernetes/ssl/key.pem
  image: rancher/k8s:v1.5.1-rancher1-7
  labels:
    io.rancher.scheduler.affinity:host_label_soft: orchestration=true
    io.rancher.container.create_agent: "true"
    io.rancher.container.agent.role: environmentAdmin
  links:
  - kubernetes
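
As a rough sketch of what the change might look like, the three controller-manager flags from the question could be appended to the command list of that service. The values are simply the ones quoted in the question and are untested here. The kubelet flag (--node-status-update-frequency=4s) would go into the command list of the kubelet service instead, assuming that service is defined in the same docker-compose.yaml of your template.

```yaml
controller-manager:
  command:
  - kube-controller-manager
  - --master=https://kubernetes.kubernetes.rancher.internal:6443
  - --cloud-provider=${CLOUD_PROVIDER}
  - --address=0.0.0.0
  - --kubeconfig=/etc/kubernetes/ssl/kubeconfig
  - --root-ca-file=/etc/kubernetes/ssl/ca.pem
  - --service-account-private-key-file=/etc/kubernetes/ssl/key.pem
  # Added flags: values taken from the question, not verified on Rancher
  - --node-monitor-period=2s
  - --node-monitor-grace-period=16s
  - --pod-eviction-timeout=30s
  # image, labels and links stay exactly as in the original service definition
```

After editing the template, the stack has to be upgraded so the controller-manager container is recreated with the new command.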

Laurent

Thanks for your answer, I now see where I should look.