Rescheduling pods after RKE worker node failure in less than 5 minutes

Hello all,

I reconfigured my RKE cluster on Rancher 2.4 so that pods are rescheduled in less than 5 minutes after a node failure (I test this by shutting down the worker node). However, it does not work: the pods are still rescheduled only after the default 300 seconds.

I followed Superseb's instructions to change this default behaviour.

This is the configuration I set up for my cluster:

kube-api:
  always_pull_images: false
  extra_args:
    default-not-ready-toleration-seconds: '30'
    default-unreachable-toleration-seconds: '30'
  pod_security_policy: false
  service_node_port_range: 30000-32767
kube-controller:
  extra_args:
    node-monitor-grace-period: 16s
    node-monitor-period: 2s
    pod-eviction-timeout: 30s
kubelet:
  extra_args:
    node-status-update-frequency: 4s
  fail_swap_on: false
  generate_serving_certificate: false
kubeproxy: {}
scheduler: {}

What am I missing?

Best Regards,

This is a known issue with upstream Kubernetes.

I created a workaround for this issue.


After doing some further testing, the configuration I specified is actually working.
Maybe it took some time to apply the changes.

Thank you anyway!

How much time passed before it started working? I’m experiencing the same thing.

I think I tried it the next day and then I realized it was actually working.
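
One possible explanation for the delay, offered as a guess rather than a confirmed diagnosis: the default-not-ready-toleration-seconds and default-unreachable-toleration-seconds values are only injected into a pod when it is created, so pods that existed before the change keep their 300-second tolerations until they are recreated. A minimal sketch of setting the same 30-second tolerations explicitly on a workload follows. The Deployment name, labels, and image are hypothetical placeholders; only the two taint keys and the tolerationSeconds field come from Kubernetes itself.

```yaml
# Hypothetical Deployment: name, labels, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: web
          image: nginx
      tolerations:
        # Evict the pod 30 seconds after its node is tainted not-ready.
        - key: node.kubernetes.io/not-ready
          operator: Exists
          effect: NoExecute
          tolerationSeconds: 30
        # Evict the pod 30 seconds after its node is tainted unreachable.
        - key: node.kubernetes.io/unreachable
          operator: Exists
          effect: NoExecute
          tolerationSeconds: 30
```

With the values from the original post, the node would be marked not ready roughly 16 seconds after it stops reporting (node-monitor-grace-period), and the pod would be evicted about 30 seconds after that. Inspecting the tolerations on a running pod should show whether it still carries the old 300-second values.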