While evaluating Rancher, I noticed that pods that lived on a just-deleted node do not get redeployed on another available worker node. I had to hit Redeploy in the UI, which fixed the problem.
- Are there any default timeout settings in the cluster that apply before rescheduling happens, and where/how can I tweak them?
- The tested pod, which lived on the deleted node, is marked as “Active” in the UI. Is there anything that can be done to tweak the health check so that it reflects a more realistic state (from an end user's perspective)?
Setup: 3 worker nodes, 1 etcd node (do)
Rancher version v2.2.4
Pod: bare nginx
Are you running a Pod, or a Deployment?
Per the Pod documentation:
Pods do not, by themselves, self-heal. If a Pod is scheduled to a Node that fails, or if the scheduling operation itself fails, the Pod is deleted; likewise, a Pod won’t survive an eviction due to a lack of resources or Node maintenance. Kubernetes uses a higher-level abstraction, called a Controller, that handles the work of managing the relatively disposable Pod instances. Thus, while it is possible to use Pod directly, it’s far more common in Kubernetes to manage your pods using a Controller. See Pods and Controllers for more information on how Kubernetes uses Controllers to implement Pod scaling and healing.
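If it is a bare Pod, wrapping it in a Deployment is probably what you want. Below is a minimal sketch in Python that builds such a manifest as JSON (which kubectl accepts as well as YAML). The name `nginx`, the replica count, the 60-second value, and the helper `nginx_deployment` are all illustrative assumptions, not anything from your setup. The tolerations shorten how long a pod stays bound to a not-ready or unreachable node; Kubernetes adds these with a default of 300 seconds when taint-based evictions are active. On older clusters the kube-controller-manager flag `--pod-eviction-timeout` (default 5m) plays that role instead, so check which applies to your version. The readiness probe is one possible way to make the reported state track whether nginx actually answers.

```python
import json


def nginx_deployment(name="nginx", replicas=2, toleration_seconds=60):
    """Build an apps/v1 Deployment manifest as a plain dict.

    All parameter values are illustrative assumptions.
    """
    labels = {"app": name}

    # Tolerate the taints placed on failed nodes for only
    # `toleration_seconds` before the pod is evicted (default is 300).
    tolerations = [
        {
            "key": key,
            "operator": "Exists",
            "effect": "NoExecute",
            "tolerationSeconds": toleration_seconds,
        }
        for key in (
            "node.kubernetes.io/not-ready",
            "node.kubernetes.io/unreachable",
        )
    ]

    container = {
        "name": name,
        "image": "nginx",
        "ports": [{"containerPort": 80}],
        # Mark the pod ready only while nginx answers HTTP on port 80.
        "readinessProbe": {
            "httpGet": {"path": "/", "port": 80},
            "periodSeconds": 5,
        },
    }

    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [container],
                    "tolerations": tolerations,
                },
            },
        },
    }


if __name__ == "__main__":
    # Print the manifest so it can be piped to kubectl.
    print(json.dumps(nginx_deployment(), indent=2))
```

Assuming the script is saved as `deployment.py` (a hypothetical filename), something like `python deployment.py | kubectl apply -f -` should create the Deployment. With a controller managing the pods, a replacement is scheduled on another worker once the pod on the lost node is evicted, with no manual redeploy needed.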