Auto-Restart and reconnecting host

Dear all,

We are facing an issue with Rancher. I created a service with auto-restart in a two-host environment. When I shut down one VM, the Infrastructure view shows the host in a "reconnecting…" state and my service is no longer available, but the stack status is still "active". Is there a timeout setting for the host? The same thing happens even if I deactivate the host.

I understand there is a "health check" option, but how can I health-check a daemon that does not accept TCP/UDP connections (e.g. ntp)?

The behavior that you described is what is expected for a reconnecting host.

A couple of notes:

  1. We don't automatically delete any hosts from the Rancher setup. If a host is in the reconnecting state, we expect our users to either fix it or remove it from the UI.
  2. When a host is in the inactive state, none of the services/containers on it will be moved off the host unless they have a health check. Rancher has no way of knowing whether the container is still running on the host and simply isn't connected to Rancher.
  3. If you have no health check, the only way to move the container to a different host is to delete it.
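The rules in the list above can be sketched as a small decision function. This is a hypothetical model of the behaviour described in this reply, not Rancher's implementation; the `Container` class, its fields, and the state strings are assumptions chosen for illustration.

```python
from dataclasses import dataclass


# Hypothetical model of the behaviour described above; not Rancher code.
@dataclass
class Container:
    name: str
    host_state: str         # assumed values: "active", "reconnecting", "inactive"
    has_health_check: bool


def fate(c: Container) -> str:
    """Return what happens to a container, per the rules described in this thread."""
    if c.host_state == "active":
        return "running"
    if c.has_health_check:
        # A health check lets Rancher mark the container unhealthy,
        # so the service can replace it on another host.
        return "rescheduled"
    # No health check: Rancher cannot tell whether the container is still
    # running, so it stays on the host until a user deletes it.
    return "stays until deleted"


if __name__ == "__main__":
    for c in [Container("ntp-1", "reconnecting", False),
              Container("web-1", "reconnecting", True)]:
        print(c.name, fate(c))
```

In this model, an ntp container with no health check on a reconnecting host is never moved, which matches the situation in the original question.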

Currently, we don't support health checks for the case you're describing.

Thanks for your response @denise .

What bothers me most is that the stack stays in the active (green) state. Wouldn't an "Unknown" state be more appropriate?

I developed a script that queries the Rancher API for the status of a stack. In this case, it never notifies me of a problem.

Technically, there is nothing wrong with the stack. Rancher doesn't know the state of that container and assumes it is healthy. The reconnecting state of a host doesn't indicate that the container is no longer up and running. It only indicates that the connection between the rancher/agent container and the rancher/server container has failed.

Without a health check, Rancher does not know the state of the container and would not be able to report that something is unhealthy with the state of the service.
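A monitoring script like the one mentioned in this thread could approximate the suggested "Unknown" state on the client side. The sketch below assumes the script has already collected, for each container in the stack, the state of its host and whether it has a health check; the function name and the dictionary keys `host_state` and `has_health_check` are hypothetical and are not actual Rancher API fields.

```python
from typing import Iterable, Mapping


def derived_stack_status(reported: str, containers: Iterable[Mapping]) -> str:
    """Hypothetical client-side check, not part of Rancher.

    Downgrade an "active" stack to "unknown" when any container without a
    health check sits on a host that is not active, because Rancher cannot
    know whether such a container is still running.
    """
    if reported != "active":
        # Keep any non-active status Rancher already reports.
        return reported
    for c in containers:
        if c["host_state"] != "active" and not c["has_health_check"]:
            return "unknown"
    return reported


if __name__ == "__main__":
    stack = [
        {"host_state": "active", "has_health_check": True},
        {"host_state": "reconnecting", "has_health_check": False},
    ]
    print(derived_stack_status("active", stack))
```

Containers that do have a health check are left to Rancher, since in that case Rancher itself can detect the failure and report it.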