Hi,
After implementing healthchecks on most of our services, we are having issues upgrading them. We typically run stacks of 10-12 services each; one stack represents what we used to call an application: usually 2-3 backend services, 2-3 frontend services, and load balancers between them. Failures during upgrades usually happen in the frontend services that are linked to a backend service. The healthcheck on the frontend also checks that the backend application is responding as it should; if it is not, the frontend service is recreated.
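To make the dependency concrete, here is a minimal Python sketch of a frontend healthcheck that also depends on the backend, as described above. It is illustrative only: `BACKEND_URL`, `backend_ok` and `health_status` are hypothetical names, and a real frontend would do this inside its own application rather than in a separate server. Whenever the backend is unreachable, the frontend answers 503, and enough consecutive failures get it recreated.

```python
import http.client
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable

# Hypothetical address of the linked backend service; adjust to your stack.
BACKEND_URL = "http://backend:8080/"


def backend_ok(url: str = BACKEND_URL, timeout: float = 1.0) -> bool:
    """Return True if the backend answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, DNS failure, timeout or a non-2xx (HTTPError) all count as down.
        return False


def health_status(check_backend: Callable[[], bool]) -> int:
    """Status the frontend reports: 200 when the backend is reachable, else 503."""
    return 200 if check_backend() else 503


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status = health_status(backend_ok)
        body = b"ok\n" if status == 200 else b"backend unavailable\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Listen on port 80 to match the `port: 80` healthcheck target.
    HTTPServer(("", 80), HealthHandler).serve_forever()
```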
During upgrades the backend services are upgraded first. Trying to upgrade the frontend service often fails with the following error message:
ERRO[0049] Failed to start: frontend-portal : Service frontend-portal must be state=active or inactive to upgrade, currently: state=updating-active
This is probably because the frontend service has detected that the backend was upgraded and is now reinitializing due to a changed IP (or whatever else changed) on the backend service.
Upgrades are done automatically from our CI-system using the rancher binary:
rancher up --upgrade -c --interval 10000 --batch-size=1
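One possible CI-side workaround, not an official Rancher mechanism, is to retry the same command when it fails only because a service is still in a transitional state such as `updating-active`. The Python sketch below assumes that `rancher up` exits non-zero in that case and prints the error text shown above to stdout or stderr; `upgrade_with_retry`, the attempt count and the delay are hypothetical choices. It also assumes that re-running `rancher up --upgrade` is harmless for services that were already upgraded, which is worth verifying for your Rancher version.

```python
import subprocess
import time
from typing import Callable, List

# The exact command from the CI job above.
UPGRADE_CMD: List[str] = [
    "rancher", "up", "--upgrade", "-c", "--interval", "10000", "--batch-size=1",
]
# Fragment of the error Rancher prints while a service is still transitioning.
BUSY_MARKER = "must be state=active or inactive to upgrade"

Runner = Callable[[List[str]], subprocess.CompletedProcess]


def run_cmd(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run a command and capture its output as text."""
    return subprocess.run(cmd, capture_output=True, text=True)


def upgrade_with_retry(
    run: Runner = run_cmd,
    attempts: int = 5,
    delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry the upgrade while it fails only because a service is busy.

    Returns True on success, False if every attempt hit the busy error.
    Raises RuntimeError on any other failure, since retrying won't fix it.
    """
    for attempt in range(1, attempts + 1):
        result = run(UPGRADE_CMD)
        if result.returncode == 0:
            return True
        output = (result.stdout or "") + (result.stderr or "")
        if BUSY_MARKER not in output:
            raise RuntimeError(f"upgrade failed (exit {result.returncode}):\n{output}")
        if attempt < attempts:
            # Give the service time to settle back to state=active.
            sleep(delay_s)
    return False
```

Injecting `run` and `sleep` keeps the retry logic testable without a Rancher server; in CI you would simply call `upgrade_with_retry()` and fail the job if it returns False.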
Our healthcheck typically looks like this:
frontend-portal:
  scale: 2
  health_check:
    port: 80
    interval: 2000
    request_line: 'GET "/" "HTTP/1.1\r\nHost: bla\r\nUser-Agent: healthcheck"'
    unhealthy_threshold: 3
    healthy_threshold: 1
    response_timeout: 2000
    initializing_timeout: 20000
    reinitializing_timeout: 20000
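As a rough back-of-the-envelope check on how quickly this configuration reacts to a backend outage, the sketch below estimates the time from the first failing check until a frontend container is marked unhealthy. It assumes a simplified model, not taken from Rancher's documentation: checks start every `interval` ms regardless of how long each one takes, and a check that times out after `response_timeout` ms counts as a failure. `HealthCheck` and `detection_window_ms` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HealthCheck:
    interval_ms: int
    unhealthy_threshold: int
    response_timeout_ms: int


def detection_window_ms(hc: HealthCheck) -> Tuple[int, int]:
    """Approximate (fastest, slowest) time from the first failing check
    until the container is marked unhealthy, under the simplified model."""
    # The last of N consecutive failing checks starts (N - 1) intervals after the first.
    spacing = (hc.unhealthy_threshold - 1) * hc.interval_ms
    # Fastest: that check fails immediately (e.g. connection refused).
    # Slowest: it waits the full response timeout before failing.
    return spacing, spacing + hc.response_timeout_ms


frontend = HealthCheck(interval_ms=2000, unhealthy_threshold=3, response_timeout_ms=2000)
print(detection_window_ms(frontend))  # (4000, 6000)
```

Under this model, a backend that is unreachable for more than roughly 4-6 seconds during its own upgrade is enough to mark the frontend unhealthy; raising `unhealthy_threshold` or `interval` on the frontend would widen that window.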
So what is the correct way of upgrading services when healthchecks are configured? Is there a way to disable healthchecks during upgrades, or to ignore the state "state=updating-active" and upgrade anyway?
Best regards,
Alexander