Upgrade services that require quorums

tobowers · May 11, 2016, 9:16pm

We are trying to upgrade a RabbitMQ cluster that is built as a single service in Rancher (1.0.1). We have an appropriate health check in place and are using a batch size of 1 and an “in place” upgrade.

The behavior we’d like/expect to see is that each container gets a healthy replacement before continuing.

The behavior we are seeing is that only a single healthy container is kept around, and the upgrade continues on while the replacement containers are still in the “initializing” state. As a result, the cluster drops to one node during the upgrade and we lose quorum.
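To make the quorum constraint concrete, here is a small worked sketch. It assumes a simple majority quorum (more than half of the nodes must be up), which is an assumption for illustration and not something Rancher itself enforces; the function names are hypothetical.

```python
def majority_quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def max_unavailable(cluster_size: int) -> int:
    """Most nodes that can be down at once while a majority stays up."""
    return cluster_size - majority_quorum(cluster_size)


if __name__ == "__main__":
    for size in (3, 5, 7):
        print(
            f"{size} nodes: quorum={majority_quorum(size)}, "
            f"max unavailable={max_unavailable(size)}"
        )
```

Under that assumption, a 3-node cluster tolerates only one node being down at a time, so an upgrade that moves on before the replacement is healthy will break quorum as soon as it stops a second container.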

tobowers · May 12, 2016, 1:54pm

I spoke to @vincent on IRC yesterday and apparently upgrade does not support health checks… only interval and batchSize

andyshinn · May 12, 2016, 9:34pm

I’d love the ability to do batch upgrades based on health status + timeout rather than an interval. I think I’ve raised this before. I’m wondering whether this would solve both the quorum and the “zero-downtime” deploy problems. I’ve been noodling on how I might like my upgrade procedure to go. Something like:

1. Start an upgrade with a batch size of 2 and a timeout of 300 seconds.
2. Start two new containers.
3. If the timeout is hit, go to the next batch (or maybe fail the upgrade?).
4. Once the batch is healthy, proceed.
5. Continue with the next batch (stop the old containers from the first batch and start the new batch).
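The procedure above could be sketched roughly as follows. This is not Rancher's API: `start_replacement`, `stop_old`, and `is_healthy` are hypothetical callbacks standing in for whatever actually starts, stops, and health-checks containers, and the sketch takes the "fail the upgrade" option when the timeout is hit.

```python
import time
from typing import Callable, List, Sequence


class UpgradeFailed(Exception):
    """Raised when a batch does not become healthy before the timeout."""


def wait_until_healthy(
    names: Sequence[str],
    is_healthy: Callable[[str], bool],
    timeout: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every named container is healthy or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if all(is_healthy(name) for name in names):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)


def rolling_upgrade(
    containers: Sequence[str],
    batch_size: int,
    timeout: float,
    start_replacement: Callable[[str], str],  # hypothetical: returns new container name
    stop_old: Callable[[str], None],          # hypothetical: stops the old container
    is_healthy: Callable[[str], bool],        # hypothetical: health check result
    **wait_kwargs,
) -> List[str]:
    """Upgrade in batches, stopping old containers only once replacements are healthy."""
    upgraded: List[str] = []
    for i in range(0, len(containers), batch_size):
        batch = containers[i:i + batch_size]
        # Start the replacements first so the old containers keep serving.
        replacements = [start_replacement(old) for old in batch]
        if not wait_until_healthy(replacements, is_healthy, timeout, **wait_kwargs):
            # Abort before touching any more old containers.
            raise UpgradeFailed(f"batch {list(batch)} not healthy within {timeout}s")
        for old in batch:
            stop_old(old)
        upgraded.extend(replacements)
    return upgraded
```

Calling `rolling_upgrade(names, batch_size=2, timeout=300, ...)` would match the numbers in the list. Because old containers in a batch are stopped only after their replacements pass the health check, a failed batch leaves the remaining old containers running.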

denise · May 15, 2016, 10:30pm

@andyshinn I don’t believe there is a Github issue for this enhancement, so please feel free to make one.

vincent · May 16, 2016, 6:06am

As @tobowers said, it doesn’t exist today, but I basically agree…

I would definitely fail/stop if one didn’t come up, given that you’re trying to maintain quorum.
And service healthchecks already have an initializing timeout, so I would just use that one rather than having a separate batch timeout…

pkrizak · April 13, 2018, 9:11pm

This thread is a couple years old. Any changes in the status of this feature since then? I have a very similar requirement to OP (service that requires quorum) and so being able to sequence upgrades based on health check status would be a huge win.

As it stands, I’m stuck with the rather crappy option of using a huge --interval or a huge initializing timeout. That makes upgrades needlessly slow and still doesn’t guarantee quorum: if the first container fails to come up, Rancher will still happily shut down and restart all the remaining containers.
