Zero downtime app upgrades

Let this be the central place for this topic.

From what I’ve observed on IRC, GitHub, etc., there will be some features coming out soon to help with this process.

For now I’ve been advised that using a service alias is the best practice but I’m having trouble understanding how to implement this (http://docs.rancher.com/rancher/rancher-ui/applications/stacks/adding-service-alias/).

docker-compose.yml:

countdown-lb:
  ports:
  - 8000:80/tcp
  tty: true
  image: rancher/load-balancer-service
  links:
  - countdown-v2:countdown-v2
  stdin_open: true
countdown-v1:
  ports:
  - 80/tcp
  restart: 'no'
  tty: true
  image: flaccid/countdown_example:v1
  stdin_open: true
countdown-v2:
  ports:
  - 80/tcp
  restart: 'no'
  tty: true
  image: flaccid/countdown_example:v2
  stdin_open: true
countdown-demo-alias:
  image: rancher/dns-service
  links:
  - 'countdown-v2:'

rancher-compose.yml:

countdown-lb:
  scale: 1
  load_balancer_config:
    name: countdown-lb config
countdown-v1:
  scale: 1
countdown-v2:
  scale: 1

In this case the service alias isn’t used because I’m not sure how to use it. If I point the LB to the alias instead of one of the container services it doesn’t work (connection refused), but I’m not sure if that is actually what I should be trying.

Currently I change the link on the LB to the new container service, e.g. from countdown-v1:countdown-v1 to countdown-v2:countdown-v2, then run rancher-compose again. It switches over and we have the new version of the app. This works OK, although I did sometimes notice a slight gap in service.

This is of course not the ideal method, and I’d like to see an example where I can use a service alias. I’m very open to learning what the best practice is, and hopefully we can update the documentation so users can get there easily. The use case here is that the CI tool being used for continuous deployment can run rancher-compose or similar to deploy the new version of the application with zero downtime.

@flaccid

There are 2 ways to handle service upgrades in Rancher: via Service Alias and via Load Balancer. I’ll explain the difference between the two, along with their limitations, so you can choose the one that suits your application’s needs better.

Service Alias:

Service Alias lets you link multiple services to the same DNS name. Let’s say your application service App needs to be linked to the DB service, and you know that your DB service will require an upgrade. For that, you create a Service Alias named DB, and link your actual DB service, DBv1, to it: App->DB (alias)->DBv1. When you need to switch to DBv2, simply add a link from DB to DBv2: App->DB (alias)->DBv1/DBv2. Once DBv2 is tested, remove the link to DBv1: App->DB (alias)->DBv2

Service Alias limitations:

  • a Service Alias can be resolved only inside the Rancher private network, as there is no public-facing DNS
  • traffic coming from App to DB is not balanced between the service’s instances
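
The alias rotation described above can be modelled in a few lines. This is only an illustrative sketch: the function names `rotation_stages` and `never_empty` are made up for this example and are not part of Rancher. It shows why the switch is gradual: at every stage at least one service stays linked behind the alias.

```python
def rotation_stages(old, new):
    """Return the alias's link list at each of the three upgrade stages."""
    return [
        [old],        # stage 1: App -> DB (alias) -> DBv1
        [old, new],   # stage 2: both versions linked while DBv2 is tested
        [new],        # stage 3: link to DBv1 removed, only DBv2 remains
    ]


def never_empty(stages):
    """True if every stage keeps at least one service behind the alias."""
    return all(len(links) > 0 for links in stages)


stages = rotation_stages("DBv1", "DBv2")
for number, links in enumerate(stages, start=1):
    print(f"stage {number}: App -> DB (alias) -> {'/'.join(links)}")
```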

Load Balancing:

Similar to Service Alias, Load Balancer lets you connect multiple services to the same DNS name (the LB service’s name). The upgrade process is also very similar to the one above: 1) App->LB->DBv1 2) App->LB->DBv1/DBv2 3) App->LB->DBv2

There are extras this approach gives you:

  • traffic is balanced between all instances of the DB service.
  • Load Balancer is a public-facing service, as its ports are bound to the Rancher host’s public ports.

The latter is also a current limitation: a Load Balancer service can’t be internal only. We are about to add a new feature that lets you choose between a private and a public Load Balancer service.

Here is how to do it via rancher-compose, using the example you’ve provided and the service alias upgrade approach:

  1. countdown-demo-alias is the alias service and linked to countdown-v1. Do rancher-compose up to bring up the initial application:

countdown-demo-alias:
  image: rancher/dns-service
  links:
  - countdown-v1
countdown-v1:
  ports:
  - 80/tcp
  restart: 'no'
  tty: true
  image: flaccid/countdown_example:v1
  stdin_open: true
countdown-v2:
  ports:
  - 80/tcp
  restart: 'no'
  tty: true
  image: flaccid/countdown_example:v2
  stdin_open: true

  2. Add countdown-v2 to the list of service alias links. Don’t remove countdown-v1 yet - that will ensure 0 downtime upgrade. Perform rancher-compose up on the upgraded template:

countdown-demo-alias:
  image: rancher/dns-service
  links:
  - countdown-v1
  - countdown-v2

  3. Now remove the link to countdown-v1, and execute rancher-compose up again.

countdown-demo-alias:
  image: rancher/dns-service
  links:
  - countdown-v2

After that there will be only countdown-v2 behind the service alias.
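
For a CI job, the three templates could be generated rather than edited by hand. The sketch below is a rough illustration, not an official tool: `render_alias` and `STAGES` are hypothetical names, and it only prints the alias fragment for each step (the countdown-v1 and countdown-v2 service definitions would still need to be in the same docker-compose.yml).

```python
def render_alias(alias, links):
    """Render a docker-compose.yml fragment for a rancher/dns-service alias."""
    lines = [f"{alias}:", "  image: rancher/dns-service", "  links:"]
    lines += [f"  - {service}" for service in links]
    return "\n".join(lines) + "\n"


# Link lists for steps 1, 2 and 3 of the alias upgrade.
STAGES = [
    ["countdown-v1"],
    ["countdown-v1", "countdown-v2"],
    ["countdown-v2"],
]

for step, links in enumerate(STAGES, start=1):
    print(f"# step {step}")
    print(render_alias("countdown-demo-alias", links))
```

Each rendered fragment would be written into the template before the matching rancher-compose up.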

Upgrading with a Load Balancer is very similar; just use countdown-lb instead of countdown-demo-alias, going through the same three steps:

countdown-lb:
  image: rancher/load-balancer-service
  links:
  - countdown-v1

countdown-lb:
  image: rancher/load-balancer-service
  links:
  - countdown-v1
  - countdown-v2

countdown-lb:
  image: rancher/load-balancer-service
  links:
  - countdown-v2

Appreciate your prompt and detailed reply @alena

I’m understanding those 2 processes better now and have tried them out. I just have a couple of remaining questions to close the gap.

  1. In the service alias method, how is access to the application handled at the host/public level without an LB? It works well with a load balancer (8000->80), but I’m not sure how it would work with only a service alias and no LB. Additionally, it seems like you can’t point an LB at the service alias. I’m happy to use no LB, but I’m having difficulty understanding how you would use the service alias as an endpoint accessed by the user (with the A/B swap-out happening behind it).

  2. Is it possible to have rancher-compose exit when the up is complete? I’ll need it to exit when placed into scripting/CI/CD.

Many thanks in advance. Nearly there!


Added a github issue for needing rancher-compose to exit (https://github.com/rancher/rancher-compose/issues/45).

Update: worked that one out :)

@flaccid

  1. Currently Rancher has 2 ways of publishing your application publicly:
  • by creating a Load Balancer service, whose public port gets published on the host
  • by creating a regular service with public ports exposed and published on the host

In both cases the application is accessed via the host’s public IP. So there is no solution for the Alias service in place yet; it can be reached by its alias DNS name only internally. Integration with public DNS (like Route53) is something we are planning to work on in the near future.

  2. rancher-compose up -d will exit when all services are active. Thank you for filing the ticket, we are going to update the doc with this info.
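
Since rancher-compose up -d exits once the services are active, a CI step can simply wait for it and check the exit status. Here is a minimal sketch of that idea; `run_step` and `deploy` are hypothetical helper names, and it assumes rancher-compose is on the PATH and already configured to reach your Rancher server.

```python
import subprocess
import sys

# Command taken from the reply above; everything else here is illustrative.
DEPLOY_COMMAND = ["rancher-compose", "up", "-d"]


def run_step(command):
    """Run one CI step, wait for it to exit, and return its exit code."""
    return subprocess.run(command).returncode


def deploy(command=DEPLOY_COMMAND, runner=run_step):
    """Run the deploy command; return True when it exits with status 0."""
    return runner(command) == 0
```

In a CI script this might end with `sys.exit(0 if deploy() else 1)`, so the build fails when the deploy does. The `runner` parameter is only there so the logic can be exercised without a Rancher server.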

+1 For app/service upgrades.
Being able to trigger an upgrade or redeploy via the API would be the minimum needed for production use. Tutum, for example, solves this with a webhook that is generated in the UI and pasted into Docker Hub’s webhooks manager.

Also posted here: https://github.com/rancher/rancher/issues/1587#issuecomment-122958886


Quick and dirty example of doing the container service swapout with LB: https://gist.github.com/flaccid/97473d0d85e2d6f4e292.

Rancher v0.3.0 with rancher-compose v0.25.0 has the new upgrade feature. Can we please establish some documentation including an example? (https://github.com/rancher/rancher/issues/1079#issuecomment-125397506).

I’ve tried it out, but I’m confused by “Please pass arguments in the form: [from service] [to service]”.

The documentation is being worked on and should be out in the next day or two. But in general the syntax is rancher-compose upgrade service1 service2 <options>. In the docker-compose.yml, you’ll need to place the name of service1 (that would already be launched in Rancher) and service2 (with all the normal docker-compose goodies to define the service). Hope this brief bit can get you started!
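
As a rough illustration of that syntax, a deployment script could assemble the command like this. The helper name `upgrade_command` is made up for this sketch, and no options are added because none are named in this thread.

```python
def upgrade_command(from_service, to_service):
    """Build the argv for: rancher-compose upgrade [from service] [to service]."""
    if not from_service or not to_service:
        # Mirrors the CLI message quoted earlier in the thread.
        raise ValueError(
            "Please pass arguments in the form: [from service] [to service]"
        )
    return ["rancher-compose", "upgrade", from_service, to_service]


# countdown-v1 is already running in Rancher; countdown-v2 is defined
# in docker-compose.yml.
print(" ".join(upgrade_command("countdown-v1", "countdown-v2")))
```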

I’ve updated more thorough documentation here: http://docs.rancher.com/rancher/rancher-compose/upgrading/


Hi,

I was reading this procedure for updating a service without downtime, but I see that it requires re-linking the service, which will still take the service down for several seconds or more. So is there a way to really update a service without downtime?

The link is dead. It is now: https://docs.rancher.com/rancher/v1.6/en/cattle/upgrading/