Weirdness with upgrades

Ok, I’m having major weirdness with rancher-compose up --upgrade.

I have a simple service, and I am using Rancher v1.0.0. I’m trying to upgrade it.

Here is my docker-compose.yml:

testservice:
  ports:
  - 8080:8080/tcp
  environment:
    APP_CONFIG_URL: http://consul:8500/v1/kv/test-service-config?raw
    JVM_CONFIG_URL: http://consul:8500/v1/kv/test-service-container-config?raw
  external_links:
  - consul/consul:consul
  - TestSingleHostNameStack/redis:redis
  labels:
    io.rancher.scheduler.affinity: "container_label_soft_ne: io.rancher.stack_service.name=$${stack_name}/$${service_name}"
    io.rancher.container.pull_image: always
  tty: true
  hostname: testservice
  image: registry.colinxdev.com/test-service:DEV
  stdin_open: true

And rancher-compose.yml:

testservice:
  scale: 2
  health_check:
    port: 8080
    interval: 2000
    unhealthy_threshold: 3
    strategy: recreate
    response_timeout: 2000
    request_line: GET /test-service/api/heartbeat HTTP/1.0
    healthy_threshold: 2
  upgrade_strategy:
    start_first: true

I start with a working set of two containers, one on each of my nodes. Then, when I upgrade using this command:

rancher-compose up --force-upgrade --pull

A very odd sequence happens. First, new copies of the containers start, but then they stop. Then they start again, and stop again, forever, until I go into the GUI and choose ‘Cancel Upgrade’.

Then, an even more bizarre thing happens. When I choose ‘Finish Upgrade’ after cancelling, THEN the new containers start successfully, and the old ones are shut down.

This doesn’t make any sense to me at all, but I’ve been able to duplicate the sequence of events several times.

You only have 2 hosts, correct?

This is what your docker-compose/rancher-compose are telling Rancher to do:

Create a service of scale 2 that exposes port 8080

When you upgrade, you’ve also requested that the upgrade start new containers before stopping the old ones:

upgrade_strategy:
  start_first: true

Since you only have 2 hosts and each host already has a container using port 8080, Rancher does not stop your old containers first (because of start_first: true) and tries to launch new containers that also need port 8080. But there is no host with that port free, so the new containers can never be scheduled.
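
If you want to keep publishing 8080 on the hosts, one way around it (just a sketch, untested) is to drop start_first so Rancher stops the old container before starting its replacement:

testservice:
  upgrade_strategy:
    start_first: false

The trade-off is a short window during the upgrade where that container is down on its host.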

In the UI, you should be able to drill down into the service and see the container in an error state; that should show you the scheduling error about there being no host with port 8080 available.

Oh! Right! I did try to find the error state, but I couldn’t find the error. The upgrade was in a loop, so I could never see the container’s error before it launched another one.

But this helps immensely! So I just need to retry, but without the exposed port in my service definition, and with a load balancer in front instead!
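
Something like this is what I’m picturing (untested sketch, the testservice-lb name is just a placeholder), with the ports: section removed from testservice so only the balancer claims 8080 on the hosts:

testservice-lb:
  image: rancher/load-balancer-service
  ports:
  - 8080:8080
  links:
  - testservice:testservice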

Thanks so much for the help!