I know there is a doc on how to install Rancher server in HA mode and another doc on how to upgrade Rancher server, but it looks like neither of them mentions how to upgrade a Rancher server running in HA mode.
I set up my Rancher server (v1.1.0-dev4) in HA mode (3 nodes). It has been running pretty well for some time, and now I’d like to upgrade it to the latest version, v1.1.0-dev5. Is it possible to upgrade one node at a time to achieve an online upgrade? Will there be any issues when nodes of different versions talk to each other? Or must I stop all nodes and upgrade them together, which means stopping the service for something like 30 to 60 minutes?
I have no idea what the official way would be, but I have been able to just re-run the setup script with the new version (it tells you to run it locally):
Here’s the log output from such an upgrade:
time="2016-06-20T21:02:19Z" level=info msg="Cluster changed, index=0, members=[10.20.0.210]" component=service
time="2016-06-20T21:02:19Z" level=info msg="Container parent image is different rancher/server:v1.1.0-dev5 != rancher/server:v1.1.0-dev4" component=docker
time="2016-06-20T21:02:19Z" level=info msg="Deleting container f2f743b08f010e0d8fd0813b7aa3d846b57c6bb9067a48437d76ea792c35f97b" component=docker
time="2016-06-20T21:02:19Z" level=info msg="Deleting container e63e34d213cc120940c164d578c63460add137c196caa473e5d4fc22d215b56d" component=docker
time="2016-06-20T21:02:19Z" level=info msg="Deleting container 5a9b7ec4f4786656ae1be8f8e4fb4c348e8bb1e2745c94077a150052e7ba9019" component=docker
time="2016-06-20T21:02:19Z" level=info msg="Deleting container f37a5ebe7ee7f16f63f52d58e4535d0a7ecefd8314ff9b7a8bb0f7677fe0d85a" component=docker
time="2016-06-20T21:02:20Z" level=info msg="Deleting container 6aa3603fc6895e4be532ab7a7785eb6b8d139fe8a1c69a74d195fa2b43547460" component=docker
time="2016-06-20T21:02:20Z" level=info msg="Deleting container dc8aaeb57e08355c7f588595da00ee27f4c8550ff23b736b6c76b74da92ac967" component=docker
time="2016-06-20T21:02:20Z" level=info msg="Deleting container 6f18d2607ebd6a6960952181915fe030915cdb417b9e24250c92de88583b571d" component=docker
time="2016-06-20T21:02:20Z" level=info msg="Deleting container c67e675224e4ed617ca498e44d764f3cc6bc854e356502456265e97b5009bb29" component=docker
time="2016-06-20T21:02:20Z" level=info msg="Creating container rancher-ha-parent" component=docker
time="2016-06-20T21:02:21Z" level=info msg="Creating container rancher-ha-tunnel-redis-1" component=docker
time="2016-06-20T21:02:21Z" level=info msg="Creating container rancher-ha-tunnel-zk-quorum-1" component=docker
time="2016-06-20T21:02:21Z" level=info msg="Creating container rancher-ha-tunnel-zk-leader-1" component=docker
time="2016-06-20T21:02:22Z" level=info msg="Creating container rancher-ha-tunnel-zk-client-1" component=docker
time="2016-06-20T21:02:22Z" level=info msg="Creating container rancher-ha-cattle" component=docker
time="2016-06-20T21:02:22Z" level=info msg="Waiting for server to be available" component=cert
time="2016-06-20T21:02:22Z" level=info msg="Can not launch agent right now: Server not available at http://172.17.0.1:18080/ping:" component=service
time="2016-06-20T21:02:27Z" level=info msg="Waiting for server to be available" component=cert
time="2016-06-20T21:02:27Z" level=info msg="Can not launch agent right now: Server not available at http://172.17.0.1:18080/ping:" component=service
time="2016-06-20T21:02:32Z" level=info msg="Forgetting cluster member" component=manager member={23 6a13e17d-c062-4f5e-a90f-5a570b4e1a58 10.20.0.210 map[] 0 189106 1}
time="2016-06-20T21:02:32Z" level=info msg="Currently Master: true" component=manager master=true
time="2016-06-20T21:02:32Z" level=info msg="Assigning e214ea97-40cf-4ce0-8e46-2ed6d60c5e52 10.20.0.210 to index 1" component=manager
time="2016-06-20T21:02:37Z" level=info msg="Cluster changed, index=1, members=[10.20.0.210]" component=service
time="2016-06-20T21:02:37Z" level=info msg="Creating container rancher-ha-zk" component=docker
time="2016-06-20T21:02:38Z" level=info msg="Creating container rancher-ha-redis" component=docker
time="2016-06-20T21:02:38Z" level=info msg="Waiting for server to be available" component=cert
time="2016-06-20T21:02:38Z" level=info msg="Can not launch agent right now: Server not available at http://172.17.0.1:18080/ping:" component=service
time="2016-06-20T21:02:43Z" level=info msg="Waiting for server to be available" component=cert
time="2016-06-20T21:02:43Z" level=info msg="Can not launch agent right now: Server not available at http://172.17.0.1:18080/ping:" component=service
time="2016-06-20T21:02:48Z" level=info msg="Waiting for server to be available" component=cert
time="2016-06-20T21:02:48Z" level=info msg="Can not launch agent right now: Server not available at http://172.17.0.1:18080/ping:" component=service
time="2016-06-20T21:02:53Z" level=info msg="Waiting for server to be available" component=cert
time="2016-06-20T21:02:53Z" level=info msg="Can not launch agent right now: Server not available at http://172.17.0.1:18080/ping:" component=service
time="2016-06-20T21:02:58Z" level=info msg="Waiting for server to be available" component=cert
time="2016-06-20T21:02:58Z" level=info msg="Can not launch agent right now: Server not available at http://172.17.0.1:18080/ping:" component=service
time="2016-06-20T21:03:05Z" level=info msg="[0/10] [zookeeper]: Starting "
time="2016-06-20T21:03:06Z" level=info msg="[1/10] [zookeeper]: Started "
time="2016-06-20T21:03:06Z" level=info msg="[1/10] [tunnel]: Starting "
time="2016-06-20T21:03:06Z" level=info msg="[2/10] [tunnel]: Started "
time="2016-06-20T21:03:06Z" level=info msg="[2/10] [redis]: Starting "
time="2016-06-20T21:03:07Z" level=info msg="[3/10] [redis]: Started "
time="2016-06-20T21:03:07Z" level=info msg="[3/10] [cattle]: Starting "
time="2016-06-20T21:03:07Z" level=info msg="[4/10] [cattle]: Started "
time="2016-06-20T21:03:07Z" level=info msg="[4/10] [websocket-proxy]: Starting "
time="2016-06-20T21:03:07Z" level=info msg="[4/10] [websocket-proxy-ssl]: Starting "
time="2016-06-20T21:03:07Z" level=info msg="[4/10] [go-machine-service]: Starting "
time="2016-06-20T21:03:07Z" level=info msg="[4/10] [rancher-compose-executor]: Starting "
time="2016-06-20T21:03:08Z" level=info msg="Upgrading go-machine-service"
time="2016-06-20T21:03:08Z" level=info msg="Upgrading websocket-proxy"
time="2016-06-20T21:03:08Z" level=info msg="Upgrading rancher-compose-executor"
time="2016-06-20T21:03:08Z" level=info msg="Upgrading websocket-proxy-ssl"
time="2016-06-20T21:04:07Z" level=info msg="[5/10] [websocket-proxy-ssl]: Started "
time="2016-06-20T21:04:07Z" level=info msg="[5/10] [load-balancer-swarm]: Starting "
time="2016-06-20T21:04:08Z" level=info msg="[6/10] [load-balancer-swarm]: Started "
time="2016-06-20T21:04:09Z" level=info msg="[7/10] [rancher-compose-executor]: Started "
time="2016-06-20T21:04:09Z" level=info msg="[8/10] [websocket-proxy]: Started "
time="2016-06-20T21:04:09Z" level=info msg="[8/10] [load-balancer]: Starting "
time="2016-06-20T21:04:09Z" level=info msg="[9/10] [go-machine-service]: Started "
time="2016-06-20T21:04:10Z" level=info msg="[10/10] [load-balancer]: Started "
time="2016-06-20T21:04:10Z" level=info msg="Done launching management stack" component=service
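If you want to see which services dominate the upgrade time, the log above can be summarized with a small script. This is only a rough sketch in Python: it assumes the log lines keep the exact `[n/10] [service]: Starting` / `Started` format shown above, and the function name `service_start_durations` is my own, not anything Rancher ships.

```python
import re
from datetime import datetime

# Matches lines such as:
#   time="2016-06-20T21:03:05Z" level=info msg="[0/10] [zookeeper]: Starting "
# Assumption: the format is exactly what appears in the pasted log.
LINE_RE = re.compile(
    r'time="(?P<ts>[^"]+)".*msg="\[\d+/\d+\] \[(?P<svc>[^\]]+)\]: '
    r'(?P<state>Starting|Started) ?"'
)


def service_start_durations(lines):
    """Return {service: seconds between its 'Starting' and 'Started' lines}."""
    started_at = {}
    durations = {}
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a management-stack progress line
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%SZ")
        svc = match.group("svc")
        if match.group("state") == "Starting":
            started_at[svc] = ts
        elif svc in started_at:
            durations[svc] = (ts - started_at[svc]).total_seconds()
    return durations


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python durations.py < upgrade.log
    for svc, secs in sorted(service_start_durations(sys.stdin).items()):
        print(f"{svc}: {secs:.0f}s")
```

In the log above, most of the wall-clock time sits between the "Upgrading ..." lines at 21:03:08 and the first "Started" line at 21:04:07, so the per-service numbers mainly show how long the proxy and executor upgrades took.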
Since all the services (such as the proxies, APIs, etc.) run in their own containers, the logs show that the script upgrades each of those services in turn.
Now, we don’t personally have a 3-node cluster, but if I were to assume a few things, you could run the script on one node, have it upgrade all the services, and then run it on the other nodes to make sure they get the latest UI. My guess is that your HA will be degraded during the upgrade, though.
Thanks for your reply. That is exactly what I thought. Actually, I tried it (ran rancher-ha-start.sh node by node), but it didn’t work, perhaps because I was not patient enough. In the end I stopped all containers on all three nodes and then ran the HA start script again on all three nodes, although it took a long time before the UI became accessible.
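For anyone who runs into the same patience problem: instead of guessing, you can poll the server until it answers before moving on to the next node. Below is a minimal Python sketch. The ping URL is the one from the log output earlier in this thread; treating an HTTP 200 as "available", the 30-minute default limit, and the function names are all my own assumptions rather than anything documented by Rancher.

```python
import time
import urllib.error
import urllib.request

# URL taken from the "Server not available at ..." log lines in this thread.
PING_URL = "http://172.17.0.1:18080/ping"


def server_is_up(url=PING_URL, timeout=3.0):
    """Return True if the URL answers with HTTP 200 (assumed to mean 'ready')."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_available(check, max_wait=1800.0, interval=5.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Call check() every `interval` seconds until it returns True.

    Returns True once check() succeeds, or False after `max_wait` seconds.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + max_wait
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    ok = wait_until_available(server_is_up)
    print("server available" if ok else "gave up waiting")
```

Run it on a node after starting the HA script there, and only continue with the next node once it reports that the server is available.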