Bringing up rancher after an outage

Hi,

I have a Debian 8 host with Rancher on it and then 3 Debian 8 Docker containers as hosts. I had them all turned off and have just turned them all back on. Host 3 has come up fine and shows "active" under Hosts, but hosts 1 and 2 say "disconnected". How do I fix this, please?

This has now happened again after a complete rebuild, so is this a common problem?

Are you trying to use Docker containers as Rancher hosts? I do not think that is a supported configuration. I believe the Rancher hosts need to be either separate physical machines or at least separate virtual machines.

The docs say I can, and that is exactly what I have followed. Yes, it's a dedicated VM; it's running Docker but nothing other than Rancher.

Can you link to the documentation you are referring to?

“Any modern Linux distribution with a supported version of Docker”

http://rancher.com/docs/rancher/v1.6/en/installing-rancher/installing-server/#single-container-external-database

and,

http://rancher.com/docs/rancher/v1.6/en/quick-start-guide/

Rancher itself seems to work fine on the VM host, so that isn't the issue; the problem is communication with the 3 hosts that will do the work.

In the meantime, I've switched from Debian 8 to Ubuntu 16 LTS and an external MySQL database to see if that makes any difference. I will redo the 3 hosts as Ubuntu 16 LTS as well.
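For reference, the single-container, external-database setup from the first link above comes down to a single docker run on the server VM. This is only a sketch of the documented command; the database host, credentials, and database name below are placeholders for whatever the external database actually uses:

sudo docker run -d --restart=unless-stopped -p 8080:8080 rancher/server \
  --db-host mysql.example.com --db-port 3306 \
  --db-user cattle --db-pass changeme --db-name cattle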

Those refer to using a single host/VM to handle both the Rancher server and a single Rancher agent. I do not believe you can run multiple Rancher hosts as containers.

What I would suggest is creating 3 or 4 VMs running the Rancher services:

VM1: rancher/server (optionally also run rancher/agent on this VM, but not recommended)
VM2: rancher/agent
VM3: rancher/agent
VM4: rancher/agent

Each VM should be a separate ESX/VirtualBox/QEMU VM and not a Docker container.
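Roughly, that maps to commands like the following. This is only a sketch: the rancher/agent image tag is an example, and the exact registration command, including the server URL and token, must be copied from the Rancher UI under Infrastructure > Hosts > Add Host.

On VM1:

sudo docker run -d --restart=unless-stopped -p 8080:8080 rancher/server

On VM2-4 (the general shape of the command the UI generates):

sudo docker run --rm --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/rancher:/var/lib/rancher \
  rancher/agent:v1.2.11 http://<rancher-server-ip>:8080/v1/scripts/<registration-token>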

This is exactly what I have done!

The problem is that when I reboot VM2, 3, or 4, Rancher cannot reconnect to them.

Actually, my new layout is:

VM1: rancher/server -> external MariaDB
VM2: rancher/agent
VM3: rancher/agent
VM4: rancher/agent

OK, that was not clear from your initial post. I read your setup as 1 rancher/server and 3 rancher/agent containers all running on a single VM.

Are all 3 of your agent hosts in the same network, or are they remote? Is there any NAT device or firewall between VM1 and VM2-4?
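One quick way to rule that out is to check that each agent VM can still reach the Rancher server after a reboot. Assuming the server is listening on the default port 8080, something like this from VM2-4 should return an HTTP response:

curl -I http://<rancher-server-ip>:8080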

Yes, same subnet, no NAT, and no firewall except the one Docker puts up. It all works initially but doesn't survive a reboot.

On the host that does not work, are there any clues in the docker logs?

docker logs <id_of_rancher_agent>
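It may also be worth confirming that the agent container actually came back up after the reboot and has a restart policy set. This is only a guess at the cause, and the name below assumes the default rancher-agent container name:

docker ps -a --filter name=rancher-agent
docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' rancher-agent
docker update --restart=unless-stopped rancher-agent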

I have rebuilt now, so I'll retest and check whether this reoccurs.