I just wanted to set up a test cluster before retrying the migration to Rancher 1.2. The test cluster consists of the following:
- a load balancer in front of a single Rancher master (that’s how we route our services; it’s not an HA deployment)
- a Rancher master v1.2.1 with an external database
- 3 Rancher nodes running Ubuntu 16.04 LTS
- everything on AWS, no network restrictions applied
Unfortunately, the health checks (all except one) don’t come up, and neither does the scheduler (see images below).
The database was fresh, and so were the hosts: nothing on them except Ubuntu and Docker 1.11.2 from the Docker repos. Docker itself is running fine (otherwise neither Rancher nor any other container would come up); it’s just Rancher that isn’t cooperating.
“everything on AWS” and “no network restrictions applied” don’t normally go together. Do you specifically have an allow-any-from-any-to-any rule in a security group applied to the hosts? The hosts need to be able to reach each other using their registered public IPs (as shown on the Hosts screen) on the IPsec ports (UDP 500 and 4500).
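To make that requirement concrete, here is a minimal Python sketch that enumerates every directed flow a security group would have to permit between hosts. The function name and the IPs are hypothetical (documentation addresses standing in for what the Hosts screen shows); only the two UDP ports come from the post.

```python
from itertools import permutations

# UDP ports named above: 500 (IKE) and 4500 (IPsec NAT traversal).
IPSEC_UDP_PORTS = (500, 4500)

def required_ipsec_flows(registered_ips):
    """Return every (src_ip, dst_ip, udp_port) flow that must be allowed.

    `registered_ips` is the list of public IPs as registered in Rancher
    (hypothetical input). Every host must reach every *other* host on
    both ports, in both directions.
    """
    return [
        (src, dst, port)
        for src, dst in permutations(registered_ips, 2)
        for port in IPSEC_UDP_PORTS
    ]

# Hypothetical registered IPs for the three nodes.
hosts = ["203.0.113.11", "203.0.113.12", "203.0.113.13"]
flows = required_ipsec_flows(hosts)
print(len(flows))  # 3 hosts * 2 peers * 2 ports = 12
```

Listing the flows per registered IP (rather than per security group) mirrors the point that what matters is reachability of the addresses Rancher has recorded.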
What you’re showing is typical of this not working. The first healthcheck container was created when only host A was active, so the only thing that could check it was A itself, which doesn’t require cross-host communication. Then one or two more hosts came in, and healthchecks were spread across up to 3 hosts other than the one running each container. Now the one on A is trying to check the one on B but can’t, so it stays in Initializing.
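The explanation above can be sketched as a toy model. This is not Rancher’s actual scheduling code; it assumes checkers are fixed when a container is created, that other hosts are preferred (up to 3), and that a host only checks itself when it is alone. All names are hypothetical.

```python
def assign_checkers(target, active_hosts, max_checkers=3):
    """Hosts that will check a container on `target` (toy model).

    Assumption: prefer hosts other than the target, up to `max_checkers`;
    if the target is the only active host, it checks itself.
    """
    others = [h for h in active_hosts if h != target]
    return others[:max_checkers] if others else [target]

def health_state(target, checkers, cross_host_ok):
    """A check from another host only succeeds if cross-host networking works."""
    needs_cross_host = any(h != target for h in checkers)
    return "initializing" if needs_cross_host and not cross_host_ok else "healthy"

# Container on A is created while A is the only active host.
a_checkers = assign_checkers("A", ["A"])
# Container on B is created after B and C have joined.
b_checkers = assign_checkers("B", ["A", "B", "C"])

print(a_checkers, health_state("A", a_checkers, cross_host_ok=False))  # ['A'] healthy
print(b_checkers, health_state("B", b_checkers, cross_host_ok=False))  # ['A', 'C'] initializing
```

With cross-host traffic broken, only the container that happened to be checked by its own host comes up, which matches the “all except one” symptom.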
@flypenguin We just debugged a setup this morning where all the hosts were showing the same IP address. Can you please check whether that’s the case on your setup?
@vincent - nah, that’s not it, because I know my security groups. I’m pretty much 100% sure the hosts can communicate freely.
@leodotcloud - YES, that’s actually it! The host IP that’s showing up (10.10.10.10) isn’t used by any of the involved hosts. That was an issue a LONG time ago, is it still not solved?! That IP belongs to the load balancer in front of the Rancher master server, though.
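A quick way to spot the symptom is to group hosts by the IP they registered with. The sketch below is a hypothetical helper; the host names are made up and the mapping would have to be copied by hand from the Hosts screen (no Rancher API call is assumed).

```python
from collections import defaultdict

def shared_ips(host_ips):
    """Map each IP registered by more than one host to the hosts using it."""
    by_ip = defaultdict(list)
    for host, ip in host_ips.items():
        by_ip[ip].append(host)
    return {ip: hosts for ip, hosts in by_ip.items() if len(hosts) > 1}

# Hypothetical data: every agent registered with the load balancer's address.
registered = {"node1": "10.10.10.10", "node2": "10.10.10.10", "node3": "10.10.10.10"}
print(shared_ips(registered))  # {'10.10.10.10': ['node1', 'node2', 'node3']}
```

Any non-empty result means peers cannot be told apart by their registered address, so traffic that should reach a specific host is aimed at the shared one instead.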
I then deactivated and deleted the hosts in Rancher, but afterwards I couldn’t get them back up using an explicit IP address (“error getting IP” or similar). So I stopped Docker, ran rm -rf /var/lib/docker /var/lib/rancher, started Docker again, and retried; that worked fine. Let’s see how it goes from here.
OK, that was the second half of what I said, though:
“The hosts need to be able to reach each other using their registered public IPs”