Rancher High Availability

Hi,

I’m having real trouble getting Rancher HA running (3 hosts).
I have:

  1. All ports open for both ingress and egress (0.0.0.0/0) on all hosts (the hosts are in the SAME AWS region but in separate AZs)
  2. External database which HAS connections established
  3. 16GB RAM on the hosts
  4. 50GB volumes on the hosts
  5. I’ve followed the HA setup documentation, run the generated script against the latest container, and keep seeing these errors:

time="2016-11-08T11:52:12Z" level=info msg="Waiting for 3 host(s) to be active" component=cert
time="2016-11-08T11:52:14Z" level=info msg="Waiting for 3 host(s) to be active" component=cert
time="2016-11-08T11:52:17Z" level=fatal msg="Failed while waiting for 3 host(s) to be active: Get http://172.17.0.1:18080/v1/projects/1a7/hosts?state=active: read tcp 172.17.0.1:52114->172.17.0.1:18080: read: connection reset by peer" component=service
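
One way to narrow this down is to replay the failing request by hand and see whether it is the connection itself or the HTTP response that fails. The sketch below assumes curl is installed; the default base URL and project ID (1a7) are copied from the log above and may differ on other setups, and `probe_active_hosts` is a hypothetical helper name. Depending on configuration the API may also require credentials, which this sketch does not send.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replay the "active hosts" request from the log and
# report either the HTTP status or the curl error code.
# Defaults are copied from the log line; adjust them for your setup.
probe_active_hosts() {
  local base="${1:-http://172.17.0.1:18080}"
  local project="${2:-1a7}"
  local url="${base}/v1/projects/${project}/hosts?state=active"
  local code rc
  # -s: quiet; -o /dev/null: discard the body; -w: print only the HTTP status.
  code=$(curl -s --max-time 5 -o /dev/null -w '%{http_code}' "$url")
  rc=$?
  if [ "$rc" -ne 0 ]; then
    # curl exit codes: 7 = could not connect, 28 = timeout,
    # 56 = failure receiving data (e.g. connection reset by peer).
    echo "curl failed (exit $rc) for $url" >&2
    return "$rc"
  fi
  echo "HTTP $code from $url"
}
```

Running `probe_active_hosts` with no arguments on the affected host would show whether the reset reproduces outside the container.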

Can someone please tell me what Rancher does on startup?

I assume there is cross-container networking at the host level, but which containers are trying to communicate with each other? Is there a list?

Also, is there any cross-host communication going on? If so, what?

The documentation is generally pretty good; however, when you get down to this level I can’t find anything to help me out, so any guidance would be much appreciated. This has been burning me for the best part of a week now, and I really don’t want to use another orchestrator :)

Thanks
Paul

It ended up having nothing to do with security groups. Running the commands below before starting the services fixed it:

rm -rf /var/lib/rancher/state/.physical_host_uuid
rm -rf /var/lib/rancher/state/.docker_uuid
rm -rf /var/lib/rancher/state/containers/.healthcheck
rm -rf /var/lib/rancher/cache/*
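
If this has to be repeated on several hosts, the same cleanup can be wrapped in a small function. This is a sketch, not an official Rancher tool: `reset_rancher_state` is a hypothetical name, and its optional argument is an assumed override for the state root (default /var/lib/rancher) so it can be tried on a scratch directory first. As in the post, run it before starting the services.

```shell
#!/usr/bin/env bash
# Sketch of the cleanup above as a reusable function.
# Optional argument: hypothetical override for the state root,
# defaulting to /var/lib/rancher as in the original commands.
reset_rancher_state() {
  local root="${1:-/var/lib/rancher}"
  if [ ! -d "$root" ]; then
    echo "not a directory: $root" >&2
    return 1
  fi
  # Cached UUID files under state/ (same paths as the commands above).
  rm -rf "$root/state/.physical_host_uuid" "$root/state/.docker_uuid"
  rm -rf "$root/state/containers/.healthcheck"
  # Everything in cache/ matched by the * glob; dotfiles are left alone,
  # exactly as with the original command.
  rm -rf "$root"/cache/*
}
```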

I need help with this too. Do you have any detailed documentation or a tutorial to share?

My problem is: https://github.com/rancher/rancher/issues/6329
Error log:
time="2016-10-19T08:21:50Z" level=info msg="Can not launch agent right now: Server not available at http://192.168.169.2:18080/ping:" component=service

I tried it with a single host and the problem is the same.
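
The message above means the agent got no usable answer from the server’s /ping URL on port 18080. To check whether (and when) that endpoint comes up, a polling loop like the following can help. It is a sketch assuming curl is installed; `wait_for_ping` is a hypothetical name and the default timeout and interval are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: poll a /ping URL until it answers or a deadline passes.
# Arguments: URL, timeout in seconds (default 60), poll interval (default 2).
wait_for_ping() {
  local url="$1" timeout="${2:-60}" interval="${3:-2}"
  local deadline=$(( $(date +%s) + timeout ))
  # -f: treat HTTP errors as failures; --max-time bounds each attempt.
  until curl -sf --max-time 5 -o /dev/null "$url"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "no answer from $url after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
  done
  echo "reachable: $url"
}
```

For example, `wait_for_ping http://192.168.169.2:18080/ping 120` uses the address from the log and waits up to two minutes.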

@phooper0001, take a look at the ports Rancher HA nodes use to communicate with each other: http://docs.rancher.com/rancher/v1.2/en/installing-rancher/installing-server/multi-nodes/#requirements
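
Once you have the required ports from that page, a quick way to confirm they are reachable from one node to another is a plain TCP connect test. This sketch assumes bash (for its /dev/tcp redirection) and coreutils `timeout`; `check_ports` is a hypothetical name, and the ports are deliberately not hard-coded, so pass the ones the docs list.

```shell
#!/usr/bin/env bash
# Sketch: test TCP reachability of HOST:PORT pairs from this node.
# Returns non-zero if any target is unreachable.
check_ports() {
  local failed=0 target host port
  for target in "$@"; do
    host="${target%:*}"
    port="${target##*:}"
    # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
    # timeout stops the attempt from hanging on silently dropped traffic.
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      echo "open    ${host}:${port}"
    else
      echo "closed  ${host}:${port}"
      failed=1
    fi
  done
  return "$failed"
}
```

Run it on each node against the other nodes’ addresses, e.g. `check_ports <other-node-ip>:<port> ...`, since a rule that works in one direction does not prove the reverse direction is open.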