Rancher HA Management Stack not coming up

I am trying out the Rancher v1.0.1 HA setup on AWS.
OS: RancherOS

I’m using RDS as the external MySQL instance, and I’m running the script on 3 nodes.

One instance each of go-machine-service and rancher-compose-executor is not starting, and so my management stack never comes up.

Any help would be much appreciated.

Could you get the logs for those containers that won’t start and post them here?

They never reach running status. They go Initializing > Reconciling > Stopping, so I’m not sure how to find the logs for the failing instance.

The last log in docker logs -f rancher-ha is the following on all three nodes:

time="2016-04-15T09:50:02Z" level=info msg="Container agent is not running in state &types.ContainerState{Status:"exited", Running:false, Paused:false, Restarting:false, OOMKilled:false, Dead:false, Pid:0, ExitCode:0, Error:"", StartedAt:"2016-04-15T09:46:52.499514335Z", FinishedAt:"2016-04-15T09:47:02.105292501Z"}" component=docker
time="2016-04-15T09:50:02Z" level=info msg="Deleting container 89f5a53bd7ce3b263891f9d303fadd052b29edd048c9fe6af269151035893b06" component=docker

/var/log/docker.log is being spammed with:

time="2016-04-15T13:53:46.124242130Z" level=error msg="Handler for GET /v1.22/containers/rancher-ha-agent/json returned error: No such container: rancher-ha-agent"

I’m guessing here…

Have a look at the logs coming out of the cattle container, though it might not log to STDOUT. I think its logs are in /var/lib/cattle/logs.

So that issue is fixed now; there were some load balancer issues. But I’ve hit another snag.

The management stack is working great now, but there’s a networking problem on the hosts that I’m adding. Containers are not able to communicate with each other: I cannot ping a container from any other worker node. Is there a different method for adding worker hosts to the HA stack?

The network agent has the following in STDERR:

RTNETLINK answers: No such file or directory
SIOCSARP: Invalid argument
arp: cannot set entry on line 2 of etherfile content-home/etc/cattle/ethers !

Seeing similar actually:

4/19/2016 10:29:46 PM SIOCSARP: Invalid argument
4/19/2016 10:29:46 PM arp: cannot set entry on line 2 of etherfile content-home/etc/cattle/ethers !

This should probably be raised as a bug on GitHub.

Also, one more weird thing: the host IPs in the UI are not correct. The nodes on which HA is running have the correct IPs, but any host that I add gets an incorrect one.

Are they all the same IP? I just noticed that myself! I think it’s breaking the host>host communication, as it’s establishing the tunnel to the wrong IP.

They are not the same in my case.

Just validated, you don’t get this issue when not using HA.

Yep, same here. It seems to be an HA issue.

So it seems it is picking up the ELB’s private IPs.

The ELB private IPs can be found with:

aws ec2 describe-network-interfaces --filters "Name=description,Values=ELB rancher" | grep -wE 'Description|PrivateIpAddress'
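
As an alternative to grepping the JSON, the AWS CLI’s own --query option can return just the addresses. This is a minimal sketch, assuming a classic ELB named rancher as in the command above (its network interfaces carry the description "ELB <name>"). The function name is only illustrative.

```shell
# Print the private IPs of the network interfaces that belong to an ELB.
# Assumption: a classic ELB, whose interfaces are described as "ELB <name>".
elb_private_ips() {
  elb_name="$1"
  aws ec2 describe-network-interfaces \
    --filters "Name=description,Values=ELB ${elb_name}" \
    --query 'NetworkInterfaces[].PrivateIpAddress' \
    --output text
}
```

Calling elb_private_ips rancher and comparing the result with the host IPs shown in the UI should confirm whether the added hosts are being registered with an ELB address.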

I used -e CATTLE_AGENT_IP='' to state the IP explicitly, and it seems to work.

Good find. So Rancher is possibly doing a reverse DNS lookup on the Callback URL and that value is somehow being translated into the host IP address. Seems strange as it works on a single server without ELB.

I’m led to believe that enabling the Proxy Protocol on the ELB might fix this issue. I’ve not had success with it personally, but it’s worth a shot if you get the chance.
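
For anyone who wants to try it, Proxy Protocol on a classic ELB is normally enabled by creating a policy of type ProxyProtocolPolicyType and attaching it to the backend instance port. A rough sketch with the AWS CLI follows. The function name, the derived policy name, and whatever load balancer name and instance port you pass in are placeholders, not values from this thread.

```shell
# Enable Proxy Protocol for one backend port of a classic ELB.
# Arguments (both hypothetical): load balancer name, backend instance port.
enable_proxy_protocol() {
  lb_name="$1"
  instance_port="$2"
  policy_name="${lb_name}-proxy-protocol"   # illustrative policy name

  # Step 1: create the policy on the load balancer.
  aws elb create-load-balancer-policy \
    --load-balancer-name "$lb_name" \
    --policy-name "$policy_name" \
    --policy-type-name ProxyProtocolPolicyType \
    --policy-attributes AttributeName=ProxyProtocol,AttributeValue=true &&
  # Step 2: attach it to the backend port. This sets the full policy list
  # for that port, so include any policies already attached to it.
  aws elb set-load-balancer-policies-for-backend-server \
    --load-balancer-name "$lb_name" \
    --instance-port "$instance_port" \
    --policy-names "$policy_name"
}
```

Proxy Protocol applies to TCP and SSL listeners, and the service behind the port has to expect the Proxy Protocol header, so whether this helps depends on how the Rancher HA ports are set up.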

I’m having the same problem: some containers just do not start up, and I cannot get into the “network agent” via shell to ping between them. Did you raise a bug for it? Do you have a fix?

regards,

Check the logs for exited containers. I had two main issues: the HTTPS certificates did not match the URL, and the ELB listeners were set to HTTP and HTTPS when they should be TCP and SSL.
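
One way to do that from the host is sketched below, using only standard Docker CLI options (the function name is made up for illustration). Since the rancher-ha log earlier in the thread shows exited containers being deleted, the logs may need to be grabbed soon after a failure.

```shell
# Print the last 50 log lines of every container in the "exited" state.
show_exited_logs() {
  # -a includes stopped containers, -q prints only container IDs.
  for id in $(docker ps -a -q --filter "status=exited"); do
    echo "=== ${id} ==="
    docker logs --tail 50 "$id" 2>&1
  done
}
```

Run show_exited_logs on each node. Dropping -q from the docker ps call shows the container names as well, which helps to pick out go-machine-service or rancher-compose-executor.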

@shethchintan7 Are you still having issues? We have started to recommend specifying the -e CATTLE_AGENT_IP when using an AWS setup with ELB.

@denise What are you recommending setting the CATTLE_AGENT_IP to?

@denise No, it’s working great. Thanks.

@ThatsNinja

When adding the host, you set it to the IP of the host. These docs explain what you are doing when setting CATTLE_AGENT_IP; they just cover a different known use case where it needs to be set.

http://docs.rancher.com/rancher/rancher-ui/infrastructure/hosts/custom/#adding-hosts-to-the-same-machine-as-rancher-server
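
As a rough illustration of the advice above, the flag goes into the agent registration command that the Add Host screen generates. The IP, image tag and registration URL below are placeholders, and the other docker run options are shown as they commonly appear and may differ in your version, so copy the real command from your own Rancher UI and only add the -e option.

```shell
# Print an agent registration command with CATTLE_AGENT_IP set.
# Arguments: host IP, agent image, registration URL (all taken from your setup).
agent_run_cmd() {
  host_ip="$1"
  agent_image="$2"
  registration_url="$3"
  printf 'sudo docker run -e CATTLE_AGENT_IP=%s -d --privileged -v /var/run/docker.sock:/var/run/docker.sock %s %s\n' \
    "$host_ip" "$agent_image" "$registration_url"
}

# Hypothetical values, for illustration only.
agent_run_cmd "10.0.1.25" "rancher/agent:<tag>" "https://rancher.example.com/v1/scripts/<token>"
```

The IP should be the address at which the other hosts can reach this host (on EC2, typically the instance’s private IP), not an address belonging to the ELB.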