They never reach Running status; they go Initializing > Reconciling > Stopping, so I'm not sure how to find the logs for the failing instance.
The last log in docker logs -f rancher-ha is the following on all three nodes:
time="2016-04-15T09:50:02Z" level=info msg="Container agent is not running in state &types.ContainerState{Status:\"exited\", Running:false, Paused:false, Restarting:false, OOMKilled:false, Dead:false, Pid:0, ExitCode:0, Error:\"\", StartedAt:\"2016-04-15T09:46:52.499514335Z\", FinishedAt:\"2016-04-15T09:47:02.105292501Z\"}" component=docker
time="2016-04-15T09:50:02Z" level=info msg="Deleting container 89f5a53bd7ce3b263891f9d303fadd052b29edd048c9fe6af269151035893b06" component=docker
/var/log/docker.log is spamming:
time="2016-04-15T13:53:46.124242130Z" level=error msg="Handler for GET /v1.22/containers/rancher-ha-agent/json returned error: No such container: rancher-ha-agent"
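In case it helps anyone else hunting for the logs: the failing containers usually show up as exited rather than gone, so you can pull their logs directly. The container name below is just an example:

```
# List recently exited containers
docker ps -a --filter "status=exited" --format "{{.ID}}  {{.Names}}  {{.Status}}"

# Dump the last log lines from one of them
docker logs --tail 100 rancher-ha-agent

# See why it exited (exit code, OOM kill, error message)
docker inspect --format '{{json .State}}' rancher-ha-agent
```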
So that issue is fixed now; it turned out to be some load balancer problems. But I've hit another snag.
The management stack is working great now, but there's a networking problem on the hosts that I'm adding. Containers are not able to communicate with each other; basically, I cannot ping a container from any other worker node. Is there a different method for adding worker hosts to the HA stack?
The Network Agent has the following in STDERR:
RTNETLINK answers: No such file or directory
SIOCSARP: Invalid argument
arp: cannot set entry on line 2 of etherfile content-home/etc/cattle/ethers !
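For anyone debugging the same thing, this is roughly how I'm testing cross-host connectivity by hand. The IPs and the container name filter are just examples; Rancher's default IPsec overlay needs UDP 500 and 4500 open between hosts, and containers on the managed network get 10.42.x.x addresses:

```
# From host A, check the IPsec ports on host B (UDP checks with nc are
# not definitive, but a hard failure here is telling)
nc -u -z -v 10.0.0.12 500
nc -u -z -v 10.0.0.12 4500

# Ping a container on the other host from inside the Network Agent
# (the "NetworkAgent" name and the 10.42.183.2 address are examples)
NA=$(docker ps -q --filter "name=NetworkAgent")
docker exec -it "$NA" ping -c 3 10.42.183.2
```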
Also, one more weird thing: host IPs in the UI are not correct. The nodes on which HA is running have the correct IPs, but any host that I add gets an incorrect one.
Good find. So Rancher is possibly doing a reverse DNS lookup on the Callback URL, and that value is somehow being translated into the host IP address. Seems strange, as it works on a single server without an ELB.
So, I'm led to believe that enabling Proxy Protocol on the ELB might fix this issue. I've not had success with it personally, but it's worth a shot if you get a chance.
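For reference, enabling it with the AWS CLI looks roughly like this. The load balancer name and instance port are examples, not values from this thread:

```
# Create a ProxyProtocol policy on the classic ELB
aws elb create-load-balancer-policy \
  --load-balancer-name rancher-elb \
  --policy-name EnableProxyProtocol \
  --policy-type-name ProxyProtocolPolicyType \
  --policy-attributes AttributeName=ProxyProtocol,AttributeValue=true

# Attach it to the backend port the Rancher servers listen on
aws elb set-load-balancer-policies-for-backend-server \
  --load-balancer-name rancher-elb \
  --instance-port 8080 \
  --policy-names EnableProxyProtocol
```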
I'm having the same problem … some containers just do not start up, and I can't get a shell on the "network agent" to ping between them … Is it a bug for you too? Do you have a fix?
Check the logs of the exited containers. I had two main issues: HTTPS certificates not matching the URL, and the ELB listeners being set to HTTP/HTTPS when they should be TCP and SSL.
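A quick way to check both; the hostname and ELB name below are examples:

```
# Confirm the listeners are TCP/SSL, not HTTP/HTTPS
aws elb describe-load-balancers \
  --load-balancer-names rancher-elb \
  --query 'LoadBalancerDescriptions[].ListenerDescriptions[].Listener'

# Confirm the certificate presented actually matches the Rancher URL
openssl s_client -connect rancher.example.com:443 \
  -servername rancher.example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject
```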
When adding the host, you set it to the IP of the host. These docs explain what you are doing when setting CATTLE_AGENT_IP; it's just a different known use case for when it needs to be set.
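So the registration command ends up looking something like this. The agent version, server URL, token, and IP are placeholders; copy the real command from your own Add Host screen and just add the CATTLE_AGENT_IP variable:

```
sudo docker run -d --privileged \
  -e CATTLE_AGENT_IP="10.0.0.12" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/rancher:/var/lib/rancher \
  rancher/agent:v1.0.1 \
  https://rancher.example.com/v1/scripts/YOUR_REGISTRATION_TOKEN
```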