Host in AWS keeps hanging at the "Installing Rancher agent" stage

Help! Total newbie here, trying to set up a proof of concept with Rancher on AWS. Launching a host in AWS keeps hanging at the “host.provision” stage. I can see the EC2 instance in AWS, and the security group appears to have the correct ports exposed. Any help??
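In case it matters, this is roughly how I've been inspecting the security group from the CLI (the group ID below is a placeholder, not my real one; my understanding is that hosts need TCP 8080 to the server plus UDP 500/4500 between hosts for the overlay network, but please correct me if that's wrong):

```shell
# Placeholder security group ID; substitute your own.
SG_ID="sg-0123456789abcdef0"

# Ports Rancher 1.x hosts are expected to need (my understanding):
#   TCP 8080  host -> rancher/server (registration + websocket)
#   UDP 500   host <-> host (IPsec overlay)
#   UDP 4500  host <-> host (IPsec overlay)
# Composing the describe command rather than running it, since the
# group ID above is a placeholder:
echo "aws ec2 describe-security-groups --group-ids ${SG_ID} \
  --query 'SecurityGroups[0].IpPermissions'"
```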

rancheros-v0.7.1-hvm-1 (ami-dfdff3c8)

I’m in the same AZ as the rancher-server.

The Processes view shows a lot of:

ID       Name               Resource    Exit Reason   Start Time   End Time   Run Time
1pi264   agent.disconnect   agent:1a2   DONE          2:18 PM      2:18 PM    -
1pi263   agent.reconnect    agent:1a2   CHAIN         2:14 PM      2:18 PM    230 sec
1pi262   agent.disconnect   agent:1a2   DONE          2:14 PM      2:14 PM    -
1pi261   agent.reconnect    agent:1a2   CHAIN         2:10 PM      2:14 PM    232 sec


**** UPDATE 1

I deactivated and deleted the RancherOS EC2 instance I was attempting to launch above, and started again with the default Ubuntu image (the one the Rancher UI provides).

This host also hangs at the same “Installing Rancher agent” stage.

I SSH’d onto the Ubuntu EC2 instance and see the following:

ubuntu@r-host1:~$ sudo docker ps -a
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS                         PORTS               NAMES
b89d26503ae8        rancher/agent:v1.2.0   "/ run"            About an hour ago   Up About an hour                                   rancher-agent
3658e27d3879        rancher/agent:v1.2.0   "/ http://ranch"   About an hour ago   Exited (0) About an hour ago                       rancher-agent-bootstrap

When Rancher creates the host, docker-machine should install Docker and then run the rancher-agent container, correct?
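If I'm reading the docs right, the bootstrap container's job is to launch the real agent against a registration URL, something like the sketch below (the server address and token are placeholders, not my real values):

```shell
# Placeholders only; the real URL and token come from the Add Host screen.
RANCHER_URL="http://rancher.example.com:8080"
REG_TOKEN="XXXXXXXX"

# The bootstrap container (rancher-agent-bootstrap above) exits 0 after
# starting the long-running rancher-agent container with this shape of
# command. Echoed rather than executed, since the values are fake:
echo "sudo docker run -d --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/rancher:/var/lib/rancher \
  rancher/agent:v1.2.0 ${RANCHER_URL}/v1/scripts/${REG_TOKEN}"
```

Given that both containers exist and the agent is "Up", it looks like this part succeeded and the problem is further along, at the point where the agent phones home.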

From the rancher/server container logs:

time="2017-02-09T23:04:34Z" level=info msg="pulling rancher/agent:v1.2.0 image." 
time="2017-02-09T23:04:44Z" level=info msg="Container created for machine" containerId=3658e27d38793f7eb8988b4474ec764978d71e352765b240b77e65e0a5505004 machineId=1ph3 resourceId=1ph3 
time="2017-02-09T23:04:44Z" level=info msg="Rancher-agent for machine started" containerId=3658e27d38793f7eb8988b4474ec764978d71e352765b240b77e65e0a5505004 machineExternalId=f93756e7-3237-430c-8fba-e234a2ac42b8 resourceId=1ph3 
time="2017-02-09T23:04:44Z" level=info msg="Creating and uploading extracted machine config" resourceId=1ph3 
time="2017-02-09T23:04:44Z" level=info msg="Machine config file created and encoded." destFile="/var/lib/cattle/machine/f93756e7-3237-430c-8fba-e234a2ac42b8/r-host1.tar.gz" resourceId=1ph3 
2017-02-09 23:07:24,755 ERROR [:] [] [] [] [TaskScheduler-2] [ ] Failed to get ping from agent [3] count [3] 
2017-02-09 23:07:29,757 ERROR [:] [] [] [] [TaskScheduler-2] [ ] Failed to get ping from agent [3] count [4] 
2017-02-09 23:07:34,759 ERROR [:] [] [] [] [TaskScheduler-2] [ ] Failed to get ping from agent [3] count [5] 
2017-02-09 23:07:39,761 ERROR [:] [] [] [] [TaskScheduler-2] [ ] Failed to get ping from agent [3] count [6] 
2017-02-09 23:07:39,762 ERROR [:] [] [] [] [TaskScheduler-2] [ ] Scheduling reconnect for [3] 
2017-02-09 23:07:44,772 ERROR [:] [] [] [] [TaskScheduler-2] [ ] Failed to get ping from agent [3] count [7] 
2017-02-09 23:07:49,773 ERROR [:] [] [] [] [TaskScheduler-2] [ ] Failed to get ping from agent [3] count [8] 
2017-02-09 23:07:54,775 ERROR [:] [] [] [] [TaskScheduler-2] [ ] Failed to get ping from agent [3] count [9] 

I’m unsure what else to look at to find the error. Could it be a problem with my configuration of either the Rancher server or the host? I’m a newbie and have no idea! If anyone could point me in any direction it would be greatly appreciated.

**** UPDATE 2

OK, thanks to a reply on a different post I realized I had to sudo my docker commands on the newly created Ubuntu host.
Now I can see the rancher-agent container running there, and an error in the Docker logs for that rancher-agent container.

time="2017-02-10T00:46:37Z" level=error msg="Error subscribing to events: websocket: bad handshake" subscribeUrl="ws://xxxxxxx:8080/v1/subscribe?eventNames=storage.volume.activate&eventNames=storage.volume.remove&eventNames=config.update&eventNames=compute.instance.activate&eventNames=compute.instance.deactivate&eventNames=compute.instance.force.stop&eventNames=storage.image.activate&eventNames=ping&eventNames=compute.instance.inspect&eventNames=compute.instance.pull&eventNames=compute.instance.remove" 
time="2017-02-10T00:46:37Z" level=error msg="Got error response" responseHeaders=map[Server:[Jetty(9.2.11.v20150529)] Set-Cookie:[PL=rancher;Path=/] X-Api-Account-Id:[1a10] X-Rancher-Version:[v1.4.0] Content-Length:[1316] Expires:[Thu, 01 Jan 1970 00:00:00 GMT] Date:[Fri, 10 Feb 2017 00:46:36 GMT] Vary:[Accept-Encoding, User-Agent] X-Api-Client-Ip:[xxxxxx] X-Api-Schemas:[http://xxxxxxx] X-Api-User-Id:[1a10] Connection:[keep-alive] Content-Type:[application/json; charset=utf-8]] status="200 OK" statusCode=200 
time="2017-02-10T00:46:37Z" level=error msg="Error response: {\"type\":\"collection\",\"resourceType\":\"subscribe\",\"links\":{\"self\":\"http:\\/\\/\\/v1\\/subscribe\"},\"createTypes\":{},\"actions\":{},\"data\":[],\"sortLinks\":{\"eventNames\":\"http:\\/\\/xxxxxxxx\\/v1\\/subscribe?eventNames=storage.volume.activate&eventNames=storage.volume.remove&eventNames=config.update&eventNames=compute.instance.activate&eventNames=compute.instance.deactivate&eventNames=compute.instance.force.stop&eventNames=storage.image.activate&eventNames=ping&eventNames=compute.instance.inspect&eventNames=compute.instance.pull&eventNames=compute.instance.remove&sort=eventNames\"},\"pagination\":null,\"sort\":null,\"filters\":{\"eventNames\":[{\"value\":\"storage.volume.activate\",\"modifier\":\"eq\"},{\"value\":\"storage.volume.remove\",\"modifier\":\"eq\"},{\"value\":\"config.update\",\"modifier\":\"eq\"},{\"value\":\"compute.instance.activate\",\"modifier\":\"eq\"},{\"value\":\"compute.instance.deactivate\",\"modifier\":\"eq\"},{\"value\":\"compute.instanc" 
time="2017-02-10T00:46:37Z" level=fatal msg="Exiting. Error: websocket: bad handshake" 

OK, getting there. Any input on what this error might indicate? We are running rancher/server behind an ELB (as recommended), and that ELB is not in the rancher-machine security group; maybe it needs to be?
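For anyone diagnosing along with me: the handshake the agent attempts can be reproduced with curl. A working endpoint should answer "101 Switching Protocols", so the "200 OK" in my logs suggests something between the agent and the server is answering plain HTTP instead of upgrading the connection (the host below is a placeholder; substitute whatever address the agent actually uses):

```shell
# Placeholder; substitute the same host/port the agent registers against.
SERVER="rancher.example.com:8080"

# A minimal websocket upgrade request. Against a healthy endpoint curl
# prints "HTTP/1.1 101 Switching Protocols"; a proxy that strips the
# Upgrade header answers 200 with the JSON collection seen in my logs.
# Echoed rather than executed, since the host above is a placeholder:
echo "curl -s -i -N \
  -H 'Connection: Upgrade' \
  -H 'Upgrade: websocket' \
  -H 'Sec-WebSocket-Version: 13' \
  -H 'Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==' \
  http://${SERVER}/v1/subscribe"
```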

Thanks, Tommy