Host provisioning issues

First I am truly impressed with the Rancher capabilities. Also, the host provisioning features are very interesting (albeit I understand many of those integrations are inherited by docker-machine).

I have done a couple of tests but both of them ended up with problems:

  1. I have pre-provisioned an ubuntu 14.04 VM with Docker 1.10.c3 (?). I chose the “Custom” way to add a host by running the container per the syntax in the “Custom” workflow. The host got registered after a minute or so but for some reasons both containers that are part of the management stack failed to come up and ended up being in an “Initializing” state forever). I have tried to remove the host from the Rancher UI, remove all containers on the host and re-run the container on the host (with the web token etc) but this time if I tailed the log of the agent it would just keep looping with this:

INFO: Sending host-config applied 1-bd6a262cb4e2f642ee6eecf15d6a6b6fa9b449d7bc5d6df6985f780435e8d926
time=“2016-05-13T14:56:12Z” level=“info” msg="Starting event router."
time=“2016-05-13T14:56:12Z” level=“info” msg=“Watching state directory: /var/lib/rancher/state/containers"
time=“2016-05-13T14:56:12Z” level=“info” msg=“Processing event: &docker.APIEvents{Status:“start”, ID:“04a1ecf3e7d8d31e0d50fcc40a09920dc1ed3c9da437871e9f3f02fa84c12f3f”, From:”-simulated-”, Time:0}"
2016-05-13 14:56:12,653 ERROR root [140255763532048] [_logging.py:54] error from callback <function on_error at 0x7f8fd6c1c1b8>: Received websocket error: [Invalid WebSocket Header]
time=“2016-05-13T14:56:12Z” level=“info” msg=“Host not registered yet. Sleeping 1 second and trying again.” Attempt=0 reportedUuid="0d36031c-b00d-4854-830b-b19a329d9d7f"
Monitoring mount context of PID 812
I0513 16:56:12.714528 71467 storagedriver.go:45] Caching stats in memory for 2m0s
I0513 16:56:12.714752 71467 manager.go:168] cAdvisor running in container: "/docker/04a1ecf3e7d8d31e0d50fcc40a09920dc1ed3c9da437871e9f3f02fa84c12f3f"
I0513 16:56:12.917503 71467 fs.go:109] Filesystem partitions: map[/dev/mapper/ubuntuonprem1–vg-root:{mountpoint:/ major:252 minor:0 fsType: blockSize:0} /dev/sda1:{mountpoint:/boot major:8 minor:1 fsType: blockSize:0}]
PID PGID S TTY TIME COMMAND
71435 71159 S ? 00:00:00 /bin/bash /var/lib/cattle/pyagent/cattle/process_watcher.sh
71440 71159 S ? 00:00:00 /bin/bash /var/lib/cattle/bin/cadvisor.sh cadvisor -logtostderr=true -listen_ip 127.0.0.1 -port 9344 -housekeeping_interval 1s -docker_root /var/lib/docker
71444 71159 S ? 00:00:00 host-api -cadvisor-url http://127.0.0.1:9344 -logtostderr=true -ip 0.0.0.0 -port 9345 -auth=true -host-uuid 0d36031c-b00d-4854-830b-b19a329d9d7f -public-key /var/lib/cattle/etc/cattle/api.crt -cattle-url http://192.168.1.50/v1 -cattle-state-dir /var/lib/rancher/state/containers
71456 71159 S ? 00:00:00 /bin/bash /var/lib/cattle/bin/cadvisor.sh cadvisor -logtostderr=true -listen_ip 127.0.0.1 -port 9344 -housekeeping_interval 1s -docker_root /var/lib/docker
71458 71159 S ? 00:00:00 sleep 1
71467 71159 S ? 00:00:00 /var/log/rancher/.cadvisor -logtostderr=true -listen_ip 127.0.0.1 -port 9344 -housekeeping_interval 1s -docker_root /var/lib/docker
71475 71159 R ? 00:00:00 ps -eO pgid
71476 71159 S ? 00:00:00 grep -E PGID| 71159
PID PGID S TTY TIME COMMAND
71435 71159 S ? 00:00:00 /bin/bash /var/lib/cattle/pyagent/cattle/process_watcher.sh
71440 71159 S ? 00:00:00 /bin/bash /var/lib/cattle/bin/cadvisor.sh cadvisor -logtostderr=true -listen_ip 127.0.0.1 -port 9344 -housekeeping_interval 1s -docker_root /var/lib/docker
71444 71159 S ? 00:00:00 host-api -cadvisor-url http://127.0.0.1:9344 -logtostderr=true -ip 0.0.0.0 -port 9345 -auth=true -host-uuid 0d36031c-b00d-4854-830b-b19a329d9d7f -public-key /var/lib/cattle/etc/cattle/api.crt -cattle-url http://192.168.1.50/v1 -cattle-state-dir /var/lib/rancher/state/containers
71456 71159 S ? 00:00:00 /bin/bash /var/lib/cattle/bin/cadvisor.sh cadvisor -logtostderr=true -listen_ip 127.0.0.1 -port 9344 -housekeeping_interval 1s -docker_root /var/lib/docker
71458 71159 S ? 00:00:00 sleep 1
71467 71159 S ? 00:00:00 /var/log/rancher/.cadvisor -logtostderr=true -listen_ip 127.0.0.1 -port 9344 -housekeeping_interval 1s -docker_root /var/lib/docker
71482 71159 R ? 00:00:00 ps -eO pgid
71483 71159 S ? 00:00:00 grep -E PGID| 71159
PID 71159 has died doing kill 71440 71444 71456 71458 71467
PID PGID S TTY TIME COMMAND
71435 71159 S ? 00:00:00 /bin/bash /var/lib/cattle/pyagent/cattle/process_watcher.sh
71485 71159 R ? 00:00:00 ps -eO pgid
71486 71159 S ? 00:00:00 grep -E PGID| 71159
Caught EXIT
PID PGID S TTY TIME COMMAND
71435 71159 S ? 00:00:00 /bin/bash /var/lib/cattle/pyagent/cattle/process_watcher.sh
71492 71159 R ? 00:00:00 ps -eO pgid
71493 71159 S ? 00:00:00 grep -E PGID| 71159
INFO: Starting agent for 5D5F1A450A5742045202
INFO: Access Key: 5D5F1A450A5742045202
INFO: Config URL: http://192.168.1.50/v1
INFO: Storage URL: http://192.168.1.50/v1
INFO: API URL: http://192.168.1.50/v1
INFO: IP: 192.168.1.44
INFO: Port:
INFO: Required Image: rancher/agent:v1.0.1
INFO: Current Image: rancher/agent:v1.0.1
INFO: Using image rancher/agent:v1.0.1
INFO: Downloading agent http://192.168.1.50/v1/configcontent/configscripts
INFO: Updating configscripts
INFO: Downloading http://192.168.1.50/v1//configcontent//configscripts current=
INFO: Running /var/lib/cattle/download/configscripts/configscripts-1-f0f3fb2e1110b5ada7c441705981f93a480313a324294321cff467f0c3e12319/apply.sh
INFO: Sending configscripts applied 1-f0f3fb2e1110b5ada7c441705981f93a480313a324294321cff467f0c3e12319
INFO: Updating pyagent
INFO: Downloading http://192.168.1.50/v1//configcontent//pyagent current=
INFO: Running /var/lib/cattle/download/pyagent/pyagent-1-9e3dcab12585149a65d818082279708965cd826df1937a55b24f48428f8f0c0673b9f37ae03c16234efdd22657d02b33/apply.sh
INFO: Updating host-api
INFO: Downloading http://192.168.1.50/v1//configcontent//host-api current=1-107028519ccabdfb854c7d38495ec28adf5970c97c05a37713eec3191c478a3f
INFO: Already up to date
INFO: Sending host-api applied 1-107028519ccabdfb854c7d38495ec28adf5970c97c05a37713eec3191c478a3f
INFO: Updating agent-binaries
INFO: Downloading http://192.168.1.50/v1//configcontent//agent-binaries current=1-4200f9b686342da8e941359211b1d8c5f98877ae293d0ead231d43507f1c9eae
INFO: Already up to date
INFO: Sending agent-binaries applied 1-4200f9b686342da8e941359211b1d8c5f98877ae293d0ead231d43507f1c9eae
INFO: Updating python-agent
INFO: Downloading http://192.168.1.50/v1//configcontent//python-agent current=1-fdc71941c51653d964df32b019d90f7983affdb2e52bff60c0b1da240983d7b9
INFO: Already up to date
INFO: Sending python-agent applied 1-fdc71941c51653d964df32b019d90f7983affdb2e52bff60c0b1da240983d7b9
INFO: Updating cadvisor
INFO: Downloading http://192.168.1.50/v1//configcontent//cadvisor current=1-21c3283c2bb12833be339f90a0c9a64d933464e475045636b8a43412ad630347
INFO: Already up to date
INFO: Sending cadvisor applied 1-21c3283c2bb12833be339f90a0c9a64d933464e475045636b8a43412ad630347
INFO: Sending pyagent applied 1-9e3dcab12585149a65d818082279708965cd826df1937a55b24f48428f8f0c0673b9f37ae03c16234efdd22657d02b33
INFO: Starting agent /var/lib/cattle/pyagent/apply.sh
INFO: Executing /var/lib/cattle/pyagent/main.py
INFO: Updating host-config
INFO: Downloading http://192.168.1.50/v1//configcontent//host-config current=host-config-1-bd6a262cb4e2f642ee6eecf15d6a6b6fa9b449d7bc5d6df6985f780435e8d926
INFO: Already up to date


  1. I have also tried to provision a host using the vSphere integration. The Boot2Docker VM would be provisioned just fine (as seen from within vSphere) but the Rancher UI would stay there forever in the “waiting for agent to connect”. This host never came up.

Any taker?

Thanks. Massimo.

Your first issue we can troubleshoot. One thing to note though is that if you add a host to Rancher then delete that host and add it back it often doesn’t work because we save the “identity” of the host in /var/lib/rancher/state and we see that that host has been deleted. In that situation just delete /var/lib/rancher/state on the host. If you can share the full log, that would be better. I could figure out what is going on from the snippet you provided.

The vSphere issue may be related to the “Host Registration URL.” In Admin->Settings in the UI make sure the “Host Registration URL” is correct and accessible from the vSphere VMs. That is the URL they will use to connect to Rancher.

Thanks @ibuildthecloud (Darren).

I have tried to delete the /var/lib/rancher/state on the offending host but the script wouldn’t work either. I have also tested the same procedure with a brand new Docker Host (pre-provisioned clean Ubuntu VM) but it would fail with the same error/loop in the container agent log. All components (HA docker hosts, LB, DB as well as the new hosts being added) are on the same flat L2 network and the Ubuntu firewall has been disabled.

I have tried to CURL the script like this:

curl http://192.168.1.50/v1/scripts/434BF05D0E28884F35D6:1463468400000:plGzCRpWM9MLIJ3aS8syLZ9Qbs

And it works just fine (.50 is the IP of the load balancer that sprays against the 3 hosts in HA).

Thanks.

Would you be able to provide the full logs for your host as @ibuildthecloud requested?

What did the curl return?