Can't get rancher agent running on the same host as the rancher server


We had Rancher agent v1.2.2 running happily on the same host as a v1.5.5 Rancher server, but the box died. I tried to re-create the setup but can’t get the agent to start. I have tried upgrading the server to v1.6.7, but the outcome is the same:

Agent log output:

Found container ID: faf46007a63d1e03cfd5e80a7e740a69953c0ef82c163b7703a40d6bb6db2f39
Checking root: /host/run/runc
Checking file: 16bf6c71ab041008bd7ed718311c837f395e65624bd1cb2747f0fd248e59e812
Checking file: 244ba72ddcdb251df0e9aa139838af4f1054d3ebedf02c52785646ab32a0d12e
Checking file: 3673dfc942226a5532d551d31e8e9b14e669362b207d5769bb5edf5400c8f41b
Checking file: 6f8487b986a991241c986f4fdb3e0e3493f156e45a95cf045b63a6b90a90f020
Checking file: 8ae419add54bd951d6a03c5d14d21038f340b9d9b9fe1e04909974cb8ad1cf36
Checking file: 9a41172e02d946279da7194ada965f63c2e23e2033deec5cfcf0d9a1ba4b597b
Checking file: ab4f6da579aa280a1d708433e120fa48bd98b0bbc891400ddaceb00712e3e079
Checking file: b0337a84e14d100a41d4e3113ddf8047424b5bb85c339c425765f9b55794fe2d
Checking file: c26b0ec49fce9772e63469a877cd06032fc55f8679b1044e3274da9bca37ac68
Checking file: f1dcbde9fe7ac3103bde5cd27646c318ba8d314458429640854aedaf87132741
Checking file: faf46007a63d1e03cfd5e80a7e740a69953c0ef82c163b7703a40d6bb6db2f39
Found state.json: faf46007a63d1e03cfd5e80a7e740a69953c0ef82c163b7703a40d6bb6db2f39
time="2017-08-21T15:19:00Z" level=info msg="Execing [/usr/bin/nsenter --mount=/proc/30927/ns/mnt -F -- /data/docker_images/aufs/mnt/40b80260068afa27344220170481e7d9ed6c3af4eb99fa74bc28656b185d527f/usr/bin/share-mnt --stage2 /var/lib/rancher/volumes /var/lib/kubelet -- norun]"
INFO: Starting agent for F3AAD6F9208CF4D69049
INFO: Starting agent for F3AAD6F9208CF4D69049
INFO: Starting agent for F3AAD6F9208CF4D69049
INFO: Starting agent for F3AAD6F9208CF4D69049
INFO: Starting agent for F3AAD6F9208CF4D69049
INFO: Starting agent for F3AAD6F9208CF4D69049

And that ‘Starting agent’ line just repeats.

In the Rancher Server logs we get quite a few of these:

2017-08-21 15:26:35,371 ERROR [968c9f7d-4df1-4470-9517-05af1513349c:874191] [instance:32858->instanceHostMap:32012] [instance.start->(InstanceStart)->instancehostmap.activate] [] [cutorService-12] [c.p.e.p.i.DefaultProcessInstanceImpl] Agent error for [compute.instance.activate.reply;agent=78]: Timeout getting IP address

Not quite sure how to fix it, though. We set the CATTLE_AGENT_IP variable on Rancher agent startup…

Update - I have confirmed the agent container can connect to the Rancher server, as the logs mention a successful connection test. I have also checked that I can ping the specified IP from within containers running on the same host, so it's not a firewall issue. Anyone got any ideas?

Try using -e CATTLE_AGENT_IP='<ip_addr_of_host>' in the docker run command for your agent host.

Hi there,

We have already tried that. I did mention it, though it was buried at the bottom of the first post!

Sorry, missed that. Is the agent host being reused? Have you removed the ‘/var/lib/rancher’ folder?

Hi there,

The agent host is not being re-used, but I am using the old Rancher DB as that was in a separate container.

Wiping /var/lib/rancher has not helped.

For anyone finding this issue, the cause was that the /etc/hosts file on that machine pointed to the wrong IP for the domain name the Rancher server was advertising! Many thanks to @superseb for the help!
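
If you want to check for the same problem on your own host, something like the following might help. It is only a rough sketch, not anything from Rancher itself: the function names, the hostname rancher.example.com and the IP addresses are made up for illustration, so substitute the name your server advertises and the IP you expect it to have. It lists what a hosts file maps a name to, and compares what the system resolver actually returns against the expected IP.

```python
import socket


def find_hosts_entries(hostname, hosts_text):
    """Return every IP that the given hosts-file text maps to `hostname`."""
    wanted = hostname.lower()
    ips = []
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], [n.lower() for n in fields[1:]]
        if wanted in names:
            ips.append(ip)
    return ips


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname):
    """Return (matches, actual_ip) for what the resolver says about `hostname`.

    The default resolver goes through the system's normal lookup, which
    usually consults /etc/hosts before DNS.
    """
    actual = resolver(hostname)
    return actual == expected_ip, actual


# Example usage (hypothetical hostname and IP, run on the affected host):
#
#   with open("/etc/hosts") as f:
#       print(find_hosts_entries("rancher.example.com", f.read()))
#   print(resolves_to("rancher.example.com", "10.0.0.5"))
```

If the first call prints an IP you don't expect, or the second returns False, the hosts file (or DNS) is a likely culprit. Run it inside the agent container as well as on the host if you can, since the two may not see the same hosts file.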
