Rancher POC on single server: unable to add host

Hi,

We are running a proof of concept with Docker Swarm (or Kubernetes). I got interested in Rancher to manage the environment, but I can’t seem to get the basics working. I am very probably missing something or doing something wrong, but I can’t figure out what…

This is what I did:

We are using Oracle Linux, which is essentially a RHEL 7 / CentOS 7 clone. Docker was installed from the distro repos, which resulted in a working 1.12.6. After that I launched the Rancher server container. So far so good.
(docker run -d --restart=unless-stopped -p 8080:8080 rancher/server:stable)

My server’s ‘public’ IP is 10.0.0.1, so I ran the add-host (agent install) command this way:
(docker run -e CATTLE_AGENT_IP="10.0.0.1" --rm --privileged -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/rancher:/var/lib/rancher rancher/agent:v1.2.2 http://10.0.0.1:8080/v1/scripts/some-key-generated-by-rancher)

The agent starts up fine, but it does not seem to connect. In the docker logs of the agent I see:

time="2017-06-16T07:56:47Z" level=info msg="Starting websocket pings"
time="2017-06-16T07:56:47Z" level=error msg="invalid character 'i' in literal false (expecting 'a')"
time="2017-06-16T07:56:52Z" level=error msg="invalid character 'c' looking for beginning of value"
time="2017-06-16T07:56:57Z" level=error msg="invalid character 'i' in literal false (expecting 'a')"
time="2017-06-16T07:57:02Z" level=error msg="invalid character 'i' in literal false (expecting 'a')"
time="2017-06-16T07:57:07Z" level=error msg="invalid character 'i' in literal false (expecting 'a')"
time="2017-06-16T07:57:12Z" level=error msg="invalid character 'c' looking for beginning of value"
time="2017-06-16T07:57:17Z" level=error msg="invalid character 'i' in literal false (expecting 'a')"
time="2017-06-16T07:57:22Z" level=error msg="invalid character 'i' in literal false (expecting 'a')"
time="2017-06-16T07:57:27Z" level=error msg="invalid character 'i' in literal false (expecting 'a')"
time="2017-06-16T07:57:32Z" level=error msg="invalid character 'b' looking for beginning of value"
time="2017-06-16T07:57:37Z" level=error msg="invalid character 'b' looking for beginning of value"
time="2017-06-16T07:57:42Z" level=error msg="invalid character 'i' in literal false (expecting 'a')"
time="2017-06-16T07:57:48Z" level=error msg="invalid character 'i' in literal false (expecting 'a')"
time="2017-06-16T07:57:53Z" level=error msg="invalid character 'i' in literal false (expecting 'a')"
time="2017-06-16T07:57:58Z" level=error msg="invalid character 'c' looking for beginning of value"
time="2017-06-16T07:58:03Z" level=error msg="invalid character 'i' in literal false (expecting 'a')"
time="2017-06-16T07:58:08Z" level=error msg="invalid character 'b' looking for beginning of value"
time="2017-06-16T07:58:13Z" level=error msg="invalid character 'c' looking for beginning of value"
time="2017-06-16T07:58:18Z" level=error msg="invalid character 'b' looking for beginning of value"
time="2017-06-16T07:58:23Z" level=error msg="invalid character 'c' looking for beginning of value"
time="2017-06-16T07:58:28Z" level=error msg="invalid character 'c' looking for beginning of value"
time="2017-06-16T07:58:33Z" level=error msg="invalid character 'i' in literal false (expecting 'a')"
time="2017-06-16T07:58:38Z" level=error msg="invalid character 'i' in literal false (expecting 'a')"
time="2017-06-16T07:58:43Z" level=error msg="invalid character 'c' looking for beginning of value"
time="2017-06-16T07:58:48Z" level=error msg="invalid character 'c' looking for beginning of value"
time="2017-06-16T07:58:53Z" level=error msg="invalid character 'i' in literal false (expecting 'a')"
time="2017-06-16T07:58:58Z" level=error msg="invalid character 'c' looking for beginning of value"
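
These messages look like JSON decoding errors, which may mean the agent is receiving something other than JSON from the server. A minimal sketch for checking what actually comes back, assuming curl is available: classify_body is a hypothetical helper and RANCHER_URL is a hypothetical variable (for example http://10.0.0.1:8080/v1), neither of which is part of Rancher.

```shell
#!/bin/sh
# Hypothetical helper: guess whether a response body is JSON or something
# else (for example an HTML error page or a plain-text message).
classify_body() {
    # Drop all whitespace, then look at the first remaining character.
    first=$(printf '%s' "$1" | tr -d ' \t\r\n' | cut -c1)
    case "$first" in
        '{'|'[') echo "json" ;;
        '')      echo "empty" ;;
        *)       echo "not-json" ;;
    esac
}

# Only contacts the server when RANCHER_URL (an assumed variable) is set.
if [ -n "${RANCHER_URL:-}" ] && command -v curl >/dev/null 2>&1; then
    body=$(curl -s --max-time 5 "$RANCHER_URL" || true)
    echo "server answered with: $(classify_body "$body")"
fi
```

Running the same check from the host and from inside a container can show whether only container traffic is affected.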

And in the server logs:

2017-06-16 07:57:16,298 ERROR [:] [] [] [] [TaskScheduler-1] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [3] count [5]
2017-06-16 07:57:21,301 ERROR [:] [] [] [] [TaskScheduler-1] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [3] count [6]
2017-06-16 07:57:21,302 ERROR [:] [] [] [] [TaskScheduler-1] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Scheduling reconnect for [3]
2017-06-16 07:57:26,313 ERROR [:] [] [] [] [TaskScheduler-1] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [3] count [7]
2017-06-16 07:57:31,315 ERROR [:] [] [] [] [TaskScheduler-1] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [3] count [8]
2017-06-16 07:57:36,317 ERROR [:] [] [] [] [TaskScheduler-1] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [3] count [9]
time="2017-06-16T08:01:15Z" level=error msg="Failed to update existing repo cache: exit status 128"
time="2017-06-16T08:01:15Z" level=error msg="Failed to update existing repo cache: exit status 128"
time="2017-06-16T08:01:26Z" level=error msg="Failed to update existing repo cache: exit status 128"

I suspect the websockets are failing. I therefore turned off firewalld and SELinux, but I am getting the same results.
My Docker install is also behind a proxy server. I tried turning that on and off, with the same result every time: the agent shows the same errors and does not connect.
I also tried setting and unsetting CATTLE_AGENT_IP: same result.

Any advice would be appreciated. Thanks a lot!

Are you trying to install the agent on the same host as the server? Your command uses 10.0.0.1 for the CATTLE_AGENT_IP and the http address. The CATTLE_AGENT_IP is the IP address of the node you are adding, not the rancher server.
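
To make the distinction concrete, here is a small sketch that builds the agent command from the two values separately. agent_cmd is a hypothetical helper, 10.0.0.2 is an assumed address for a separate host, and the registration key stays the placeholder from the original command.

```shell
#!/bin/sh
# Hypothetical helper: print the agent registration command.
#   $1 = IP of the host being added (goes into CATTLE_AGENT_IP)
#   $2 = base URL of the Rancher server (used for the registration URL)
agent_cmd() {
    agent_ip=$1
    server_url=$2
    printf 'docker run -e CATTLE_AGENT_IP="%s" --rm --privileged ' "$agent_ip"
    printf '%s ' '-v /var/run/docker.sock:/var/run/docker.sock'
    printf '%s ' '-v /var/lib/rancher:/var/lib/rancher'
    printf 'rancher/agent:v1.2.2 %s/v1/scripts/some-key-generated-by-rancher\n' "$server_url"
}

# Assumed example: add host 10.0.0.2 to the server at 10.0.0.1:8080.
agent_cmd 10.0.0.2 http://10.0.0.1:8080
```

The function only prints the command, so it can be reviewed before it is run on the host being added.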

That is exactly what I am trying to do. I am trying to run the Rancher server on the same machine (10.0.0.1) as the agent. Because traffic passes through the Docker bridge, the public IP of the agent (CATTLE_AGENT_IP) would also be 10.0.0.1, so the server IP and the agent IP would indeed be the same…

Update: I added a second machine to the environment (10.0.0.2) for testing only and I am seeing the same kind of issues when trying to add a host:

docker run -e CATTLE_AGENT_IP="10.0.0.2" --rm --privileged -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/rancher:/var/lib/rancher rancher/agent:v1.2.2 http://10.0.0.1/v1/scripts/some-key-generated-by-rancher

time="2017-06-19T07:44:13Z" level=error msg="invalid character 'c' looking for beginning of value"
time="2017-06-19T07:44:18Z" level=error msg="invalid character 'i' in literal false (expecting 'a')"
time="2017-06-19T07:44:23Z" level=error msg="invalid character 'i' in literal false (expecting 'a')"
time="2017-06-19T07:44:28Z" level=error msg="invalid character 'i' in literal false (expecting 'a')"
time="2017-06-19T07:44:33Z" level=error msg="invalid character 'i' in literal false (expecting 'a')"
time="2017-06-19T07:44:38Z" level=error msg="invalid character 'b' looking for beginning of value"
time="2017-06-19T07:44:43Z" level=error msg="invalid character 'b' looking for beginning of value"
time="2017-06-19T07:44:48Z" level=error msg="invalid character 'c' looking for beginning of value"
time="2017-06-19T07:44:53Z" level=info msg="Connecting to proxy." url="?token="
time="2017-06-19T07:44:53Z" level=error msg="Failed to connect to proxy." error="malformed ws or wss URL"
time="2017-06-19T07:44:53Z" level=error msg="malformed ws or wss URL"
time="2017-06-19T07:44:58Z" level=info msg="Connecting to proxy." url="?token="
time="2017-06-19T07:44:58Z" level=error msg="Failed to connect to proxy." error="malformed ws or wss URL"
time="2017-06-19T07:44:58Z" level=error msg="malformed ws or wss URL"

I think it is a problem with your proxy. There is a similar (unanswered) question with the same error related to a proxy.

I thought of that. For that reason, I completely disabled the proxy in Docker and restarted the docker service… I ran docker info to make sure my proxy settings were gone. However, I am still seeing the same issue.

To clarify: the proxy we are talking about is a corporate proxy used for internet access only. The Rancher traffic between server and agents is not supposed to pass through this proxy. It is only needed so that Docker can connect to Docker Hub…
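
One way to double-check this is to confirm that the server address is excluded from whatever proxy settings remain. A minimal sketch: in_no_proxy is a hypothetical helper that only does exact matches on a comma-separated list and makes no attempt to handle wildcards, spaces or CIDR ranges.

```shell
#!/bin/sh
# Hypothetical helper: succeed if host $1 appears in the comma-separated
# list $2 (the usual NO_PROXY format). Exact matches only.
in_no_proxy() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Check the current shell's setting for the Rancher server address.
if in_no_proxy 10.0.0.1 "${NO_PROXY:-${no_proxy:-}}"; then
    echo "10.0.0.1 bypasses the proxy"
else
    echo "10.0.0.1 is not listed in NO_PROXY"
fi
```

If the Docker daemon gets its proxy settings from a systemd drop-in, they can be inspected with systemctl show --property=Environment docker.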

Got it sorted. For anyone facing the same issue: there is a problem with the interaction between NetworkManager, Docker and firewalld on RHEL, CentOS and Oracle Linux.

The post below solved my issue:
https://github.com/moby/moby/issues/16137#issuecomment-271615192

Very short version:

nmcli connection modify docker0 connection.zone trusted
systemctl stop NetworkManager.service
firewall-cmd --permanent --zone=trusted --change-interface=docker0
systemctl start NetworkManager.service
nmcli connection modify docker0 connection.zone trusted
systemctl restart docker.service
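
To confirm the change took effect, the zone of docker0 can be queried afterwards. A minimal sketch, assuming firewalld is running; zone_status is a hypothetical helper and not part of the linked post.

```shell
#!/bin/sh
# Hypothetical helper: report whether a zone name is the one the fix expects.
zone_status() {
    if [ "$1" = "trusted" ]; then
        echo "ok: docker0 is in the trusted zone"
    else
        echo "check: docker0 is in zone '${1:-none}'"
    fi
}

# Query firewalld only when the tool is installed.
if command -v firewall-cmd >/dev/null 2>&1; then
    zone=$(firewall-cmd --get-zone-of-interface=docker0 2>/dev/null || true)
    zone_status "$zone"
fi
```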