Stuck Initializing - 3 containers Rancher 1.2.1

Rancher version: v1.2.1
OS: RHEL 7.2
Docker version: 1.12.3
Setup: Simple test setup (1 server + local MySQL + 2 nodes)

It comes up almost OK, apart from three things … any ideas?

  1. The scheduler container doesn't start; it's just stuck Initializing. The logs show this :-

    time="2016-12-18T22:31:33Z" level=info msg="Listening for health checks on 0.0.0.0:80/healthcheck"
    time="2016-12-18T22:31:33Z" level=info msg="Connecting to cattle event stream."
    time="2016-12-18T22:31:33Z" level=info msg="Subscribing to metadata changes."
    time="2016-12-18T22:31:34Z" level=info msg="Adding resource pool [instanceReservation] with total 1000000 and used 9 for host e5f1c264-f1ca-4ace-bc3e-927459bd9add"
    time="2016-12-18T22:31:34Z" level=info msg="Adding resource pool [cpuReservation] with total 2000 and used 0 for host e5f1c264-f1ca-4ace-bc3e-927459bd9add"
    time="2016-12-18T22:31:34Z" level=info msg="Adding resource pool [memoryReservation] with total 3704619008 and used 0 for host e5f1c264-f1ca-4ace-bc3e-927459bd9add"
    time="2016-12-18T22:31:34Z" level=info msg="Adding resource pool [storageSize] with total 5818134 and used 0 for host e5f1c264-f1ca-4ace-bc3e-927459bd9add"
    time="2016-12-18T22:31:34Z" level=info msg="Initializing event router" workerCount=100
    time="2016-12-18T22:31:34Z" level=info msg="Connection established"
    time="2016-12-18T22:31:34Z" level=info msg="Starting websocket pings"
    time="2016-12-18T22:49:59Z" level=info msg="Adding resource pool [instanceReservation] with total 1000000 and used 4 for host 0cdada1a-1e59-4889-9939-3a2d2329839f"
    time="2016-12-18T22:49:59Z" level=info msg="Adding resource pool [cpuReservation] with total 2000 and used 0 for host 0cdada1a-1e59-4889-9939-3a2d2329839f"
    time="2016-12-18T22:49:59Z" level=info msg="Adding resource pool [memoryReservation] with total 3704619008 and used 0 for host 0cdada1a-1e59-4889-9939-3a2d2329839f"
    time="2016-12-18T22:49:59Z" level=info msg="Adding resource pool [storageSize] with total 5817597 and used 0 for host 0cdada1a-1e59-4889-9939-3a2d2329839f"

  2. I created a global-scope load balancer linked to an nginx service (port 80:80) that came up with no problem (image pulled from ECR), but it is also stuck Initializing. It just exposes port 80 (no SSL); keeping things simple for now. The log shows this :-

    time="2016-12-18T22:39:13Z" level=info msg="KUBERNETES_URL is not set, skipping init of kubernetes controller"
    time="2016-12-18T22:39:13Z" level=info msg="Starting Rancher LB service"
    time="2016-12-18T22:39:13Z" level=info msg="LB controller: rancher"
    time="2016-12-18T22:39:13Z" level=info msg="LB provider: haproxy"
    time="2016-12-18T22:39:13Z" level=info msg="starting rancher controller"
    time="2016-12-18T22:39:13Z" level=info msg="Healthcheck handler is listening on :10241"
    time="2016-12-18T22:39:16Z" level=info msg=" -- starting haproxy\n[ALERT] 352/223916 (27) : Starting frontend GLOBAL: cannot bind UNIX socket [/run/haproxy/admin.sock]\n"
    time="2016-12-18T22:39:16Z" level=info msg=" -- reloading haproxy config with the new config changes\n[WARNING] 352/223916 (38) : config : 'option forwardfor' ignored for proxy 'default' as it requires HTTP mode.\n"

  3. On the first host the health-check container is running fine, but the one on the second host is stuck Initializing. The logs show this :-

    time="2016-12-18T22:50:23Z" level=info msg="Starting haproxy listener"
    time="2016-12-18T22:50:28Z" level=info msg="healthCheck -- starting haproxy\n * Starting haproxy haproxy\n ...done.\n"
    time="2016-12-18T22:50:28Z" level=info msg="Scheduling apply config"
    time="2016-12-18T22:50:28Z" level=info msg="healthCheck -- reloading haproxy config with the new config changes\n[WARNING] 352/225028 (31) : config : 'option forwardfor' ignored for proxy 'web' as it requires HTTP mode.\n"
    time="2016-12-18T22:50:38Z" level=info msg="Scheduling apply config"
    time="2016-12-18T22:50:38Z" level=info msg="healthCheck -- no changes in haproxy config\n"
    time="2016-12-18T22:51:52Z" level=info msg="Starting haproxy listener"
    time="2016-12-18T22:51:52Z" level=info msg="healthCheck -- starting haproxy\n[WARNING] 352/225152 (16) : config : 'option forwardfor' ignored for proxy 'web' as it requires HTTP mode.\n"
    time="2016-12-18T22:51:52Z" level=info msg="Scheduling apply config"
    time="2016-12-18T22:51:52Z" level=info msg="healthCheck -- no changes in haproxy config\n"

Here’s a picture :-
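Since all three stuck containers seem to be waiting on health checks over the managed network, here is a minimal set of host-level checks for this kind of stuck-Initializing state. This is only a sketch: the server hostname is a placeholder, the `grep` filters are approximate container names, and UDP 500/4500 are the ports the Rancher 1.2 IPsec overlay uses by default.

    # The containers themselves are running; they just never leave "Initializing" in the UI.
    docker ps | grep -E 'scheduler|healthcheck|lb'

    # Both nodes can reach the Rancher server API (placeholder hostname); should return "pong".
    curl -s http://rancher-server.example.com:8080/ping && echo

    # The managed (IPsec) network in Rancher 1.2 needs UDP 500 and 4500 open between
    # hosts; check what the RHEL firewall currently allows (if firewalld is in use).
    sudo firewall-cmd --list-all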

I have the same issue on the healthcheck containers.

But I get the following log:

level=info msg="Scheduling apply config"
level=info msg="healthCheck – no changes in haproxy config\n"
level=info msg="Scheduling apply config"
level=info msg="healthCheck – no changes in haproxy config\n"
level=info msg="Scheduling apply config"
level=info msg="healthCheck – no changes in haproxy config\n"
level=info msg="Scheduling apply config"
level=info msg="healthCheck – no changes in haproxy config\n"
level=info msg="Scheduling apply config"
level=info msg="healthCheck – no changes in haproxy config\n"
level=info msg="Scheduling apply config"
level=info msg="healthCheck – reloading haproxy config with the new config changes\n[WARNING] 353/010422 (85) : config : ‘option forwardfor’ ignored for proxy ‘web’ as it requires HTTP mode.\n[WARNING] 353/010422 (85) : config : ‘option forwardfor’ ignored for backend ‘cattle-7a8317fd-68e3-4e56-b949-0f61397d294f_eaae9753-2cc4-46db-8572-495f0b454872_1’ as it requires HTTP mode.\n[WARNING] 353/d for backend ‘cattle-7a8317fd-68e3-4e56-b949-0f61397d294f_6da30ff4-3f73-4ff9-81a7-eb8b960eda19_1’ as it requires HTTP mode.\n"
level=info msg="Monitoring 2 backends"
level=info msg=“7a8317fd-68e3-4e56-b949-0f61397d294f_6da30ff4-3f73-4ff9-81a7-eb8b960eda19_1=DOWN”

This is on a clean Rancher setup: 2 masters, 5 slaves, external database, Rancher 1.2.1, RancherOS 0.7.1.

I found out some things; maybe they're the same for you:

I'm running the Rancher server behind an nginx reverse proxy that load balances between the 2 masters.
The agents connected to the external URL and this all worked fine, but only ever one health check finished initialising.
When connecting the Rancher agents directly to the internal IP address of one master, everything went green.

I followed http://docs.rancher.com/rancher/v1.2/en/installing-rancher/installing-server/basic-ssl-config/
The only difference is that my proxy runs on a different server outside Rancher.
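For anyone comparing setups: the key part of that doc is that the proxy has to pass WebSocket upgrades through to the Rancher server, otherwise agents register fine but the long-lived event and health-check subscriptions never complete. Below is a rough sketch of the relevant nginx bits; the upstream address, server_name and config path are placeholders, not my real config.

    # Sketch only -- adjust names, IPs and SSL handling to your environment.
    cat > /etc/nginx/conf.d/rancher-proxy.conf <<'EOF'
    upstream rancher {
        server 10.0.0.10:8080;              # internal IP of a Rancher server node (placeholder)
    }

    map $http_upgrade $connection_upgrade {
        default Upgrade;
        ''      close;
    }

    server {
        listen 80;
        server_name rancher.example.com;    # placeholder

        location / {
            proxy_pass http://rancher;
            proxy_http_version 1.1;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            # WebSocket upgrade -- without these the agent event stream silently breaks
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_read_timeout 900s;
        }
    }
    EOF
    sudo nginx -t && sudo nginx -s reload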

Rancher have confirmed a problem with networking which should be fixed in v1.2.2 (expected in the next few days).

What problem? Do you have a link?

@christianhuening Well clearly the problem that this thread is discussing. TBH I’m not especially interested in the broader sense of what might have been identified, only that it might remedy this particular issue.

This has also been raised independently on the issues backlog here:-

https://github.com/rancher/rancher/issues/7132
https://github.com/rancher/rancher/issues/7131
https://github.com/rancher/rancher/issues/7130

Thanks. I think I ran into the same issue, but with Kubernetes as the environment: https://github.com/rancher/rancher/issues/7164

Unfortunately, upgrading to 1.2.2 does NOT change the behaviour. I'm out of ideas.


I’m experiencing the same problem.

Setup:

  • 3 server nodes (v1.2.2) in HA mode, LB on registration IP (with a domain)
  • 3 host nodes (currently the same machines as the server nodes); I used their respective public IPs for CATTLE_AGENT_IP on agent startup (roughly as in the sketch below).
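For reference, the agents were registered more or less like this. This is a sketch: the IP, the agent image tag and the registration token are placeholders; the exact command comes from the "Add Host" screen in the Rancher UI.

    # Run on each host node; CATTLE_AGENT_IP forces the host to register with its public IP.
    sudo docker run -d --privileged \
      -e CATTLE_AGENT_IP="203.0.113.11" \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v /var/lib/rancher:/var/lib/rancher \
      rancher/agent:v1.1.2 \
      https://rancher.example.com/v1/scripts/<registration-token>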

We have resolved this issue. It was related to an incorrect configuration of the Trend Deep Security Agent on our hosts, which prevented Docker and iptables from playing nicely. I appreciate that this is likely to be specific to my site, but it might help point the way for others suffering the same problem, insofar as it wasn't a Rancher or Docker issue per se, but something else getting in the way of container packets successfully reaching their destination.
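In case it helps anyone hitting the same wall: once we suspected something outside Docker was interfering, a couple of host-level checks made it obvious. A sketch, assuming the default Rancher 1.2 managed/IPsec network (which tunnels cross-host traffic over UDP 500/4500); the container name and 10.42.x.x address below are placeholders.

    # 1. From a managed container on host A, ping a container's 10.42.x.x IP on host B.
    #    If this fails while both containers are healthy, cross-host networking is broken.
    docker exec -it <container-on-host-A> ping -c 3 10.42.0.2    # placeholder target IP

    # 2. Dump the iptables rules and look for chains added by something other than
    #    Docker or Rancher (in our case, the Deep Security agent).
    sudo iptables-save | less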