Containers like Route53, Load Balancer stuck Initializing

Hi All,

I have been using Rancher heavily since 0.59. After moving up to 0.63 and now 1.0, I have come across an issue where some of the Rancher service containers are stuck in the Initializing state.
They all seem to function correctly but never move to Running.

I thought it was only the Load Balancer and Convoy Gluster containers having issues but I configured a Route53 service from the Rancher Catalog and got the same results.

Please forgive me if there is a topic open for this already or a GitHub issue submitted. I have searched around and cannot find the same issue as I am having; most reports show the container not working at all.

Where should I look for debugging info for me to post here for help?
Any help or guidance would be greatly appreciated.

Have you confirmed if your cross host networking is still working? You can check by exec-ing into a network agent and pinging another network agent’s IP.
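A sketch of that check from the command line (the container name filter and target IP are placeholders; adjust them to what your UI shows):

```shell
# On host A, find the Rancher Network Agent container
docker ps --filter "name=agent-instance" --format "{{.ID}} {{.Names}}"

# Exec into it and ping the 10.42.x.x address of the Network Agent
# on another host (the address below is a placeholder)
docker exec -it <agent-instance-container> ping -c 3 10.42.0.2
```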

HI Denise,

Yep, I exec-ed into the 3 Rancher agents across the 3 hosts we have and they can all ping each other.
I assume the “network agent’s IP” means the host IP shown in the Infrastructure Hosts list in the Rancher UI?

They can also ping the Network Agent’s 10.42.x.x address shown under “Standalone Containers” as well.

Looking further into it: if I create a Route53 service from the catalog in a new environment, it works without any issues. (Noting the new environment only had one host in it.)
So I tried re-creating the stack/service with the same docker/rancher compose files, but this time forcing it onto a different host via the label “io.rancher.scheduler.affinity:host_label”. It rendered the same result: Initializing.
My upgrade path from version 0.59.1 wasn’t the greatest. A basic outline of the path:
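For reference, the affinity label in the compose file looked roughly like this (the image tag and the host label value are illustrative, not our real ones):

```yaml
route53:
  image: rancher/external-dns:v0.2.0
  labels:
    # schedule this service only on hosts carrying this (example) label
    io.rancher.scheduler.affinity:host_label: dns=true
```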

  1. 0.59.1 ended up running out of disk space due to logging data getting bloated.
  2. Exported the database to a fresh MySQL server on a new host
  3. Loaded up 0.63 on a new host pointing to the new MySQL database.
  4. Removed and re-created Rancher Agents/config to point to new Rancher server.
  5. Rancher Agents ended up in a re-create loop.
  6. Managed to fix this and purge all the old containers I had; I had a few left over from trying out GlusterFS.
  7. The Load Balancer service and Convoy Gluster were in an Active state at version 0.63.
  8. Upgraded to rancher 1.0
  9. Loadbalancer / Convoy Gluster are now showing Initializing.
    One thing to note: as of 0.63 I could successfully use the API to add new service links to the Load Balancer.
    I can still do so, but the config is only applied if I go into the LB edit screen and click Save.
  10. Recently added Route53 to find the same issue.

All other containers I create seem to be fine. It only affects Rancher services like the Load Balancer, Convoy and Route53.

Is there a log or data table that shows what it’s waiting for while “Initializing”?

Can you provide the container logs for the containers that are stuck in initializing?

Those will typically indicate what the issue is.

Certainly,

I have listed below the LoadBalancer and Route53 container logs.

Please note I have changed the rancher server address to 10.X.X.X to mask our addressing.

This one is the first load balancer container:

06/04/2016 14:17:10INFO: Downloading agent http://10.X.X.X:8080/v1/configcontent/configscripts
06/04/2016 14:17:10INFO: Updating configscripts
06/04/2016 14:17:10INFO: Downloading http://10.X.X.X:8080/v1//configcontent//configscripts current=
06/04/2016 14:17:10INFO: Running /var/lib/cattle/download/configscripts/configscripts-1-f0f3fb2e1110b5ada7c441705981f93a480313a324294321cff467f0c3e12319/apply.sh
06/04/2016 14:17:10INFO: Sending configscripts applied 1-f0f3fb2e1110b5ada7c441705981f93a480313a324294321cff467f0c3e12319
06/04/2016 14:17:10INFO: Updating agent-instance-startup
06/04/2016 14:17:10INFO: Downloading http://10.X.X.X:8080/v1//configcontent//agent-instance-startup current=
06/04/2016 14:17:10INFO: Running /var/lib/cattle/download/agent-instance-startup/agent-instance-startup-1-bf3ac217ed9ff3496ef6fbd5fc0934bddeda03aab1b1069201d9f534feb737b7/apply.sh
06/04/2016 14:17:10INFO: Updating services
06/04/2016 14:17:10INFO: Downloading http://10.X.X.X:8080/v1//configcontent//services current=
06/04/2016 14:17:10INFO: Running /var/lib/cattle/download/services/services-1-061405f3edd960bfdfe1cfb8447be40eab5b4b608731608e224cc51c5dc30b91/apply.sh
06/04/2016 14:17:10INFO: HOME -> ./
06/04/2016 14:17:10INFO: HOME -> ./services
06/04/2016 14:17:10INFO: Sending services applied 1-061405f3edd960bfdfe1cfb8447be40eab5b4b608731608e224cc51c5dc30b91
06/04/2016 14:17:10INFO: Getting agent-instance-scripts
06/04/2016 14:17:10INFO: Updating agent-instance-scripts
06/04/2016 14:17:10INFO: Downloading http://10.X.X.X:8080/v1//configcontent//agent-instance-scripts current=
06/04/2016 14:17:10INFO: Running /var/lib/cattle/download/agent-instance-scripts/agent-instance-scripts-1-4b5124bd74cd423f98d57550b481ec77ec3a7135c6a650886ab95c043305d642/apply.sh
06/04/2016 14:17:10INFO: HOME -> ./
06/04/2016 14:17:10INFO: HOME -> ./events/
06/04/2016 14:17:10INFO: HOME -> ./events/config.update
06/04/2016 14:17:10INFO: HOME -> ./events/ping
06/04/2016 14:17:10INFO: Sending agent-instance-scripts applied 1-4b5124bd74cd423f98d57550b481ec77ec3a7135c6a650886ab95c043305d642
06/04/2016 14:17:10INFO: Getting monit
06/04/2016 14:17:10INFO: Updating monit
06/04/2016 14:17:10INFO: Downloading http://10.X.X.X:8080/v1//configcontent//monit current=
06/04/2016 14:17:10INFO: Running /var/lib/cattle/download/monit/monit-1-c4113ae48035df162ff89a5d37af1545f002ee54e044535e42395bda7a29a953/apply.sh
06/04/2016 14:17:10INFO: ROOT -> ./
06/04/2016 14:17:10INFO: ROOT -> ./etc/
06/04/2016 14:17:10INFO: ROOT -> ./etc/monit/
06/04/2016 14:17:10INFO: ROOT -> ./etc/monit/conf.d/
06/04/2016 14:17:10INFO: ROOT -> ./etc/monit/conf.d/logrotate
06/04/2016 14:17:10INFO: ROOT -> ./etc/monit/monitrc
06/04/2016 14:17:10INFO: ROOT -> ./etc/logrotate.d/
06/04/2016 14:17:10INFO: ROOT -> ./etc/logrotate.d/rancher-logs
06/04/2016 14:17:10INFO: Sending monit applied 1-c4113ae48035df162ff89a5d37af1545f002ee54e044535e42395bda7a29a953
06/04/2016 14:17:11INFO: Getting haproxy
06/04/2016 14:17:11INFO: Updating haproxy
06/04/2016 14:17:11INFO: Downloading http://10.X.X.X:8080/v1//configcontent//haproxy current=
06/04/2016 14:17:11INFO: Running /var/lib/cattle/download/haproxy/haproxy-2-a5eac3965952846cbd39c610ae44d58f5b54450bd46bdfba35c57dda8edfaab0/apply.sh
06/04/2016 14:17:11INFO: ROOT -> ./
06/04/2016 14:17:11INFO: ROOT -> ./etc/
06/04/2016 14:17:11INFO: ROOT -> ./etc/monit/
06/04/2016 14:17:11INFO: ROOT -> ./etc/monit/conf.d/
06/04/2016 14:17:11INFO: ROOT -> ./etc/monit/conf.d/haproxy
06/04/2016 14:17:11INFO: ROOT -> ./etc/haproxy/
06/04/2016 14:17:11INFO: ROOT -> ./etc/haproxy/haproxy.cfg
06/04/2016 14:17:11INFO: ROOT -> ./etc/haproxy/certs/
06/04/2016 14:17:11INFO: ROOT -> ./etc/haproxy/certs/default.pem
06/04/2016 14:17:11INFO: ROOT -> ./etc/haproxy/certs/certs.pem
06/04/2016 14:17:11INFO: ROOT -> ./etc/default/
06/04/2016 14:17:11INFO: ROOT -> ./etc/default/haproxy
06/04/2016 14:17:11INFO: Sending haproxy applied 2-a5eac3965952846cbd39c610ae44d58f5b54450bd46bdfba35c57dda8edfaab0
06/04/2016 14:17:11INFO: HOME -> ./
06/04/2016 14:17:11INFO: HOME -> ./etc/
06/04/2016 14:17:11INFO: HOME -> ./etc/cattle/
06/04/2016 14:17:11INFO: HOME -> ./etc/cattle/startup-env
06/04/2016 14:17:11INFO: ROOT -> ./
06/04/2016 14:17:11INFO: ROOT -> ./etc/
06/04/2016 14:17:11INFO: ROOT -> ./etc/init.d/
06/04/2016 14:17:11INFO: ROOT -> ./etc/init.d/agent-instance-startup
06/04/2016 14:17:11INFO: Sending agent-instance-startup applied 1-bf3ac217ed9ff3496ef6fbd5fc0934bddeda03aab1b1069201d9f534feb737b7
06/04/2016 14:17:11monit: generated unique Monit id eec4438824f4e81f828df5a4c032d1b8 and stored to '/var/lib/monit/id’
06/04/2016 14:17:11Starting monit daemon with http interface at [localhost:2812]

The second load balancer container:

06/04/2016 14:17:09INFO: Downloading agent http://10.X.X.X:8080/v1/configcontent/configscripts
06/04/2016 14:17:09INFO: Updating configscripts
06/04/2016 14:17:09INFO: Downloading http://10.X.X.X:8080/v1//configcontent//configscripts current=
06/04/2016 14:17:09INFO: Running /var/lib/cattle/download/configscripts/configscripts-1-f0f3fb2e1110b5ada7c441705981f93a480313a324294321cff467f0c3e12319/apply.sh
06/04/2016 14:17:09INFO: Sending configscripts applied 1-f0f3fb2e1110b5ada7c441705981f93a480313a324294321cff467f0c3e12319
06/04/2016 14:17:09INFO: Updating agent-instance-startup
06/04/2016 14:17:09INFO: Downloading http://10.X.X.X:8080/v1//configcontent//agent-instance-startup current=
06/04/2016 14:17:09INFO: Running /var/lib/cattle/download/agent-instance-startup/agent-instance-startup-1-bf3ac217ed9ff3496ef6fbd5fc0934bddeda03aab1b1069201d9f534feb737b7/apply.sh
06/04/2016 14:17:09INFO: Updating services
06/04/2016 14:17:09INFO: Downloading http://10.X.X.X:8080/v1//configcontent//services current=
06/04/2016 14:17:09INFO: Running /var/lib/cattle/download/services/services-1-061405f3edd960bfdfe1cfb8447be40eab5b4b608731608e224cc51c5dc30b91/apply.sh
06/04/2016 14:17:09INFO: HOME -> ./
06/04/2016 14:17:09INFO: HOME -> ./services
06/04/2016 14:17:09INFO: Sending services applied 1-061405f3edd960bfdfe1cfb8447be40eab5b4b608731608e224cc51c5dc30b91
06/04/2016 14:17:10INFO: Getting agent-instance-scripts
06/04/2016 14:17:10INFO: Updating agent-instance-scripts
06/04/2016 14:17:10INFO: Downloading http://10.X.X.X:8080/v1//configcontent//agent-instance-scripts current=
06/04/2016 14:17:10INFO: Running /var/lib/cattle/download/agent-instance-scripts/agent-instance-scripts-1-4b5124bd74cd423f98d57550b481ec77ec3a7135c6a650886ab95c043305d642/apply.sh
06/04/2016 14:17:10INFO: HOME -> ./
06/04/2016 14:17:10INFO: HOME -> ./events/
06/04/2016 14:17:10INFO: HOME -> ./events/ping
06/04/2016 14:17:10INFO: HOME -> ./events/config.update
06/04/2016 14:17:10INFO: Sending agent-instance-scripts applied 1-4b5124bd74cd423f98d57550b481ec77ec3a7135c6a650886ab95c043305d642
06/04/2016 14:17:10INFO: Getting monit
06/04/2016 14:17:10INFO: Updating monit
06/04/2016 14:17:10INFO: Downloading http://10.X.X.X:8080/v1//configcontent//monit current=
06/04/2016 14:17:10INFO: Running /var/lib/cattle/download/monit/monit-1-c4113ae48035df162ff89a5d37af1545f002ee54e044535e42395bda7a29a953/apply.sh
06/04/2016 14:17:10INFO: ROOT -> ./
06/04/2016 14:17:10INFO: ROOT -> ./etc/
06/04/2016 14:17:10INFO: ROOT -> ./etc/logrotate.d/
06/04/2016 14:17:10INFO: ROOT -> ./etc/logrotate.d/rancher-logs
06/04/2016 14:17:10INFO: ROOT -> ./etc/monit/
06/04/2016 14:17:10INFO: ROOT -> ./etc/monit/monitrc
06/04/2016 14:17:10INFO: ROOT -> ./etc/monit/conf.d/
06/04/2016 14:17:10INFO: ROOT -> ./etc/monit/conf.d/logrotate
06/04/2016 14:17:10INFO: Sending monit applied 1-c4113ae48035df162ff89a5d37af1545f002ee54e044535e42395bda7a29a953
06/04/2016 14:17:10INFO: Getting haproxy
06/04/2016 14:17:10INFO: Updating haproxy
06/04/2016 14:17:10INFO: Downloading http://10.X.X.X:8080/v1//configcontent//haproxy current=
06/04/2016 14:17:10INFO: Running /var/lib/cattle/download/haproxy/haproxy-2-a5eac3965952846cbd39c610ae44d58f5b54450bd46bdfba35c57dda8edfaab0/apply.sh
06/04/2016 14:17:10INFO: ROOT -> ./
06/04/2016 14:17:10INFO: ROOT -> ./etc/
06/04/2016 14:17:10INFO: ROOT -> ./etc/default/
06/04/2016 14:17:10INFO: ROOT -> ./etc/default/haproxy
06/04/2016 14:17:10INFO: ROOT -> ./etc/haproxy/
06/04/2016 14:17:10INFO: ROOT -> ./etc/haproxy/haproxy.cfg
06/04/2016 14:17:10INFO: ROOT -> ./etc/haproxy/certs/
06/04/2016 14:17:10INFO: ROOT -> ./etc/haproxy/certs/certs.pem
06/04/2016 14:17:10INFO: ROOT -> ./etc/haproxy/certs/default.pem
06/04/2016 14:17:10INFO: ROOT -> ./etc/monit/
06/04/2016 14:17:10INFO: ROOT -> ./etc/monit/conf.d/
06/04/2016 14:17:10INFO: ROOT -> ./etc/monit/conf.d/haproxy
06/04/2016 14:17:10INFO: Sending haproxy applied 2-a5eac3965952846cbd39c610ae44d58f5b54450bd46bdfba35c57dda8edfaab0
06/04/2016 14:17:10INFO: HOME -> ./
06/04/2016 14:17:10INFO: HOME -> ./etc/
06/04/2016 14:17:10INFO: HOME -> ./etc/cattle/
06/04/2016 14:17:10INFO: HOME -> ./etc/cattle/startup-env
06/04/2016 14:17:10INFO: ROOT -> ./
06/04/2016 14:17:10INFO: ROOT -> ./etc/
06/04/2016 14:17:10INFO: ROOT -> ./etc/init.d/
06/04/2016 14:17:10INFO: ROOT -> ./etc/init.d/agent-instance-startup
06/04/2016 14:17:10INFO: Sending agent-instance-startup applied 1-bf3ac217ed9ff3496ef6fbd5fc0934bddeda03aab1b1069201d9f534feb737b7
06/04/2016 14:17:11monit: generated unique Monit id 1a30c16cee071ea4a0b17191a5e87e6d and stored to '/var/lib/monit/id’
06/04/2016 14:17:11Starting monit daemon with http interface at [localhost:2812]

The Route53 container log:

11/04/2016 14:40:02time=“2016-04-11T02:40:02Z” level=info msg="CLOUDFLARE_EMAIL is not set, skipping init of CloudFlare provider"
11/04/2016 14:40:02time=“2016-04-11T02:40:02Z” level=info msg="DNSIMPLE_TOKEN is not set, skipping init of DNSimple provider"
11/04/2016 14:40:02time=“2016-04-11T02:40:02Z” level=info msg="GANDI_APIKEY is not set, skipping init of Gandi provider"
11/04/2016 14:40:02time=“2016-04-11T02:40:02Z” level=info msg="POINTHQ_TOKEN is not set, skipping init of PointHQ provider"
11/04/2016 14:40:04time=“2016-04-11T02:40:04Z” level=info msg="Configured Route53 with hosted zone “myhosted.” in region “us-west-2” "
11/04/2016 14:40:04time=“2016-04-11T02:40:04Z” level=info msg="Starting Rancher External DNS service"
11/04/2016 14:40:04time=“2016-04-11T02:40:04Z” level=info msg="Powered by Route53"
11/04/2016 14:40:04time=“2016-04-11T02:40:04Z” level=info msg=“Healthcheck handler is listening on :1000”

It looks like there are no definitive issues in your containers.

For the containers stuck in initializing, can you look in the API of each container?

Under the links section of the API, there is a healthcheckInstanceHostMap link. Can you click on it and provide its healthState?
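The same thing can be checked from the command line; something like the following, where the server address, API keys and container ID are all placeholders (requires the jq JSON tool):

```shell
# Follow the container's healthcheckInstanceHostMaps link and
# pull out the healthState field. Everything below is a placeholder:
# substitute your own server address, API keys and container ID.
curl -s -u "$RANCHER_ACCESS_KEY:$RANCHER_SECRET_KEY" \
  "http://10.X.X.X:8080/v1/containers/<container-id>/healthcheckinstancehostmaps" \
  | jq '.data[].healthState'
```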

Hi,

I’m sure you figured this out by now, but I just ran into the same thing, so for anyone who hits this problem in the future:

We are using AWS as well. I decided to tighten up my ports, so I restricted the Security Groups to allow communication only between the Rancher server and the Rancher agent on each host WITHIN AWS. This left the services stuck in Initializing until I opened the Rancher agent ports UDP 500 and UDP 4500 back up (0.0.0.0/0). This obviously makes sense for a cross-platform agent. :joy:
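For anyone scripting this, re-opening those ports with the AWS CLI looks roughly like the following (the security group ID is a placeholder, and you may be able to scope the CIDR tighter than 0.0.0.0/0, e.g. to your hosts’ public IPs):

```shell
# UDP 500 and 4500 are the IPsec ports Rancher's overlay network uses
# between hosts. sg-0123456789abcdef0 is a placeholder group ID.
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol udp --port 500 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol udp --port 4500 --cidr 0.0.0.0/0
```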
