Scheduler and Healthcheck stuck Initializing v1.2.2

Fraser_Goffin · December 23, 2016, 6:20pm

Also raised on the Rancher Issues at github (https://github.com/rancher/rancher/issues/7192)

Rancher Version:

1.2.2

Docker Version:

1.12.3 (also tried with 1.10.3 - same behaviour)

OS and where are the hosts located? (cloud, bare metal, etc):

RHEL 7.3
AWS

Setup Details: (single node rancher vs. HA rancher, internal DB vs. external DB)

Single Rancher sever + 2 hosts (simple set-up to test this problem)

Environment Type: (Cattle/Kubernetes/Swarm/Mesos)

Cattle

Steps to Reproduce:

1 * Rancher SERVER
2 * Rancher HOST

Spin up instances from AMI

SSH onto Rancher HA SERVER :-

mkdir -p /home/ec2-user/rancher/mysql
chmod -R 0755 /home/ec2-user/rancher

sudo docker run -d -v /home/ec2-user/rancher/mysql:/var/lib/mysql --restart=unless-stopped -p 8080:8080 rancher/server

Browse to the IP:8080 of the HA server

REGISTER HOSTS (Custom)

Spin up a simple stack (nginx) :
Create a global load balancer to register nginx service

Results:

Health-check and scheduler containers stuck in ‘Initializing’ state.
Global LB stuck in ‘Initializing’ state

Expected:

Health-check and scheduler containers should start normally.
Glocal LB should start normally

Clearly a number of users are experiencing this issue. Here are the issues that I’ve seen recently registered (and have commented on some)

https://github.com/rancher/rancher/issues/7132
https://github.com/rancher/rancher/issues/7131
https://github.com/rancher/rancher/issues/7130
https://github.com/rancher/rancher/issues/7156
https://github.com/rancher/rancher/issues/7171
https://github.com/rancher/rancher/issues/7190

I understand that this was noted by Rancher and some networking fixes have been introduced in v1.2.2 (I can see the network-manager version has gone up to 0.2.18 and dns to v0.11.0) but the behaviour is the same as it was with Rancher v1.2.1.

I can ping between the ip-sec containers sucessfully, but NOT between ip-sec and any other container, not any other pair of containers.

ICMP is enabled on the hosts and server

Have tried with firewalld enabled and disabled (default)

This is an absolute blocker for us.

hwaastad · December 24, 2016, 10:44am

Hi,try restarting metaservice on all hosts.
That worked for me.

/hw

Fraser_Goffin · January 7, 2017, 7:07pm

In our case the cause was packets being blocked by the instance hardening, speicifically the Trend Deep Security Agent. Fortunately it is possible to configure DSA to play nice with docker.

Topic		Replies	Views
Container stuck Initializing despite passing health check Rancher 1.x	10	5558	June 7, 2016
Health checks not initializing properly? Rancher 1.x	22	5910	September 5, 2017
Stuck Initializing - 3 containers Rancher 1.2.1	9	3927	January 8, 2017
Debugging containers in "Initialzing" state w/health checks Rancher 1.x	2	1344	July 7, 2016
Health checks are stuck in the initializing state after 0.9.0 RancherOS upgrade	0	785	March 28, 2017

Scheduler and Healthcheck stuck Initializing v1.2.2

Related topics