Scheduler and Healthcheck stuck Initializing v1.2.2

Also raised on the Rancher Issues at github (https://github.com/rancher/rancher/issues/7192)

Rancher Version:

1.2.2

Docker Version:

1.12.3 (also tried with 1.10.3 - same behaviour)

OS and where are the hosts located? (cloud, bare metal, etc):

RHEL 7.3
AWS

Setup Details: (single node rancher vs. HA rancher, internal DB vs. external DB)

Single Rancher sever + 2 hosts (simple set-up to test this problem)

Environment Type: (Cattle/Kubernetes/Swarm/Mesos)

Cattle

Steps to Reproduce:

1 * Rancher SERVER
2 * Rancher HOST

Spin up instances from AMI

SSH onto Rancher HA SERVER :-

mkdir -p /home/ec2-user/rancher/mysql
chmod -R 0755 /home/ec2-user/rancher

sudo docker run -d -v /home/ec2-user/rancher/mysql:/var/lib/mysql --restart=unless-stopped -p 8080:8080 rancher/server

Browse to the IP:8080 of the HA server

REGISTER HOSTS (Custom)

Spin up a simple stack (nginx) :
Create a global load balancer to register nginx service

Results:

Health-check and scheduler containers stuck in ‘Initializing’ state.
Global LB stuck in ‘Initializing’ state

Expected:

Health-check and scheduler containers should start normally.
Glocal LB should start normally

Clearly a number of users are experiencing this issue. Here are the issues that I’ve seen recently registered (and have commented on some)

https://github.com/rancher/rancher/issues/7132
https://github.com/rancher/rancher/issues/7131
https://github.com/rancher/rancher/issues/7130
https://github.com/rancher/rancher/issues/7156
https://github.com/rancher/rancher/issues/7171
https://github.com/rancher/rancher/issues/7190

I understand that this was noted by Rancher and some networking fixes have been introduced in v1.2.2 (I can see the network-manager version has gone up to 0.2.18 and dns to v0.11.0) but the behaviour is the same as it was with Rancher v1.2.1.

I can ping between the ip-sec containers sucessfully, but NOT between ip-sec and any other container, not any other pair of containers.

ICMP is enabled on the hosts and server

Have tried with firewalld enabled and disabled (default)

This is an absolute blocker for us.

Hi,try restarting metaservice on all hosts.
That worked for me.

/hw

1 Like

In our case the cause was packets being blocked by the instance hardening, speicifically the Trend Deep Security Agent. Fortunately it is possible to configure DSA to play nice with docker.