Setting up Swarm hangs on "Starting services"

Hi,

I’ve created a new environment (Swarm) and added three hosts on AWS, but it hangs on “Starting services”.
Infrastructure > Hosts shows a yellow ! next to the swarmkit containers.

Two hosts have “swarm=wait_leader” in their description, while one has “swarm=manager”.

Logs for “wait_leader” ones:

1/31/2017 9:27:07 PM time="2017-01-31T20:27:07Z" level=info msg="Listening on port: 2378"
1/31/2017 9:27:07 PM Set host label swarm=wait_leader
1/31/2017 9:27:35 PM Metadata not ready
1/31/2017 9:27:36 PM time="2017-01-31T20:27:36Z" level=info msg="Listening on port: 2378"
1/31/2017 9:27:36 PM Set host label swarm=wait_leader
1/31/2017 9:28:05 PM Metadata not ready

Logs for “manager”:

1/31/2017 9:28:35 PM 1 of 1 manager(s) reachable, 0 worker(s) active
1/31/2017 9:28:35 PM No active workers present for promotion, add more nodes to enable reconciliation.
1/31/2017 9:29:05 PM 1 of 1 manager(s) reachable, 0 worker(s) active
1/31/2017 9:29:05 PM No active workers present for promotion, add more nodes to enable reconciliation.

All three hosts have the rancher-machine security group attached.

Swarm > Infrastructure shows that an update is available for Swarm, but it wouldn’t complete. It said it was waiting for a healthy container, which never showed up (I’m guessing that’s the yellow ! thing), so I rolled back.

How do I fix the swarmkit issue?

Thanks!


I’m still having trouble with this.

I’ve created another Rancher server in another AWS VPC, this time with only two hosts (which also have the rancher-machine security group), but swarmkit is still in the Initializing state.

The logs are the same, too. The one marked as manager:

2/2/2017 9:36:03 AM 1 of 1 manager(s) reachable, 0 worker(s) active
2/2/2017 9:36:03 AM No active workers present for promotion, add more nodes to enable reconciliation.

and the wait_leader one:

2/2/2017 9:35:21 AM time="2017-02-02T08:35:21Z" level=info msg="Listening on port: 2378"
2/2/2017 9:35:21 AM Set host label swarm=wait_leader

edit: OK, I’ve just realized that it says “Listening on port: 2378”, but the rancher-machine security group doesn’t open that port. It only seems to open 2376?

I’ve added 2378, and swarmkit on the manager became Active, but the one on the wait_leader host is still Initializing.
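To see which ports are actually reachable between hosts before and after changing security groups, a small TCP probe run from one host against another host's private IP is enough. This is a minimal sketch, not anything that ships with Rancher; the default ports are just the two mentioned above (2376 and 2378). A CLOSED result can mean either that a security group blocks the port or that nothing is listening on it, and it only tests TCP, so UDP ports can't be checked this way.

```python
import socket
from typing import Dict, Iterable


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out or unreachable (e.g. dropped by a security group)
        return False


def check_ports(host: str, ports: Iterable[int] = (2376, 2378)) -> Dict[int, bool]:
    """Probe each port on host; the defaults are the ports from this thread."""
    return {port: port_open(host, port) for port in ports}
```

Usage would be something like `print(check_ports("10.0.1.12"))` on one host, where the IP is a placeholder for another host's private address. Running it from each host against the others makes one-directional gaps easy to spot.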

edit2: I added another security group that allows all internal traffic, and everything is green now. So the rancher-machine group is missing something, though I didn’t figure out exactly what.
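The post doesn't show the exact rule, but “allow all internal traffic” is usually a self-referencing rule: all protocols allowed from members of the same security group, with nothing opened to the outside. Below is a sketch of adding such a rule with boto3, assuming AWS credentials are already configured; the group ID is a placeholder, and since every host already carries rancher-machine, the rule could just as well go on that group. If you'd rather open specific ports, Docker's docs list 2377/tcp, 7946/tcp+udp and 4789/udp for swarm mode, and the logs above show swarmkit-mon on 2378; whether Rancher itself needs more than that isn't covered here.

```python
from typing import Any, Dict, List


def self_referencing_rule(group_id: str) -> List[Dict[str, Any]]:
    """IpPermissions allowing all traffic from members of group_id itself."""
    return [
        {
            # "-1" means all protocols, and therefore all ports
            "IpProtocol": "-1",
            # The source is the same group, so only instances carrying this
            # group can reach each other through the rule
            "UserIdGroupPairs": [{"GroupId": group_id}],
        }
    ]


def allow_internal_traffic(ec2_client: Any, group_id: str) -> None:
    """Add the rule using an EC2 client, e.g. boto3.client("ec2")."""
    ec2_client.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=self_referencing_rule(group_id),
    )
```

Usage would be `allow_internal_traffic(boto3.client("ec2"), "sg-0123456789abcdef0")` with your own group ID. Passing the client in keeps the rule construction testable without touching AWS.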

edit3: Hm, all hosts are showing swarm=manager now. Is that OK? No workers? If I edit the host label to swarm=worker, it gets reset to swarm=manager. Do I want all managers? I don’t know.
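On the all-managers question: swarm mode managers keep cluster state with Raft, so a majority of managers must be reachable for the swarm to keep working, and Docker's docs put the tolerance at (N-1)/2 failed managers. The arithmetic below shows why an even manager count buys nothing: two managers tolerate zero failures, the same as one. If the two-host environment ended up with both hosts as managers, one host going “Disconnected” would leave 1 of 2 reachable, which is below quorum; that's a guess, not something the logs confirm. The label snapping back to swarm=manager may be swarmkit-mon promoting nodes, judging by its “No active workers present for promotion” message, but that's also an inference.

```python
def raft_quorum(managers: int) -> int:
    """Number of managers that must be reachable (a strict majority)."""
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """Number of managers that can be lost while a majority remains."""
    return (managers - 1) // 2


for n in (1, 2, 3, 5):
    print(f"{n} manager(s): quorum {raft_quorum(n)}, survives losing {tolerated_failures(n)}")
```

For two managers it prints quorum 2 and survives losing 0; for three, quorum 2 and survives losing 1. That's why three (or five) managers is the usual recommendation.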

The setup from edit3 above worked for a little while, but then one of the hosts would show “Disconnected”, “Starting services” would reappear, and the swarm stack would turn red.

Logs from one of the swarmkit-mon containers:

2/4/2017 10:36:50 AM /opt/rancher/run.sh: line 244: + : syntax error: operand expected (error token is "+ ")
2/4/2017 10:37:03 AM Metadata not ready
2/4/2017 10:37:04 AM time="2017-02-04T09:37:04Z" level=info msg="Listening on port: 2378"
2/4/2017 10:37:04 AM jq: error (at <stdin>:1): Cannot index string with string "Description"
2/4/2017 10:37:04 AM jq: error (at <stdin>:1): Cannot index string with string "Spec"
2/4/2017 10:37:04 AM jq: error (at <stdin>:1): Cannot index string with string "Spec"
2/4/2017 10:37:04 AM jq: error (at <stdin>:1): Cannot index string with string "Spec"
2/4/2017 10:37:04 AM /opt/rancher/run.sh: line 244: + : syntax error: operand expected (error token is "+ ")
2/4/2017 10:37:10 AM time="2017-02-04T09:37:10Z" level=info msg="Listening on port: 2378"
2/4/2017 10:37:10 AM jq: error (at <stdin>:1): Cannot index string with string "Description"
2/4/2017 10:37:10 AM jq: error (at <stdin>:1): Cannot index string with string "Spec"
2/4/2017 10:37:10 AM jq: error (at <stdin>:1): Cannot index string with string "Spec"
2/4/2017 10:37:11 AM jq: error (at <stdin>:1): Cannot index string with string "Spec"
2/4/2017 10:37:11 AM /opt/rancher/run.sh: line 244: + : syntax error: operand expected (error token is "+ ")
2/4/2017 10:37:29 AM time="2017-02-04T09:37:29Z" level=info msg="Listening on port: 2378"
2/4/2017 10:37:29 AM jq: error (at <stdin>:1): Cannot index string with string "Description"
2/4/2017 10:37:29 AM jq: error (at <stdin>:1): Cannot index string with string "Spec"
2/4/2017 10:37:29 AM jq: error (at <stdin>:1): Cannot index string with string "Spec"
2/4/2017 10:37:29 AM jq: error (at <stdin>:1): Cannot index string with string "Spec"
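Without the source of /opt/rancher/run.sh this is guesswork, but the two errors fit together. jq's “Cannot index string with string "Spec"” means the filter tried `.Spec` on a JSON string instead of a node object, and bash reports “operand expected (error token is "+ ")” when both sides of a `+` inside `$(( ))` expand to empty. So the script's node counting may be getting something other than the node list it expects. Below is a hypothetical defensive version of that counting in Python; it reads the JSON array printed by `docker node inspect` (where each node has `Spec.Role`), skips anything that isn't a node object, and always returns integers.

```python
import json
from typing import Dict


def count_roles(inspect_output: str) -> Dict[str, int]:
    """Count managers and workers in `docker node inspect` JSON output.

    Entries that can't be interpreted are tallied under "skipped" instead
    of crashing, so the counts are always integers.
    """
    counts = {"manager": 0, "worker": 0, "skipped": 0}
    try:
        nodes = json.loads(inspect_output)
    except json.JSONDecodeError:
        counts["skipped"] += 1  # not JSON at all, e.g. an error message
        return counts
    if not isinstance(nodes, list):
        counts["skipped"] += 1  # e.g. a bare JSON string, as in the jq errors
        return counts
    for node in nodes:
        spec = node.get("Spec") if isinstance(node, dict) else None
        role = spec.get("Role") if isinstance(spec, dict) else None
        if role in ("manager", "worker"):
            counts[role] += 1
        else:
            counts["skipped"] += 1
    return counts
```

On a manager, the input would come from something like `docker node inspect $(docker node ls -q)`; a non-zero "skipped" count would point at the same bad input the jq errors suggest.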

Then I’d restart the EC2 instance, and after some time it would sort itself out. Later it would fall apart again, without me touching any settings, services, or anything else.

Me too. Did you find a solution?

Nope, kinda gave up…

Same issue. It’s stuck at “Starting services”.

Diagnosing Swarm Issues fixed the issue for me. Turns out, apt-get install docker.io installs a Docker version that doesn’t support Docker swarm mode. Following the instructions at https://docs.docker.com/engine/installation/linux/ubuntu/ fixed it for me.
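Swarm mode first appeared in Docker Engine 1.12, so a quick way to rule this out is to check each host's server version. The sketch below parses the output of `docker version --format '{{.Server.Version}}'` and compares it against 1.12; the helper names are made up for illustration.

```python
import re
import subprocess
from typing import Optional, Tuple

SWARM_MODE_MIN = (1, 12)  # swarm mode was introduced in Docker Engine 1.12


def parse_version(version: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from strings like '1.12.6' or '17.03.0-ce'."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    return (int(match.group(1)), int(match.group(2))) if match else None


def supports_swarm_mode(version: str) -> bool:
    """True if the version string is at least SWARM_MODE_MIN."""
    parsed = parse_version(version)
    return parsed is not None and parsed >= SWARM_MODE_MIN


def server_version() -> str:
    """Ask the local Docker daemon for its version (needs the docker CLI)."""
    result = subprocess.run(
        ["docker", "version", "--format", "{{.Server.Version}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

On a host, `supports_swarm_mode(server_version())` gives the answer. Comparing integer tuples rather than strings matters here: as strings, "1.9" would sort after "1.12".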

I don’t think that’s the issue here, because it did work at some point. It comes and goes, mostly goes 🙂