Rancher server sometimes does not want to stop

Hi,
Sometimes the Rancher server does not stop when I run the docker stop command.
The server is no longer reachable over HTTP once docker stop has been executed, but docker ps still shows the container as running.
The docker kill command does not help either…
Any clue on how to stop it correctly?
Do I need to pass a special command to the s6-svscan process inside the server container?

best regards,

Charles.

What version of Rancher server are you running, what Docker version, and what OS? It sounds like you hit something similar to this GitHub issue.

Did anything change in Docker?

Hi,
The OS is Ubuntu 15.10: 4.2.0-27-generic #32-Ubuntu SMP Fri Jan 22 04:49:08 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux.
The Docker version was 1.10.
When this bug appears, service docker stop does not work either.
service docker status shows s6 processes remaining, and I cannot kill -9 these processes.
It does not happen every time.
I recently upgraded to Docker 1.10.1, and the bug has not reappeared so far.
Note that I use docker-compose to run the Rancher server, a MySQL container, and a MySQL backup container (nickbreen/docker-mysql-backup-cron).
Is there a special command to pass to the Rancher server container?

hope these details will help to debug,

best regards,
Charles.

Oh, I’ve hit this issue again now (with Docker 1.10.1).
Just before, I tried to register the rancher-agent on the same host as the Rancher server, with the -e CATTLE_AGENT_IP option added to the registration command.
Now I cannot stop the Rancher server container or the MySQL one (just before that, stop and start were successful).
The Rancher server logs contain:
level=error msg="Unable to create EventRouter" Err="Get http://x.x.x.x.:myport/v1: dial tcp x.x.x.x:myport: i/o timeout"
The container is not stopped, but the UI is not reachable either.
I ran the docker stop command with both container IDs (server and mysql).
Hope it helps,

Charles.

sudo service docker stop does not work.

sudo service docker status returns:
$ sudo service docker status
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: deactivating (final-sigterm) since Thu 2016-02-25 23:19:52 CET; 33s ago
Docs: https://docs.docker.com
Process: 985 ExecStart=/usr/bin/docker daemon -H fd:// (code=exited, status=0/SUCCESS)
Main PID: 985 (code=exited, status=0/SUCCESS)
CGroup: /system.slice/docker.service
├─11508 s6-supervise cattle
└─11509 s6-supervise mysql

Feb 25 22:44:58 docker[985]: time="2016-02-25T22:44:58.858801054+01:00" level=info msg="Container 9a85bb5f5c74805d6b084a6e3881fa6648e50bcb0a273e397356a42c5bf0a17a fai...ng the force"
Feb 25 22:45:08  docker[985]: time="2016-02-25T22:45:08.859009541+01:00" level=info msg="Container 9a85bb5f5c74 failed to exit within 10 seconds of kill - trying direct SIGKILL"
Feb 25 22:47:56  docker[985]: time="2016-02-25T22:47:56.825795928+01:00" level=info msg="Container 9a85bb5f5c74805d6b084a6e3881fa6648e50bcb0a273e397356a42c5bf0a17a fai...ng the force"
Feb 25 22:48:06  docker[985]: time="2016-02-25T22:48:06.825979287+01:00" level=info msg="Container 9a85bb5f5c74 failed to exit within 10 seconds of kill - trying direct SIGKILL"
Feb 25 22:52:00  docker[985]: time="2016-02-25T22:52:00.929454567+01:00" level=info msg="Container 9a85bb5f5c74805d6b084a6e3881fa6648e50bcb0a273e397356a42c5bf0a17a fai...ng the force"
Feb 25 22:52:10 docker[985]: time="2016-02-25T22:52:10.929625993+01:00" level=info msg="Container 9a85bb5f5c74 failed to exit within 10 seconds of kill - trying direct SIGKILL"
Feb 25 23:19:52 systemd[1]: Stopping Docker Application Container Engine...
Feb 25 23:19:52  docker[985]: time="2016-02-25T23:19:52.092738149+01:00" level=info msg="Processing signal 'terminated'"
Feb 25 23:20:02 docker[985]: time="2016-02-25T23:20:02.093105743+01:00" level=info msg="Container 9a85bb5f5c74805d6b084a6e3881fa6648e50bcb0a273e397356a42c5bf0a17a fai...ng the force"
Feb 25 23:20:07  docker[985]: time="2016-02-25T23:20:07.092931819+01:00" level=error msg="Force shutdown daemon"
Hint: Some lines were ellipsized, use -l to show in full.
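
The log above shows the daemon repeatedly falling back to a direct SIGKILL for the same container. As a rough sketch, those escalations could be counted per container ID with a small filter like the one below. The function name count_sigkill_escalations is made up for illustration, and the pattern only assumes the message wording seen in the log above.

```shell
#!/bin/sh
# Hypothetical helper: count how often the Docker daemon escalated to a
# direct SIGKILL, grouped by the container ID printed in the message.
# It reads log lines on stdin, so it works on saved output or a live journal.
count_sigkill_escalations() {
    grep 'trying direct SIGKILL' |
        sed -n 's/.*Container \([0-9a-f]\{12,64\}\) failed to exit.*/\1/p' |
        sort | uniq -c
}

# Example (assumes a systemd host where the unit is named docker.service):
# journalctl -u docker.service --no-pager | count_sigkill_escalations
```

A container ID that keeps showing up here is one whose processes did not exit even after SIGKILL, which points at the host rather than at the application inside the container.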

There’s really nothing special about the rancher/server container. If you cannot kill -9 the processes in the container, then there is a more serious problem with your host OS and/or Docker. Is this easily reproducible on another host, or is it specific to one setup? Do you have other Docker containers running on this host?
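
One way to check this on a Linux host, as a minimal sketch: a process that survives kill -9 is usually either in uninterruptible sleep (state D, blocked inside the kernel) or a zombie (state Z, already dead but not yet reaped). The state letter can be read from /proc. The helper names proc_state and explain_state below are made up for illustration, and the PIDs in the example come from the status output earlier in the thread.

```shell
#!/bin/sh
# Print the kernel state letter (R, S, D, Z, T, ...) of a PID, read from /proc.
# Prints nothing if the process does not exist.
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Say whether the state of a PID explains why SIGKILL has no visible effect.
explain_state() {
    case "$(proc_state "$1")" in
        D)  echo "uninterruptible sleep: blocked in the kernel, SIGKILL is deferred" ;;
        Z)  echo "zombie: already dead, waiting for its parent to reap it" ;;
        "") echo "no such process" ;;
        *)  echo "killable: state does not block SIGKILL" ;;
    esac
}

# Example with the PIDs from the status output (they will differ on another host):
# explain_state 11508
# explain_state 11509
```

If the s6-supervise processes report state D, the next thing to look at would be what they are blocked on in the kernel (storage driver, filesystem, network), not the Rancher container itself.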

I have 3 containers:
one for backup (which I can stop easily), one for the server, and one for MySQL…
The only solution is to restart the host from the provider console (reboot does not work…).
When I restart this way, the server and MySQL containers are restarted automatically (I have not yet configured that for the backup container).
After that, I successfully stopped and restarted the Rancher server.
Then I tried again and I am stuck… (I have not registered any agent yet).

The Rancher server logs show these errors:

time="2016-02-25T22:53:11Z" level=fatal msg="Unable to create event router" error="Get http://myip:myport/v1: dial tcp myip:myport: i/o timeout"
time="2016-02-25T22:53:11Z" level=error msg="Unable to create EventRouter" Err="Get http://myip:myport/v1: dial tcp myip:myport: i/o timeout"

Which command should I run to get more information?

Charles.

The Rancher version is v0.59.0.
I will delete the containers (server and MySQL) in a few minutes and reinstall the server with v0.59.1.

stay tuned,

Charles.

I’ve tested with Rancher v0.59.1, with the same result… :cry:
I can stop all containers (backup and MySQL) except the Rancher server one.

Charles.