Host recovery after docker restart

Hi!

I’m trying to upgrade Docker across my fleet and started with a single host. Docker upgraded and came back up, and the rancher-agent restarted (because of --restart=always).

Now the agent is stuck in a loop, logging this:

    2015-12-07 08:56:20,705 ERROR docker [140576345189744] [delegate.py:104] Can not call [9ece4162-2638-4696-b828-680799b206f4], container is not running
    time="2015-12-07T08:56:21Z" level="info" msg="Connecting to proxy." url="ws://rancher.aws.mdx.med:8080/v1/connectbackend?token="
    2015-12-07 08:56:32,716 ERROR docker [140576345189904] [delegate.py:104] Can not call [9ece4162-2638-4696-b828-680799b206f4], container is not running
    2015-12-07 08:56:43,793 ERROR docker [140576345190064] [delegate.py:104] Can not call [9ece4162-2638-4696-b828-680799b206f4], container is not running
    2015-12-07 08:56:53,869 ERROR docker [140576345189904] [delegate.py:104] Can not call [9ece4162-2638-4696-b828-680799b206f4], container is not running
    2015-12-07 08:57:04,361 ERROR docker [140576345190064] [delegate.py:104] Can not call [9ece4162-2638-4696-b828-680799b206f4], container is not running
    2015-12-07 08:57:16,865 ERROR docker [140576345188144] [delegate.py:104] Can not call [9ece4162-2638-4696-b828-680799b206f4], container is not running
    2015-12-07 08:57:26,879 ERROR docker [140576345188304] [delegate.py:104] Can not call [9ece4162-2638-4696-b828-680799b206f4], container is not running
    2015-12-07 08:57:37,855 ERROR docker [140576345188464] [delegate.py:104] Can not call [9ece4162-2638-4696-b828-680799b206f4], container is not running
    2015-12-07 08:58:10,962 ERROR docker [140576345189744] [delegate.py:104] Can not call [30c4f970-2cc5-4d9f-84e1-ce83d2616a01], container is not running

When I run docker inspect on that container, it turns out to be the network agent (listening on ports 500 and 4500).
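
In case it helps anyone hitting the same thing, here is a rough Python sketch for summarizing those errors per container UUID (how many times each one shows up, and over what time span). The regex is only an assumption based on the log lines I pasted above, and the summarize function name is just something I made up; other agent versions may log differently.

```python
import re
from collections import defaultdict
from datetime import datetime

# Pattern assumed from the agent log lines quoted in this post;
# it is not an official Rancher log format description.
ERROR_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ ERROR docker "
    r"\[\d+\] \[[\w.]+:\d+\] "
    r"Can not call \[([0-9a-f-]+)\], container is not running"
)


def summarize(log_text):
    """Return {uuid: (count, first_seen, last_seen)} for 'not running' errors."""
    seen = defaultdict(list)
    for stamp, uuid in ERROR_RE.findall(log_text):
        seen[uuid].append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))
    return {u: (len(ts), min(ts), max(ts)) for u, ts in seen.items()}


if __name__ == "__main__":
    # Three entries copied from the log above.
    sample = (
        "2015-12-07 08:56:20,705 ERROR docker [140576345189744] [delegate.py:104] "
        "Can not call [9ece4162-2638-4696-b828-680799b206f4], container is not running\n"
        "2015-12-07 08:56:32,716 ERROR docker [140576345189904] [delegate.py:104] "
        "Can not call [9ece4162-2638-4696-b828-680799b206f4], container is not running\n"
        "2015-12-07 08:58:10,962 ERROR docker [140576345189744] [delegate.py:104] "
        "Can not call [30c4f970-2cc5-4d9f-84e1-ce83d2616a01], container is not running\n"
    )
    for uuid, (count, first, last) in summarize(sample).items():
        print(uuid, count, last - first)
```

Feeding it the full agent log should show whether the errors stay on one container or move between several, and how long each one was reported as not running.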

Interesting. It eventually came back up, but it took over 20 minutes. I also had a rancher-server restart during that 20-minute period. Not sure whether that’s related.