Agent reconnect after upgrade fails

After upgrading from 1.0.0 to 1.0.1, two of three agents fail to reconnect. The failing agents are the two Rancher-spawned AWS instances. A further instance running on AWS with a custom-installed agent worked …

LogFile:

13:14:36.280 [main] INFO ConsoleStatus - [1/6] [0ms] [1ms] Loading config-bootstrap
13:14:36.651 [main] INFO ConsoleStatus - [2/6] [374ms] [368ms] Loading base-config
13:14:37.197 [main] INFO ConsoleStatus - [3/6] [920ms] [546ms] Loading config
13:14:37.205 [main] INFO ConsoleStatus - [4/6] [928ms] [7ms] Starting config-bootstrap
13:14:37.667 [main] INFO ConsoleStatus - [5/6] [1390ms] [462ms] Starting base-config
13:14:37.667 [main] INFO ConsoleStatus - [6/6] [1390ms] [0ms] Starting config
2016-04-21 13:14:38,009 INFO [main] [ConsoleStatus] [1/32] [0ms] [0ms] Loading bootstrap
2016-04-21 13:14:38,198 INFO [main] [ConsoleStatus] [2/32] [189ms] [189ms] Loading config-defaults
2016-04-21 13:14:44,933 INFO [main] [ConsoleStatus] [3/32] [6924ms] [6735ms] Loading system
2016-04-21 13:14:45,014 INFO [main] [ConsoleStatus] [4/32] [7005ms] [81ms] Loading defaults
2016-04-21 13:14:45,213 INFO [main] [ConsoleStatus] [5/32] [7204ms] [199ms] Loading types
2016-04-21 13:14:49,989 INFO [main] [ConsoleStatus] [6/32] [11980ms] [4776ms] Loading system-services
2016-04-21 13:14:50,251 INFO [main] [ConsoleStatus] [7/32] [12242ms] [262ms] Loading agent-server
2016-04-21 13:14:50,654 INFO [main] [ConsoleStatus] [8/32] [12645ms] [403ms] Loading allocator-server
2016-04-21 13:14:52,346 INFO [main] [ConsoleStatus] [9/32] [14337ms] [1692ms] Loading api-server
2016-04-21 13:14:54,936 INFO [main] [ConsoleStatus] [10/32] [16927ms] [2590ms] Loading iaas-api
2016-04-21 13:14:55,039 INFO [main] [ConsoleStatus] [11/32] [17030ms] [103ms] Loading archaius
2016-04-21 13:14:55,310 INFO [main] [ConsoleStatus] [12/32] [17301ms] [271ms] Loading core-model
2016-04-21 13:14:55,354 INFO [main] [ConsoleStatus] [13/32] [17345ms] [44ms] Loading core-object-defaults
2016-04-21 13:14:55,395 INFO [main] [ConsoleStatus] [14/32] [17386ms] [41ms] Loading encryption
2016-04-21 13:14:58,168 INFO [main] [ConsoleStatus] [15/32] [20159ms] [2773ms] Loading process
2016-04-21 13:14:58,211 INFO [main] [ConsoleStatus] [16/32] [20202ms] [43ms] Loading redis
2016-04-21 13:14:58,263 INFO [main] [ConsoleStatus] [17/32] [20254ms] [52ms] Starting bootstrap
2016-04-21 13:14:58,263 INFO [main] [ConsoleStatus] [18/32] [20254ms] [0ms] Starting config-defaults
2016-04-21 13:14:58,263 INFO [main] [ConsoleStatus] [19/32] [20254ms] [0ms] Starting system
2016-04-21 13:14:58,263 INFO [main] [ConsoleStatus] [20/32] [20254ms] [0ms] Starting defaults
2016-04-21 13:14:58,263 INFO [main] [ConsoleStatus] [21/32] [20254ms] [0ms] Starting types
2016-04-21 13:15:01,892 INFO [main] [ConsoleStatus] [22/32] [23883ms] [3629ms] Starting system-services
2016-04-21 13:15:01,892 INFO [main] [ConsoleStatus] [23/32] [23883ms] [0ms] Starting agent-server
2016-04-21 13:15:01,892 INFO [main] [ConsoleStatus] [24/32] [23883ms] [0ms] Starting allocator-server
2016-04-21 13:15:03,715 INFO [main] [ConsoleStatus] [25/32] [25706ms] [1823ms] Starting api-server
2016-04-21 13:15:05,352 INFO [main] [ConsoleStatus] [26/32] [27343ms] [1637ms] Starting iaas-api
2016-04-21 13:15:05,353 INFO [main] [ConsoleStatus] [27/32] [27344ms] [0ms] Starting archaius
2016-04-21 13:15:05,353 INFO [main] [ConsoleStatus] [28/32] [27344ms] [0ms] Starting core-model
2016-04-21 13:15:05,353 INFO [main] [ConsoleStatus] [29/32] [27344ms] [0ms] Starting core-object-defaults
2016-04-21 13:15:05,353 INFO [main] [ConsoleStatus] [30/32] [27344ms] [0ms] Starting encryption
2016-04-21 13:15:05,359 INFO [main] [ConsoleStatus] [31/32] [27350ms] [6ms] Starting process
2016-04-21 13:15:05,360 INFO [main] [ConsoleStatus] [32/32] [27351ms] [0ms] Starting redis
13:15:05.507 [main] INFO ConsoleStatus - [DONE ] [32634ms] Startup Succeeded, Listening on port 8081
time="2016-04-21T13:15:05Z" level=info msg="Starting websocket proxy. Listening on [:8080], Proxying to cattle API at [localhost:8081], Monitoring parent pid [8]."
time="2016-04-21T13:15:06Z" level=info msg="Downloading certificate from http://localhost:8081/v1/credentials/1c1/certificate"
time="2016-04-21T13:15:07Z" level=info msg="Starting Rancher Catalog service"
time="2016-04-21T13:15:07Z" level=info msg="Using catalog library=https://github.com/rancher/rancher-catalog.git"
time="2016-04-21T13:15:07Z" level=info msg="Using catalog community=https://github.com/rancher/community-catalog.git"
time="2016-04-21T13:15:07Z" level=info msg="Starting rancher-compose-executor" version=v0.7.4
time="2016-04-21T13:15:07Z" level=info msg="Setting log level" logLevel=info
time="2016-04-21T13:15:07Z" level=info msg="Starting go-machine-service…" gitcommit=dc97268
time="2016-04-21T13:15:07Z" level=info msg="Updating docker-machine-drivers from cattle."
time="2016-04-21T13:15:08Z" level=info msg="Initializing event router" workerCount=10
time="2016-04-21T13:15:09Z" level=error msg="Driver: ubiquity is error ignoring driver download."
time="2016-04-21T13:15:09Z" level=error msg="Driver: packet is error ignoring driver download."
time="2016-04-21T13:15:09Z" level=info msg="packet not to be used removing any schemas it has."
time="2016-04-21T13:15:09Z" level=info msg="ubiquity not to be used removing any schemas it has."
time="2016-04-21T13:15:15Z" level=info msg="Connection established"
2016-04-21 13:15:37,411 ERROR [:] [] [] [] [cutorService-10] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [11] count [3]
time="2016-04-21T13:15:38Z" level=info msg="Started setting and driver watcher."
2016-04-21 13:15:42,412 ERROR [:] [] [] [] [cutorService-10] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [11] count [4]
time="2016-04-21T13:15:43Z" level=info msg="Initializing event router" workerCount=10
time="2016-04-21T13:15:43Z" level=info msg="Connection established"
time="2016-04-21T13:15:46Z" level=info msg="Registering backend for host [8e3008b9-de70-4c5b-8770-04c35b341a3f]"
2016-04-21 13:15:47,413 ERROR [:] [] [] [] [ecutorService-6] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [11] count [5]

and sometimes:

2016-04-21 13:18:29,907 ERROR [bf89ce7f-c40c-4aa1-99e8-91585a8a2227:43033] [instance:583] [instance.purge] [] [ecutorService-2] [c.p.e.p.i.DefaultProcessInstanceImpl] Unknown exception io.cattle.platform.eventing.exception.EventExecutionException: Operation failed
at io.cattle.platform.eventing.exception.EventExecutionException.fromEvent(EventExecutionException.java:53) ~[cattle-framework-eventing-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.agent.impl.RemoteAgentImpl.callSync(RemoteAgentImpl.java:87) ~[cattle-iaas-agent-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.agent.impl.RemoteAgentImpl.callSync(RemoteAgentImpl.java:135) ~[cattle-iaas-agent-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.process.common.handler.AgentBasedProcessHandler.callSync(AgentBasedProcessHandler.java:180) ~[cattle-iaas-logic-common-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.process.common.handler.AgentBasedProcessHandler.handleEvent(AgentBasedProcessHandler.java:166) ~[cattle-iaas-logic-common-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.process.instance.PostInstancePurge.handle(PostInstancePurge.java:38) ~[cattle-iaas-logic-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runHandler(DefaultProcessInstanceImpl.java:446) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$4.execute(DefaultProcessInstanceImpl.java:393) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$4.execute(DefaultProcessInstanceImpl.java:387) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.idempotent.Idempotent.execute(Idempotent.java:42) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runHandlers(DefaultProcessInstanceImpl.java:387) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runLogic(DefaultProcessInstanceImpl.java:493) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runWithProcessLock(DefaultProcessInstanceImpl.java:320) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$2.doWithLockNoResult(DefaultProcessInstanceImpl.java:260) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.LockCallbackNoReturn.doWithLock(LockCallbackNoReturn.java:7) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.LockCallbackNoReturn.doWithLock(LockCallbackNoReturn.java:3) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl$3.doWithLock(AbstractLockManagerImpl.java:40) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.LockManagerImpl.doLock(LockManagerImpl.java:33) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:13) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:37) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.acquireLockAndRun(DefaultProcessInstanceImpl.java:257) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runDelegateLoop(DefaultProcessInstanceImpl.java:185) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.executeWithProcessInstanceLock(DefaultProcessInstanceImpl.java:158) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$1.doWithLock(DefaultProcessInstanceImpl.java:108) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$1.doWithLock(DefaultProcessInstanceImpl.java:105) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl$3.doWithLock(AbstractLockManagerImpl.java:40) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.LockManagerImpl.doLock(LockManagerImpl.java:33) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:13) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:37) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.execute(DefaultProcessInstanceImpl.java:105) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.eventing.impl.ProcessEventListenerImpl.processExecute(ProcessEventListenerImpl.java:74) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.eventing.impl.ProcessEventListenerImpl.processExecute(ProcessEventListenerImpl.java:56) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at sun.reflect.GeneratedMethodAccessor452.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_95]
at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_95]
at io.cattle.platform.eventing.annotation.MethodInvokingListener$1.doWithLockNoResult(MethodInvokingListener.java:76) [cattle-framework-eventing-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.LockCallbackNoReturn.doWithLock(LockCallbackNoReturn.java:7) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.LockCallbackNoReturn.doWithLock(LockCallbackNoReturn.java:3) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl$3.doWithLock(AbstractLockManagerImpl.java:40) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.LockManagerImpl.doLock(LockManagerImpl.java:33) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:13) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:37) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.eventing.annotation.MethodInvokingListener.onEvent(MethodInvokingListener.java:72) [cattle-framework-eventing-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.eventing.impl.AbstractThreadPoolingEventService$2.doRun(AbstractThreadPoolingEventService.java:135) [cattle-framework-eventing-0.5.0-SNAPSHOT.jar:na]
at org.apache.cloudstack.managed.context.NoExceptionRunnable.runInContext(NoExceptionRunnable.java:15) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]
at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:108) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]
at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_95]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_95]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_95]

OK, found the cause: the hosts have no internet access. I think they try to update components…
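
For anyone hitting the same thing: a quick way to see which agents the server cannot reach is to pull the PingMonitorImpl lines out of the server log. This is only a minimal sketch that assumes the log format pasted above; the function name `summarize_ping_failures` is my own and not part of Rancher or cattle.

```python
import re

# Matches server log lines such as:
#   ... [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [11] count [5]
# The pattern is an assumption based only on the lines pasted in this report.
PING_FAILURE = re.compile(r"Failed to get ping from agent \[(\d+)\] count \[(\d+)\]")


def summarize_ping_failures(lines):
    """Return {agent_id: highest count value logged} (hypothetical helper)."""
    worst = {}
    for line in lines:
        match = PING_FAILURE.search(line)
        if match is None:
            continue  # not a ping-failure line
        agent_id = int(match.group(1))
        count = int(match.group(2))
        worst[agent_id] = max(worst.get(agent_id, 0), count)
    return worst


if __name__ == "__main__":
    # Three lines copied from the log above.
    sample = [
        "2016-04-21 13:15:37,411 ERROR [:] [] [] [] [cutorService-10] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [11] count [3]",
        'time="2016-04-21T13:15:38Z" level=info msg="Started setting and driver watcher."',
        "2016-04-21 13:15:47,413 ERROR [:] [] [] [] [ecutorService-6] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [11] count [5]",
    ]
    print(summarize_ping_failures(sample))  # prints {11: 5}
```

To run it against a real log, pass an open file object instead of the sample list, since iterating a file yields its lines.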