Rancher upgrade 1.2 issues

I upgraded Rancher server to 1.2, and it then started upgrading the environment.

It had issues upgrading the load balancers, with these errors: https://gist.github.com/alex88/e071c459679c941d1557270b31e559d2

I tried creating 4 new hosts with the new agent version and they became visible in the infrastructure view, so I removed one host and its containers moved to the new ones (the load balancer still wasn't available on the new hosts, probably because it was in an error state).

I then tried to delete another host, but now everything is stuck: the frontend never loads (the identities AJAX request never returns a response), and the Rancher server just logs errors like

2016-12-01 10:15:34,686 ERROR [:] [] [] [] [tp1830712962-17] [.AgentQualifierAuthorizationProvider] Failed to determine the proper agent ID for subscription for account [731]

What can I do?

Upgrades have always worked, except this one!

I tried restoring the previous version, but I still get the same issue upgrading the load balancers:

2016-12-01 15:52:21,314 ERROR [:] [] [] [] [ecutorService-3] [.e.s.i.ProcessInstanceDispatcherImpl] Expected state running but got error: Timeout getting IP address
2016-12-01 15:52:24,171 ERROR [:] [] [] [] [tp1830712962-17] [i.g.i.g.r.handler.ExceptionHandler  ] Exception in API for request [http://10.0.1.161:8080/v1/configcontent/metadata-answers]. Error id: [ff584964-5e1c-4b60-ab4a-664023a3e4dc]. org.yaml.snakeyaml.error.YAMLException: org.eclipse.jetty.io.EofException
	at org.yaml.snakeyaml.Yaml.dumpAll(Yaml.java:247) ~[snakeyaml-1.15.jar:na]
	at org.yaml.snakeyaml.Yaml.dump(Yaml.java:221) ~[snakeyaml-1.15.jar:na]
	at io.cattle.platform.configitem.context.impl.ServiceMetadataInfoFactory.writeMetadata(ServiceMetadataInfoFactory.java:100) ~[cattle-config-item-server-0.5.0-SNAPSHOT.jar:na]
	at io.cattle.platform.configitem.server.model.impl.MetadataConfigItem.handleRequest(MetadataConfigItem.java:42) ~[cattle-config-item-server-0.5.0-SNAPSHOT.jar:na]
	at io.cattle.platform.configitem.server.impl.ConfigItemServerImpl.handleDownload(ConfigItemServerImpl.java:97) ~[cattle-config-item-server-0.5.0-SNAPSHOT.jar:na]
	at io.cattle.platform.configitem.server.impl.ConfigItemServerImpl.handleRequest(ConfigItemServerImpl.java:40) ~[cattle-config-item-server-0.5.0-SNAPSHOT.jar:na]
	at io.cattle.platform.configitem.api.manager.ConfigContentManager.handle(ConfigContentManager.java:80) ~[cattle-config-item-api-0.5.0-SNAPSHOT.jar:na]
	at io.cattle.platform.configitem.api.manager.ConfigContentManager.getByIdInternal(ConfigContentManager.java:49) ~[cattle-config-item-api-0.5.0-SNAPSHOT.jar:na]
	at io.github.ibuildthecloud.gdapi.request.resource.impl.AbstractBaseResourceManager.getById(AbstractBaseResourceManager.java:61) ~[cattle-framework-java-server-0.5.0-SNAPSHOT.jar:na]
	at io.github.ibuildthecloud.gdapi.request.handler.ResourceManagerRequestHandler.generate(ResourceManagerRequestHandler.java:50) ~[cattle-framework-java-server-0.5.0-SNAPSHOT.jar:na]
	at io.github.ibuildthecloud.gdapi.request.handler.AbstractResponseGenerator.handle(AbstractResponseGenerator.java:14) ~[cattle-framework-java-server-0.5.0-SNAPSHOT.jar:na]
	at io.github.ibuildthecloud.gdapi.request.handler.write.DefaultReadWriteApiDelegate.handle(DefaultReadWriteApiDelegate.java:28) ~[cattle-framework-java-server-0.5.0-SNAPSHOT.jar:na]
	at io.github.ibuildthecloud.gdapi.request.handler.write.DefaultReadWriteApiDelegate.read(DefaultReadWriteApiDelegate.java:18) ~[cattle-framework-java-server-0.5.0-SNAPSHOT.jar:na]
	at sun.reflect.GeneratedMethodAccessor428.invoke(Unknown Source) ~[na:na]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_72]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_72]
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333) ~[spring-aop-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190) ~[spring-aop-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) ~[spring-aop-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:99) ~[spring-tx-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:280) ~[spring-tx-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:96) ~[spring-tx-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92) ~[spring-aop-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213) ~[spring-aop-4.3.2.RELEASE.jar:4.3.2.RELEASE]
	at com.sun.proxy.$Proxy55.read(Unknown Source) ~[na:na]
	at io.github.ibuildthecloud.gdapi.request.handler.write.ReadWriteApiHandler.handle(ReadWriteApiHandler.java:22) ~[cattle-framework-java-server-0.5.0-SNAPSHOT.jar:na]
	at io.github.ibuildthecloud.gdapi.servlet.ApiRequestFilterDelegate.doFilter(ApiRequestFilterDelegate.java:99) ~[cattle-framework-java-server-0.5.0-SNAPSHOT.jar:na]
	at io.cattle.platform.api.servlet.ApiRequestFilter$1.runInContext(ApiRequestFilter.java:95) [cattle-framework-api-0.5.0-SNAPSHOT.jar:na]
	at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:108) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]
	at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46) [cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]
	at io.cattle.platform.api.servlet.ApiRequestFilter.doFilter(ApiRequestFilter.java:88) [cattle-framework-api-0.5.0-SNAPSHOT.jar:na]
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) [jetty-servlet-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83) [jetty-servlets-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:364) [jetty-servlets-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) [jetty-servlet-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) [jetty-servlet-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) [jetty-server-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) [jetty-security-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) [jetty-server-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) [jetty-server-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) [jetty-servlet-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [jetty-server-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) [jetty-server-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.server.Server.handle(Server.java:499) [jetty-server-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) [jetty-server-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) [jetty-server-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) [jetty-io-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) [jetty-util-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) [jetty-util-9.2.11.v20150529.jar:9.2.11.v20150529]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
Caused by: org.eclipse.jetty.io.EofException: null
	at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:192) ~[jetty-io-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:408) ~[jetty-io-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:302) ~[jetty-io-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:129) ~[jetty-io-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:684) ~[jetty-server-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:246) ~[jetty-util-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:208) ~[jetty-util-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:480) [jetty-server-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:768) [jetty-server-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:801) [jetty-server-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:147) ~[jetty-server-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:140) ~[jetty-server-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:378) ~[jetty-server-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.servlets.gzip.DeflatedOutputStream.deflate(DeflatedOutputStream.java:74) ~[jetty-servlets-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.servlets.gzip.DeflatedOutputStream.write(DeflatedOutputStream.java:64) ~[jetty-servlets-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.servlets.gzip.GzipOutputStream.write(GzipOutputStream.java:46) ~[jetty-servlets-9.2.11.v20150529.jar:9.2.11.v20150529]
	at org.eclipse.jetty.servlets.gzip.AbstractCompressedStream.write(AbstractCompressedStream.java:226) ~[jetty-servlets-9.2.11.v20150529.jar:9.2.11.v20150529]
	at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) ~[na:1.8.0_72]
	at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282) ~[na:1.8.0_72]
	at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125) ~[na:1.8.0_72]
	at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207) ~[na:1.8.0_72]
	at java.io.Writer.write(Writer.java:127) ~[na:1.8.0_72]
	at org.yaml.snakeyaml.emitter.Emitter.writeIndent(Emitter.java:1110) ~[snakeyaml-1.15.jar:na]
	at org.yaml.snakeyaml.emitter.Emitter$ExpectBlockMappingKey.expect(Emitter.java:624) ~[snakeyaml-1.15.jar:na]
	at org.yaml.snakeyaml.emitter.Emitter.emit(Emitter.java:216) ~[snakeyaml-1.15.jar:na]
	at org.yaml.snakeyaml.serializer.Serializer.serializeNode(Serializer.java:181) ~[snakeyaml-1.15.jar:na]
	at org.yaml.snakeyaml.serializer.Serializer.serializeNode(Serializer.java:205) ~[snakeyaml-1.15.jar:na]
	at org.yaml.snakeyaml.serializer.Serializer.serializeNode(Serializer.java:191) ~[snakeyaml-1.15.jar:na]
	at org.yaml.snakeyaml.serializer.Serializer.serializeNode(Serializer.java:206) ~[snakeyaml-1.15.jar:na]
	at org.yaml.snakeyaml.serializer.Serializer.serializeNode(Serializer.java:206) ~[snakeyaml-1.15.jar:na]
	at org.yaml.snakeyaml.serializer.Serializer.serializeNode(Serializer.java:206) ~[snakeyaml-1.15.jar:na]
	at org.yaml.snakeyaml.serializer.Serializer.serialize(Serializer.java:112) ~[snakeyaml-1.15.jar:na]
	at org.yaml.snakeyaml.Yaml.dumpAll(Yaml.java:243) ~[snakeyaml-1.15.jar:na]
	... 56 common frames omitted
Caused by: java.io.IOException: Broken pipe
	at sun.nio.ch.FileDispatcherImpl.writev0(Native Method) ~[na:1.8.0_72]
	at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51) ~[na:1.8.0_72]
	at sun.nio.ch.IOUtil.write(IOUtil.java:148) ~[na:1.8.0_72]
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504) ~[na:1.8.0_72]
	at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:172) ~[jetty-io-9.2.11.v20150529.jar:9.2.11.v20150529]
	... 88 common frames omitted

I'm having big problems too. The load balancers seem to have failed to upgrade :frowning:

No idea what to do now; I can't get it working, and presumably I can't roll back even though I made a DB backup?

2016-12-01 21:01:28,713 ERROR [:] [] [] [] [ecutorService-4] [.e.s.i.ProcessInstanceDispatcherImpl] Agent error for [storage.volume.remove.reply;agent=895]: Error response from daemon: Driver devicemapper failed to remove root filesystem 8794e1edcaa8e228cede2e4e531a4845b02e45157d1fdb6c3b4ba700c4410f41: Device is Busy
2016-12-01 21:01:58,108 ERROR [f983d257-f78f-494b-9717-553e10b3f783:412532] [instance:3910] [instance.start->(InstanceStart)] [] [ecutorService-5] [i.c.p.process.instance.InstanceStart] Failed to Waiting for dependencies for instance [3910]
2016-12-01 21:01:58,746 ERROR [eef71a03-3bc0-42f3-8610-dc8e6479a0cb:411282] [volumeStoragePoolMap:3657] [volumestoragepoolmap.remove] [] [ecutorService-8] [c.p.e.p.i.DefaultProcessInstanceImpl] Agent error for [storage.volume.remove.reply;agent=895]: Error response from daemon: Driver devicemapper failed to remove root filesystem 8794e1edcaa8e228cede2e4e531a4845b02e45157d1fdb6c3b4ba700c4410f41: Device is Busy
2016-12-01 21:01:58,772 ERROR [:] [] [] [] [ecutorService-8] [.e.s.i.ProcessInstanceDispatcherImpl] Agent error for [storage.volume.remove.reply;agent=895]: Error response from daemon: Driver devicemapper failed to remove root filesystem 8794e1edcaa8e228cede2e4e531a4845b02e45157d1fdb6c3b4ba700c4410f41: Device is Busy
2016-12-01 21:01:59,131 ERROR [:] [] [] [] [ecutorService-4] [o.a.c.m.context.NoExceptionRunnable ] Expected state running but got error: instance is not running : Dependencies readiness error
2016-12-01 21:02:47,702 ERROR [:] [] [] [] [p1830712962-125] [i.c.p.c.s.impl.ConfigItemServerImpl ] Client [agent:902] requesting non-assigned item [configscripts]
2016-12-01 21:02:50,143 ERROR [633ce571-24f5-4d91-bc43-b17e4baf105a:412578] [instance:3912] [instance.start->(InstanceStart)] [] [ecutorService-4] [i.c.p.process.instance.InstanceStart] Failed to Waiting for dependencies for instance [3912]
2016-12-01 21:03:25,416 ERROR [:] [] [] [] [TaskScheduler-2] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [889] count [3]
2016-12-01 21:03:30,418 ERROR [:] [] [] [] [TaskScheduler-2] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [889] count [4]
2016-12-01 21:03:35,419 ERROR [:] [] [] [] [TaskScheduler-2] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [889] count [5]
2016-12-01 21:03:40,421 ERROR [:] [] [] [] [TaskScheduler-2] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [889] count [6]
2016-12-01 21:03:40,423 ERROR [:] [] [] [] [TaskScheduler-2] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Scheduling reconnect for [889]
2016-12-01 21:03:45,517 ERROR [:] [] [] [] [TaskScheduler-2] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [889] count [7]
2016-12-01 21:03:50,520 ERROR [:] [] [] [] [TaskScheduler-1] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [889] count [8]
2016-12-01 21:03:54,191 ERROR [:] [] [] [] [ecutorService-4] [o.a.c.m.context.NoExceptionRunnable ] Expected state running but got error: instance is not running : Dependencies readiness error
2016-12-01 21:03:55,523 ERROR [:] [] [] [] [TaskScheduler-2] [i.c.p.a.s.ping.impl.PingMonitorImpl ] Failed to get ping from agent [889] count [9]
2016-12-01 21:03:58,800 ERROR [256f4295-cd7f-486f-ba01-e304d80d6b84:411283] [volumeStoragePoolMap:3657] [volumestoragepoolmap.remove] [] [ecutorService-9] [c.p.e.p.i.DefaultProcessInstanceImpl] Agent error for [storage.volume.remove.reply;agent=895]: Error response from daemon: Driver devicemapper failed to remove root filesystem 8794e1edcaa8e228cede2e4e531a4845b02e45157d1fdb6c3b4ba700c4410f41: Device is Busy
2016-12-01 21:03:58,806 ERROR [:] [] [] [] [ecutorService-9] [.e.s.i.ProcessInstanceDispatcherImpl] Agent error for [storage.volume.remove.reply;agent=895]: Error response from daemon: Driver devicemapper failed to remove root filesystem 8794e1edcaa8e228cede2e4e531a4845b02e45157d1fdb6c3b4ba700c4410f41: Device is Busy
2016-12-01 21:04:00,634 ERROR [0a00c06c-5428-48a9-80e8-4a876729810d:412493] [instance:3908->instanceHostMap:3115] [instance.start->(InstanceStart)->instancehostmap.activate] [] [cutorService-15] [c.p.e.p.i.DefaultProcessInstanceImpl] Agent error for [compute.instance.activate.reply;agent=889]: Timeout getting IP address
2016-12-01 21:04:00,635 ERROR [0a00c06c-5428-48a9-80e8-4a876729810d:412493] [instance:3908] [instance.start->(InstanceStart)] [] [cutorService-15] [i.c.p.process.instance.InstanceStart] Failed [1/2] to Starting for instance [3908]
2016-12-01 21:04:02,947 ERROR [:] [] [] [] [p1830712962-123] [i.c.p.c.s.impl.ConfigItemServerImpl ] Client [agent:902] requesting non-assigned item [configscripts]
time="2016-12-01T21:04:26Z" level=error msg="Git checkout failure from git err: exit status 128"
time="2016-12-01T21:04:26Z" level=error msg="Failed to pull the catalog from git repo https://gitlab-ci-token:******@gitlab.atomx.io/atomx/rancher-catalog.git, error: exit status 128"
2016-12-01 21:04:28,818 ERROR [ac2b8057-2b7d-4c3c-be4c-35d3ca68d8ae:411282] [volumeStoragePoolMap:3657] [volumestoragepoolmap.remove] [] [ecutorService-8] [c.p.e.p.i.DefaultProcessInstanceImpl] Agent error for [storage.volume.remove.reply;agent=895]: Error response from daemon: Driver devicemapper failed to remove root filesystem 8794e1edcaa8e228cede2e4e531a4845b02e45157d1fdb6c3b4ba700c4410f41: Device is Busy
2016-12-01 21:04:28,829 ERROR [:] [] [] [] [ecutorService-8] [.e.s.i.ProcessInstanceDispatcherImpl] Agent error for [storage.volume.remove.reply;agent=895]: Error response from daemon: Driver devicemapper failed to remove root filesystem 8794e1edcaa8e228cede2e4e531a4845b02e45157d1fdb6c3b4ba700c4410f41: Device is Busy
2016-12-01 21:04:32,592 ERROR [0a00c06c-5428-48a9-80e8-4a876729810d:412493] [instance:3908->instanceHostMap:3115] [instance.start->(InstanceStart)->instancehostmap.activate] [] [cutorService-15] [c.p.e.p.i.DefaultProcessInstanceImpl] Agent error for [compute.instance.activate.reply;agent=889]: Timeout getting IP address
2016-12-01 21:04:32,592 ERROR [0a00c06c-5428-48a9-80e8-4a876729810d:412493] [instance:3908] [instance.start->(InstanceStart)] [] [cutorService-15] [i.c.p.process.instance.InstanceStart] Failed [2/2] to Starting for instance [3908]

I ended up deleting and rebuilding all the load balancers last night. It works.

Yeah, me too. It sucks, since we had around 40 services linked.

@alex88 on your second upgrade, did you see the same error as the first time: https://gist.github.com/alex88/e071c459679c941d1557270b31e559d2? The exception in your second comment seems related to a different issue - the metadata write/push, not the LB update:

2016-12-01 15:52:24,171 ERROR [:] [] [] [] [tp1830712962-17] [i.g.i.g.r.handler.ExceptionHandler  ] Exception in API for request [http://10.0.1.161:8080/v1/configcontent/metadata-answers]. Error id: [ff584964-5e1c-4b60-ab4a-664023a3e4dc]. org.yaml.snakeyaml.error.YAMLException: org.eclipse.jetty.io.EofException

It is most likely caused either by the system being overloaded in general, or by the metadata container being stopped.

@Daniel_Skinner the exception you are seeing is a completely different one from the first two reported by @alex88. If you still have the setup around, can you check whether your infrastructure services (ipsec/networkServices/healthcheck) are up and running?
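
If it's easier than checking in the UI, a rough way to spot stopped infrastructure containers directly on a host is something like the sketch below (the grep pattern is only a guess at the container names, which vary by Rancher/agent version):

# List exited containers and filter for the infrastructure services
# (the name pattern is an assumption; adjust it to match your containers):
docker ps -a --filter "status=exited" --format "{{.Names}}\t{{.Status}}" \
  | grep -Ei 'ipsec|network-services|healthcheck'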

We are going to look at all the issues reported in this post.

@alex88 could you check the creation date for the services that failed to upgrade due to the NPE in the gist? You can use this query for that:

select created from service where id=<>
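
For reference, if you are running the embedded database inside the rancher/server container, the query can be run roughly like this (the container name, the cattle/cattle credentials and the service id are placeholders - check them against your own setup):

# Run the query against the cattle database inside the rancher/server container.
docker exec -it rancher-server \
  mysql -ucattle -pcattle cattle \
  -e "SELECT id, name, created FROM service WHERE id = 1234;"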

About the null pointer exception: yes, it was occurring continuously the whole time.

About the creation date: I've already deleted the load balancers, since I was never able to make them work.

After creating them from scratch, everything is working fine. Is that query still relevant if the old load balancers have been purged?

Anyway, I asked about this on IRC for a couple of hours, but unfortunately no one was available.

@alena it's all up and running now, but I do think some of the infrastructure services were failing too. After I got everything working by rebuilding the load balancers, there were a few stopped containers for ipsec and healthcheck, I think, despite all those stacks being healthy by that point.

@alex88 removed entries should be left in the DB, so the query should still be relevant.
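
So something along these lines should still return the purged load balancers (the column names are how I'd expect them in the cattle schema - verify against your DB):

# Purged services are only soft-deleted, so they keep their created/removed timestamps.
docker exec -it rancher-server \
  mysql -ucattle -pcattle cattle \
  -e "SELECT id, name, kind, created, removed FROM service WHERE removed IS NOT NULL ORDER BY removed DESC;"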

@Daniel_Skinner a failing ipsec/healthcheck service can result in the LB (and other services) being in a bad state, as the prerequisite for upgrade completion is service.healthState=healthy. If the healthcheck service was having problems, it is quite possible that it impacted the LB upgrade.
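
If you want to check that precondition directly, the v1 API exposes it per service; a rough sketch (the server URL, the API key pair and the use of jq are assumptions for illustration):

# Dump name/state/healthState for every service the API key can see.
curl -s -u "$RANCHER_ACCESS_KEY:$RANCHER_SECRET_KEY" \
  "http://rancher-server:8080/v1/services?limit=1000" \
  | jq '.data[] | {name, state, healthState}'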

Note that this bit a number of users, but it also looks like there are a number of causes (boot2docker, etc.). The de facto bug seems to be this one: https://github.com/rancher/rancher/issues/6858

@Daniel_Skinner, I also ran into errors like "Error response from daemon: Driver devicemapper failed to remove root filesystem", and I worked around them by rebooting the host. I'm using CentOS 7 with direct-lvm, which I thought avoided this issue (not the default loopback LVM mode, which is not recommended for production).
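
Before resorting to a reboot, it can be worth checking which processes still hold the dead container's filesystem mounted; here's a rough sketch using the container id from the logs above (which process to restart is situational, and sometimes only the reboot helps):

# Find PIDs whose mount namespace still references the container's root filesystem.
CID=8794e1edcaa8e228cede2e4e531a4845b02e45157d1fdb6c3b4ba700c4410f41
grep -l "$CID" /proc/*/mounts 2>/dev/null
# Inspect the offenders with "ps -fp <pid>" and stop/restart them;
# if nothing helps, a host reboot clears the stale devicemapper device.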

@Stefan_Lasiewski thank you for the reference; system services not getting IP addresses can very likely be caused by b2d.

@alex88 the bug that you hit - the NPE on LB upgrade - may be caused by the fact that the LBs-to-upgrade were created before the in-service upgrade feature was introduced. With that feature we started versioning launch configs, and based on the NPE it looks like the version is null for the launch config object. The "created" date would be the answer to this question.

@alena: Thanks. I don't use boot2docker, and I've worked past our problems; Rancher seems to be working well now. My problems may have all been caused by this issue, which I'll post for others:

Hi,
I had devicemapper issues on Ubuntu cloud images as well.
I had to migrate to aufs to get it stable.

/hw