"transitioningMessage": "exit status 1"

I have a simple test that uses the Rancher API to create a host, wait for the machine and host API to return active, then delete it. It does this once per hour, and works about 9 out of 10 times. When it fails, the only real clues from the machines API is: “transitioning”: “error” and “transitioningMessage”: “exit status 1”. I’m running against OpenStack.

I know Rancher is likely just passing along this lame error response from OpenStack or from Machine, but I’m talking to a Rancher API…so that’s why I"m asking in this forum. :smile:

Is there a place to get more details on this situation so I can more accurately point the finger of blame?

btw, I"m able to clean up nicely and reset Rancher for the next test run automatically using the API, so good job again on building a robust product!

That is from docker-machine… The logs on the rancher/server container will have the command run and all the output it produced, there may be something more useful before this line.

Thanks Vincent.
Here is a bit more information that may or may not be useful to you. It didn’t tell me much, but since I looked, I thought I’d share it. I don’t expect you do to much with this btw. I know that Rancher is just using Docker Machine, who is trying to work with OpenStack.

Enjoy…

This is one chunk of ugliness that seems to happen first:

time="2015-10-31T02:16:37Z"
level=info msg=“Pulling rancher/agent:v0.8.2 image.”

time="2015-10-31T02:16:57Z"
level=info msg=“stderr: SSH cmd error!” resourceId=1ph313

time="2015-10-31T02:16:57Z"
level=info msg="stderr: command: if ! type docker; then curl -sSL https://get.docker.com | sh -; fi"
resourceId=1ph313

time="2015-10-31T02:16:57Z"
level=info msg="stderr: err : exit status 100"
resourceId=1ph313

time="2015-10-31T02:16:57Z"
level=info msg=“stderr: output : bash: line 0: type: docker: not
found” resourceId=1ph313

time="2015-10-31T02:16:57Z"
level=info msg="stderr: modprobe: FATAL: Module aufs not found."
resourceId=1ph313

========================================================

It seems Docker Machine tries to update the Ubuntu OS, and then this…

=======================================================

time="2015-10-31T02:16:57Z"
level=info msg=“stderr: Get:1 http://alln01-1-csi.clouds.archive.ubuntu.com/ubuntu/
trusty/main libnl-3-200 amd64 3.2.21-1 [44.5 kB]” resourceId=1ph313

time="2015-10-31T02:16:57Z"
level=info msg=“stderr: Get:2 http://alln01-1-csi.clouds.archive.ubuntu.com/ubuntu/
trusty/main libnl-genl-3-200 amd64 3.2.21-1 [10.2 kB]” resourceId=1ph313

time="2015-10-31T02:16:57Z"
level=info msg=“stderr: Get:3 http://alln01-1-csi.clouds.archive.ubuntu.com/ubuntu/
trusty-updates/main linux-image-3.13.0-66-generic amd64 3.13.0-66.108 [15.2
MB]” resourceId=1ph313

time="2015-10-31T02:16:57Z"
level=info msg="stderr: Get:4 http://alln01-1-csi.clouds.archive.ubuntu.com/ubuntu/
trusty/main wireless-regdb all 2013.02.13-1ubuntu1 [6,456 B]"
resourceId=1ph313

time="2015-10-31T02:16:57Z"
level=info msg=“stderr: Get:5 http://alln01-1-csi.clouds.archive.ubuntu.com/ubuntu/
trusty/main crda amd64 1.1.2-1ubuntu2 [15.2 kB]” resourceId=1ph313

time="2015-10-31T02:16:57Z"
level=info msg=“stderr: Get:6 http://alln01-1-csi.clouds.archive.ubuntu.com/ubuntu/
trusty/main iw amd64 3.4-1 [51.7 kB]” resourceId=1ph313

time="2015-10-31T02:16:57Z"
level=info msg="stderr: Get:7 http://alln01-1-csi.clouds.archive.ubuntu.com/ubuntu/
trusty-updates/main linux-firmware all 1.127.16 [24.3 MB]"
resourceId=1ph313

time="2015-10-31T02:16:57Z"
level=info msg=“stderr: Get:8 http://alln01-1-csi.clouds.archive.ubuntu.com/ubuntu/
trusty-updates/main linux-image-extra-3.13.0-44-generic amd64 3.13.0-44.73
[36.7 MB]” resourceId=1ph313

time="2015-10-31T02:16:57Z"
level=info msg=“stderr: Get:9 http://alln01-1-csi.clouds.archive.ubuntu.com/ubuntu/
trust” resourceId=1ph313

time="2015-10-31T02:16:57Z"
level=info msg=“starting cleanup…” machine name=pokey-on-CCS

time="2015-10-31T02:16:57Z"
level=info msg=“cleanup successful” machine name=pokey-on-CCS

time="2015-10-31T02:16:57Z"
level=error msg=“Error processing event” err="exit status 1"
eventId="ffad6976-1d74-43e2-8c55-18f6c5e30199"
eventName="physicalhost.create;handler=goMachineService"
resourceId=1ph313

=======================================================================
It appears that Rancher tries to clean up after the error and it throws an exception, so you
may want to check this in any case.

2015-10-29
23:57:47,989 ERROR [:] [] [] [] [cutorService-18]
[o.a.c.m.context.NoExceptionRunnable ] Uncaught exception java.lang.IllegalStateException:
java.io.IOException: Agent connection is closed

    at

io.cattle.platform.util.exception.ExceptionUtils.rethrowExpectedRuntime(ExceptionUtils.java:27)
~[cattle-framework-utils-0.5.0-SNAPSHOT.jar:na]

    at

io.cattle.platform.async.utils.AsyncUtils.get(AsyncUtils.java:42)
~[cattle-framework-async-0.5.0-SNAPSHOT.jar:na]

    at

io.cattle.platform.agent.server.service.impl.AgentServiceImpl$2.doRun(AgentServiceImpl.java:88)
~[cattle-iaas-agent-server-0.5.0-SNAPSHOT.jar:na]

    at

org.apache.cloudstack.managed.context.NoExceptionRunnable.runInContext(NoExceptionRunnable.java:13)
~[cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]

    at

org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
[cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]

    at

org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
[cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]

    at

org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:108)
[cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]

    at

org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
[cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]

    at

org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
[cattle-framework-managed-context-0.5.0-SNAPSHOT.jar:na]

    at

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_79]

    at

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_79]

    at

java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]

Caused by:
java.io.IOException: Agent connection is closed

    at

io.cattle.platform.agent.connection.event.AgentEventingConnection.execute(AgentEventingConnection.java:38)
~[cattle-agent-connection-event-0.5.0-SNAPSHOT.jar:na]

    at

io.cattle.platform.agent.server.service.impl.AgentServiceImpl.execute(AgentServiceImpl.java:77)
~[cattle-iaas-agent-server-0.5.0-SNAPSHOT.jar:na]

    at

io.cattle.platform.agent.server.eventing.impl.AgentEventListenerImpl.agentRequest(AgentEventListenerImpl.java:15)
~[cattle-iaas-agent-server-0.5.0-SNAPSHOT.jar:na]

    at

sun.reflect.GeneratedMethodAccessor429.invoke(Unknown Source) ~[na:na]

    at

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
~[na:1.7.0_79]

    at

java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_79]

    at

io.cattle.platform.eventing.annotation.MethodInvokingListener$1.doWithLockNoResult(MethodInvokingListener.java:76)
~[cattle-framework-eventing-0.5.0-SNAPSHOT.jar:na]

    at

io.cattle.platform.lock.LockCallbackNoReturn.doWithLock(LockCallbackNoReturn.java:7)
~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]

    at

io.cattle.platform.lock.LockCallbackNoReturn.doWithLock(LockCallbackNoReturn.java:3)
~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]

    at

io.cattle.platform.lock.impl.AbstractLockManagerImpl$3.doWithLock(AbstractLockManagerImpl.java:40)
~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]

    at

io.cattle.platform.lock.impl.LockManagerImpl.doLock(LockManagerImpl.java:33)
~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]

    at

io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:13)
~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]

    at

io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:37)
~[cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]

    at

io.cattle.platform.eventing.annotation.MethodInvokingListener.onEvent(MethodInvokingListener.java:72)
~[cattle-framework-eventing-0.5.0-SNAPSHOT.jar:na]

    at

io.cattle.platform.eventing.impl.AbstractThreadPoolingEventService$2.doRun(AbstractThreadPoolingEventService.java:135)
~[cattle-framework-eventing-0.5.0-SNAPSHOT.jar:na]

    ... 9 common frames omitted

Hi @ebishop… i am also facing the same ‘exit status 1’ issue when i am trying to add a host through Rancher Api…If you have been able to resolve this issue can you throw some light on it…thanks… :slightly_smiling:

In my case, the “exit status 1” problem is intermittent. Also, after waiting a couple of minutes and the retry
would succeed. In fact, my Openstack tenant was recently upgraded to Icehouse and I have not seen the problem thus far. But I never found any clue as to the root of the problem. I’m also on v 56 of Rancher.

I think the other errors I reported here are related were likely not related to the original problem…and it was back in November, which is a few Rancher releases ago.

I am on v50 of Rancher and kilo version of openstack. may be this is because of the version issue. can you share your sample code with me to see what different (or wrong) i am doing in my file.thanks.

I can’t really share the code with you, but if you are able to create a VM host once, your code is fine. My code would work about 10 times in a row, then fail once, then work a again. And this was literally the exact same code kicked off by a Jenkins job every hour.

I suggest, if yours is an automated process, Try to delete the failed machine in Rancher to cleanup, sleep for a couple of minutes and retry. Rancher is really not doing much of the work to create the VM. The interaction between Docker Machine and Openstack is the more likely suspect area in my opinion.

@ebishop is correct in the fact that the “exit status 1” is just a result from docker-machine and openstack. Rancher is just calling the docker-machine command and that message comes back from docker-machine.

I have added an enhancement request to identify which are docker-machine reported errors.

@ebishop : I am trying to add host through python code. I am getting this error every time. i also checked docker container error logs but nothing useful is mentioned there. Feeling stucked here.
@denise : thanks for raising the enhancement request. I hope Rancher implements this request. This will be of great help for all developers.