Error starting image from private Registry

I’m having an issue starting images from a private registry. I have the Docker daemon set up, and the hosts are able to log in and pull/push to the registry manually. However, starting the service in the Rancher UI does not seem to work. The container fails around the networking and storage steps. I’ve managed to find this error in the agent logs on the hosts.

2016-02-03 15:14:16,699 INFO agent [140234196691216] [utils.py:430] worker44 : Starting request 6ef5ef64-097e-4a53-ac9d-b3aeedc3b8d3 for storage.image.activate
2016-02-03 15:14:16,825 INFO agent [140234196691216] [utils.py:430] Response: {"name": "reply.7901292538545409532", "transitioningProgress": null, "resourceType": "imageStoragePoolMap", "resourceId": "1ispm686", "id": "5695be24-0f41-4ae3-907b-abc381901393", "transitioningMessage": "Pulling repository docker.inrd.io/config-query-api", "time": 1454512456000, "previousNames": ["storage.image.activate"], "transitioning": "yes", "data": {}, "previousIds": ["6ef5ef64-097e-4a53-ac9d-b3aeedc3b8d3"]} [0.00327587127686] seconds
2016-02-03 15:14:16,859 INFO agent [140234196691216] [utils.py:430] worker44 : Done request 6ef5ef64-097e-4a53-ac9d-b3aeedc3b8d3 for storage.image.activate [0.160033941269] seconds
2016-02-03 15:14:16,859 ERROR agent [140234196691216] [event.py:111] 92a44b07-3188-47f8-ae67-aef6ccf50beb : Unknown error
Traceback (most recent call last):
  File "/var/lib/cattle/pyagent/cattle/agent/event.py", line 94, in _worker_main
    resp = agent.execute(req)
  File "/var/lib/cattle/pyagent/cattle/agent/__init__.py", line 15, in execute
    return self._router.route(req)
  File "/var/lib/cattle/pyagent/cattle/plugins/core/event_router.py", line 13, in route
    resp = handler.execute(req)
  File "/var/lib/cattle/pyagent/cattle/agent/handler.py", line 34, in execute
    return method(req=req, **req.data.__dict__)
  File "/var/lib/cattle/pyagent/cattle/storage/__init__.py", line 27, in image_activate
    action=lambda: self._do_image_activate(image, storage_pool,
  File "/var/lib/cattle/pyagent/cattle/agent/handler.py", line 77, in _do
    raise Exception("Operation failed")
Exception: Operation failed
2016-02-03 15:14:16,867 INFO agent [140234196691216] [utils.py:430] Response: {"name": "reply.7901292538545409532", "resourceType": "imageStoragePoolMap", "resourceId": "1ispm686", "id": "cfe56d14-394c-4e0c-a3da-667ffb1eacc7", "transitioningInternalMessage": "92a44b07-3188-47f8-ae67-aef6ccf50beb : Operation failed", "time": 1454512456000, "previousNames": ["storage.image.activate"], "transitioning": "error", "data": {}, "previousIds": ["6ef5ef64-097e-4a53-ac9d-b3aeedc3b8d3"]} [0.00323700904846] seconds

Versions:
Rancher v0.44.0
Cattle v0.107.0
User Interface v0.65.0
Rancher Compose v0.5.1
Hosts are Amazon EC2 instances running Ubuntu 14.04

Have you added your Docker login credentials in the UI? There is a section for “registries” if you click on your icon in the top right corner. I have noticed that Rancher doesn’t seem to use the login credentials on the box; it does it some other way, so you might have to add the credentials there.

Yes, I’ve added the credentials there. I can’t tell whether Rancher just can’t reach the registry.

Can you show how you entered your image name in the Rancher UI? Is it in this format?

[registry-name]/[namespace]/[imagename]:[version]

http://docs.rancher.com/rancher/configuration/registries/#using-registries

Typically, “Operation failed” is a sign of a login credential issue.
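To make the expected image name format concrete, here is a minimal Python sketch that splits a name into the four parts listed above. The function name `split_image_name` is made up for illustration and is not part of Rancher or Docker. The rule for recognising a registry host (a first component containing "." or ":", or equal to "localhost") is an assumption modelled on how Docker generally treats image names.

```python
def split_image_name(image):
    """Split '[registry-name]/[namespace]/[imagename]:[version]' into parts.

    Hypothetical helper for illustration only; not Rancher or Docker code.
    """
    last_slash = image.rfind("/")
    colon = image.rfind(":")
    # A ':' only marks a version if it comes after the last '/';
    # otherwise it belongs to a registry port such as 'localhost:5000'.
    if colon > last_slash:
        name, version = image[:colon], image[colon + 1:]
    else:
        name, version = image, "latest"  # default tag when none is given

    parts = name.split("/")
    # Assumption: the first component is a registry host only if it
    # looks like one (has a '.' or ':' in it, or is 'localhost').
    first = parts[0]
    if len(parts) > 1 and ("." in first or ":" in first or first == "localhost"):
        registry, path = first, parts[1:]
    else:
        registry, path = None, parts

    namespace = "/".join(path[:-1]) or None
    return {
        "registry": registry,
        "namespace": namespace,
        "image": path[-1],
        "version": version,
    }


print(split_image_name("docker.inrd.io/innkeeper/nugetserver:latest"))
```

If the registry part comes back as `None`, the name would be resolved against Docker Hub rather than the private registry, which is one way a correctly configured credential can end up unused.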

Yes, I have the image name set up that way. After trying again, I’ve found this in the agent logs as well.

I0203 18:31:57.829888 23453 container.go:369] Failed to update stats for container "/docker/136aaa2744a5be51dfc22fb0ed62d64373f7e1619b1c122a5fbed81acf151e4b": du command failed on /var/lib/docker/aufs/diff/136aaa2744a5be51dfc22fb0ed62d64373f7e1619b1c122a5fbed81acf151e4b with output du: cannot access '/var/lib/docker/aufs/diff/136aaa2744a5be51dfc22fb0ed62d64373f7e1619b1c122a5fbed81acf151e4b/var/lib/cattle/download.BuSwdgQ': No such file or directory

If you pre-pull the image onto your host, are you able to start the service, or do you get the same error? I want to pinpoint whether it’s a pull issue or something with launching the private image.

Pre-pulled images seem to work just fine.

Could it be this? https://github.com/rancher/rancher/issues/1920

I’ve set up a new instance of Rancher. I’m now working with the following versions.
Rancher v0.56.1
Cattle v0.141.1
User Interface v0.86.0
Rancher Compose v0.7.1

This gives me a better look at the error I seem to be getting.

500 Server Error: Internal Server Error ("Get https://docker.inrd.io/v2/innkeeper/nugetserver/manifests/latest: no basic auth credentials")

Trying this URL in my browser correctly gets access to the file.
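For comparison with what the browser does after prompting for a username and password, here is a rough Python sketch of the same manifest request with an HTTP Basic `Authorization` header attached. It only builds the request and does not send it. The function name `manifest_request` and the credentials `user` / `secret` are placeholders, not anything taken from Rancher or from this setup.

```python
import base64
import urllib.request


def manifest_request(registry, repository, tag, username, password):
    """Build (but do not send) a registry v2 manifest request with Basic auth.

    Hypothetical helper for illustration; credentials are placeholders.
    """
    url = "https://{}/v2/{}/manifests/{}".format(registry, repository, tag)
    # HTTP Basic auth is base64("username:password") in the Authorization header.
    raw = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


req = manifest_request("docker.inrd.io", "innkeeper/nugetserver", "latest",
                       "user", "secret")
print(req.full_url)
print(req.get_header("Authorization"))
```

The “no basic auth credentials” message suggests that whatever made the failing request had no such header to send for this registry, even though the registry itself answers correctly when one is supplied.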

As for this suggestion, are the settings changes meant to be performed on each host or on the docker-server machine? Running the localroute container on a host gives me the error "Can’t access CATTLE_PREROUTING iptables chain." when using the compose file from the comment.

Additionally, my private registry and Rancher server are not publicly available.

My private registry was set up following this tutorial, on an Amazon EC2 instance that is only accessible by other machines on the VPC (which includes the Rancher server and the hosts): https://www.digitalocean.com/community/tutorials/how-to-set-up-a-private-docker-registry-on-ubuntu-14-04

Hello,

I’m currently working with the original poster of this topic on this issue, and we noticed something.

To give context, we’re trying to debug the 500 Server Error quoted above (“no basic auth credentials”).

We searched for where this is coming from and found out that it’s being emitted by the rancher/agent container that’s running on the host. Tracing the error led us to the pull() method in image.py inside the rancher/agent image. The pull() method tries to authenticate against our private registry by calling auth.load_config(). Now, auth.load_config() looks for two configuration files: first ~/.docker/config.json and, if that doesn’t exist, ~/.dockercfg. Neither file exists in the rancher/agent image, and that causes the authentication to fail.
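The lookup order described here can be sketched roughly as follows. This is not the actual code behind auth.load_config(), just a small stand-in with a made-up name (`find_docker_config`) that checks the same two paths in the same order.

```python
import os


def find_docker_config(home=None):
    """Return the first Docker credentials file found, or None.

    Illustrative stand-in for the lookup order described above; the real
    auth.load_config() may differ in detail.
    """
    home = home or os.path.expanduser("~")
    candidates = [
        os.path.join(home, ".docker", "config.json"),  # newer location, checked first
        os.path.join(home, ".dockercfg"),              # legacy fallback
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None  # neither file exists, so there are no credentials to load


print(find_docker_config())
```

Run inside the rancher/agent container as we found it, a check like this would return `None`, which matches the failure we see.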

To test whether this could be the culprit, I opened a terminal in the running rancher/agent container and ran “docker login” against our private registry (via the command line) to create one of the configuration files. It created the file ~/.dockercfg. Then I spun up a new service from Rancher, and voilà, it works!
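For reference, here is a small Python sketch of the kind of entry that “docker login” stores and how a client could read it back. It assumes the commonly seen layouts: the legacy ~/.dockercfg is a flat JSON object keyed by registry, while ~/.docker/config.json nests the same entries under "auths". The helper names and the credentials are placeholders, and the exact registry key that Docker writes (with or without an "https://" prefix) varies by version, so treat the key used here as an assumption.

```python
import base64
import json


def dockercfg_entry(registry, username, password, email=""):
    """Build a legacy .dockercfg-style entry (hypothetical helper)."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    return {registry: {"auth": base64.b64encode(raw).decode("ascii"),
                       "email": email}}


def credentials_for(config, registry):
    """Return (username, password) for a registry, or None if absent."""
    # config.json nests entries under "auths"; legacy .dockercfg is flat.
    entries = config.get("auths", config)
    entry = entries.get(registry)
    if entry is None:
        return None
    username, _, password = base64.b64decode(entry["auth"]).decode("utf-8").partition(":")
    return username, password


legacy = dockercfg_entry("docker.inrd.io", "user", "secret")
print(json.dumps(legacy, indent=2))
print(credentials_for(legacy, "docker.inrd.io"))
```

Note that the stored value is only base64-encoded, not encrypted, so a file like this inside the agent container should be treated as a secret.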

Do you know why this is happening? Are we missing something here? Is there a setting that we have incorrectly assigned? Hope you can help us.

Here’s our current setup:
Rancher v0.56.1
Cattle v0.141.1
User Interface v0.86.0
Rancher Compose v0.7.1
Rancher agent-instance (rancher/agent-instance) v0.6.0
Rancher agent (rancher/agent) v0.8.2

Hey Denise, my colleague worked with me on the issue and posted his findings here. For some reason his post was flagged.

Un-flagged… (“A new user created a topic, and another new user at the same IP address replied. See the flag_sockpuppets site setting.”)

Just tried on one of my rancher-agents: there is no docker config there either, so maybe that’s not the problem you’re looking for.
If you want me to check anything else, just say so - I do have a pretty similar setup running (aws+vpc+private registry) and it works fine for me.

PS: It would actually be great if Rancher told you whether the registry credentials are correct when you enter them, though that’s a different issue.

Thanks for replying. My colleague is having issues posting here, so I’ll be relaying messages. I’m not sure if this matters, but I’ll ask anyway: what kind of authentication did you use on your private registry?

I’m using docker_auth which provides token authorization.

Ah, we had set up our registry with basic auth through htpasswd.
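One way to see which of the two schemes a registry expects is to look at the `WWW-Authenticate` header it sends back with a 401 response. The sketch below parses such a header. The function name `parse_challenge` and both example header values are made up for illustration and are not copied from either of our registries.

```python
import re


def parse_challenge(header):
    """Split a WWW-Authenticate header into (scheme, params).

    Hypothetical helper; handles only simple key="value" parameters.
    """
    scheme, _, rest = header.strip().partition(" ")
    params = dict(re.findall(r'(\w+)="([^"]*)"', rest))
    return scheme.lower(), params


# Example values are assumptions, not real output from these registries.
print(parse_challenge('Basic realm="Registry Realm"'))
print(parse_challenge(
    'Bearer realm="https://auth.example.com/token",service="registry.example.com"'
))
```

A "basic" scheme corresponds to the htpasswd setup, while "bearer" points at a token service such as docker_auth, which would explain why two otherwise similar setups behave differently.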

@chris.taormina did you manage to set up a private registry with basic auth in Rancher?

Hey, sorry this is super late. In the interest of time back then, I actually just started using Docker Hub instead.