@ebishop Just to clarify - you’ve re-deployed the route53 service using the v0.1.7 route53 image from the catalog? If so, how soon after starting did the container die? Could you attach the container logs to this post? Thanks!
ok, making some progress. I am able to connect to AWS and list the target zone. But I had to set some proxy environment variables inside my container.
I can’t set these in the UI, so I’m thinking I’ll copy the docker-compose and rancher-compose files out of the UI preview, and add my proxy environment variables.
I set http_proxy and https_proxy, nothing special, and there is no ‘<’ character in the proxy URL.
What is the “self” section? I tried creating a route53 instance, then copied the docker-compose.yml and rancher-compose.yml from the UI after it was created, and then added the env vars to the list already in the docker-compose.yml.
The stack gets created, and I tried starting it both from the UI and with rancher-compose.
@alena I’m guessing it’s getting back an HTML document of some sort (starting with “<”) and failing when it tries to parse that as JSON, or the metadata contains characters that aren’t escaped correctly and the resulting JSON is invalid.
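To illustrate the first guess (generic Python, not the service’s actual parsing code): a proxy or gateway error usually comes back as an HTML page, and a JSON parser rejects it on the very first character.

    import json

    # A proxy or gateway error is typically an HTML page, not JSON.
    html_error_page = "<html><body><h1>502 Bad Gateway</h1></body></html>"

    try:
        json.loads(html_error_page)
    except json.JSONDecodeError as exc:
        # Fails on the leading '<': "Expecting value: line 1 column 1 (char 0)"
        print("JSON parse failed:", exc)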
Now that I can get to route53.amazonaws.com, when I try this URL in my boto3 script, it returns an error:
botocore.exceptions.ClientError: An error occurred (InvalidInput) when calling the ListHostedZones operation: maxitems must be a positive integer.
Is it possible that AWS is choking on maxItems=9223372036854775807? It does seem like an invalid value for the maximum number of hosted zones to return.
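A minimal boto3 sketch of the kind of call involved (not my exact script; it assumes standard AWS credentials, and boto3/botocore pick up http_proxy/https_proxy from the environment). MaxItems is passed as a string, and a small value works, while the huge value above is the one AWS rejects:

    import boto3

    # Standard credential chain; proxy settings come from the environment.
    client = boto3.client("route53")

    # MaxItems is a string in the Route53 API; 100 is a safe, valid value.
    resp = client.list_hosted_zones(MaxItems="100")
    for zone in resp["HostedZones"]:
        print(zone["Id"], zone["Name"])

    # The value the route53 container appears to be sending, for comparison:
    # client.list_hosted_zones(MaxItems="9223372036854775807")
    #   -> InvalidInput: maxitems must be a positive integer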
@ebishop It’s fine that you’ve started your service by manually copying docker-compose.yml and rancher-compose.yml; that alone shouldn’t have caused any issues.
The error below:
The error I got originally was that I couldn’t find the host in this url.
doesn’t seem to be related to the error you are observing when running the AWS API call using boto3:
Is it possible that AWS is choking on maxItems=9223372036854775807? It does seem like an invalid value for the maximum number of hosted zones to return.
The “self” section contains client information - service/container/host details. In the metadata config file, each client IP address is assigned its own “self” section, so the metadata server knows which “self” to serve to which client. It looks something like this:
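(The sketch below is a Python illustration of the idea only; the real answers config has more fields and a different exact layout, and all names and addresses here are hypothetical.)

    # Illustration only: metadata answers keyed by client IP.
    answers = {
        "10.42.105.12": {  # the route53 container's managed-network IP
            "self": {
                "container": {"name": "route53_route53_1", "primary_ip": "10.42.105.12"},
                "service": {"name": "route53"},
                "host": {"agent_ip": "192.168.1.10"},
            }
        },
        # ...one entry like this per client container...
    }

    # The metadata server looks up the caller's source IP; an IP it does not
    # recognize (e.g. one outside 10.42.x.x) gets an empty answer.
    client_ip = "10.42.105.12"
    print(answers.get(client_ip, {}))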
I set http_proxy and https_proxy, nothing special, and there is no ‘<’ character in the proxy URL.
Setting these environment variables could perhaps result in the client IP address being re-set for the route53 container, so it’s no longer recognizable by the metadata server. That results in an empty response being served by the metadata service to the route53 container.
It’s essential that the route53 client IP address is on the 10.42.x Rancher managed network for it to use metadata, and accessing metadata is required for Route53 programming.
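If you can get a shell in the route53 container, a quick check looks something like this (a sketch: it assumes the metadata service answers at http://rancher-metadata/ and exposes self/container/primary_ip; note that urllib honours http_proxy from the environment, so a proxy setting that gets in the way will show up here as a failure or an empty reply):

    import urllib.request

    # Ask the Rancher metadata service which primary IP it has recorded
    # for this container.
    url = "http://rancher-metadata/latest/self/container/primary_ip"
    ip = urllib.request.urlopen(url, timeout=5).read().decode().strip()

    print("primary_ip:", ip)
    print("on managed 10.42.x network:", ip.startswith("10.42."))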
We have to set http_proxy and https_proxy to resolve FQDNs outside of our lab. I don’t know why setting these would change Rancher internal routing, but if you’ll tell me how to check I’ll give it a shot. But the container never starts up so it may be a challenge.
@ebishop Sure, let’s do this test. Log in to the network-agent container either using the UI or docker (docker exec -ti [uuid of agent-instance container] bash) and tail the /var/log/rancher-metadata log file. Then start your route53 service and check the “tail” output. You should see the route53-to-metadata request being logged.
@ebishop That most likely happens because the route53 container fails to send the HTTP request to the rancher-metadata server, so it never reaches metadata and never gets logged. It can be related to the Rancher managed network not being set up properly on your hosts (possible problems with iptables-restore).
I know this is likely not related to the startup problem I’m having, but here is the exception we saw in the rancher server logs:
2015-12-02 13:01:34,935 ERROR [172da064-490c-4bc2-aa49-4fb0382e01df:32906] [instance:232] [instance.purge] [] [cutorService-19] [c.p.e.p.i.DefaultProcessInstanceImpl] Unknown exception io.cattle.platform.eventing.exception.EventExecutionException: 7eb0d25f-297b-4885-a148-55e652b31168 : 500 Server Error: Internal Server Error (“Cannot destroy container 499e63e3b0f3c3b9f413537786c108fad3629843d0cad063d07e15c927a22980: Failed to set container state to RemovalInProgress: Status is already RemovalInProgress”)
at io.cattle.platform.eventing.exception.EventExecutionException.fromEvent(EventExecutionException.java:53) ~[cattle-framework-eventing-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.agent.impl.RemoteAgentImpl.callSync(RemoteAgentImpl.java:72) ~[cattle-iaas-agent-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.agent.impl.RemoteAgentImpl.callSync(RemoteAgentImpl.java:158) ~[cattle-iaas-agent-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.process.common.handler.AgentBasedProcessHandler.callSync(AgentBasedProcessHandler.java:170) ~[cattle-iaas-logic-common-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.process.common.handler.AgentBasedProcessHandler.handleEvent(AgentBasedProcessHandler.java:156) ~[cattle-iaas-logic-common-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.process.instance.PostInstancePurge.handle(PostInstancePurge.java:38) ~[cattle-iaas-logic-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runHandler(DefaultProcessInstanceImpl.java:456) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$4.execute(DefaultProcessInstanceImpl.java:396) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$4.execute(DefaultProcessInstanceImpl.java:390) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.idempotent.Idempotent.execute(Idempotent.java:42) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runHandlers(DefaultProcessInstanceImpl.java:390) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runLogic(DefaultProcessInstanceImpl.java:520) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.runWithProcessLock(DefaultProcessInstanceImpl.java:323) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl$2.doWithLockNoResult(DefaultProcessInstanceImpl.java:259) ~[cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.LockCallbackNoReturn.doWithLock(LockCallbackNoReturn.java:7) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.LockCallbackNoReturn.doWithLock(LockCallbackNoReturn.java:3) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl$3.doWithLock(AbstractLockManagerImpl.java:40) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.LockManagerImpl.doLock(LockManagerImpl.java:33) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:13) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.lock.impl.AbstractLockManagerImpl.lock(AbstractLockManagerImpl.java:37) [cattle-framework-lock-0.5.0-SNAPSHOT.jar:na]
at io.cattle.platform.engine.process.impl.DefaultProcessInstanceImpl.acquireLockAndRun(DefaultProcessInstanceImpl.java:256) [cattle-framework-engine-0.5.0-SNAPSHOT.jar:na]
I decided to create a new host with Rancher, and then not even log in to it…just to be sure I didn’t mess something up.
I created a new route53 service and also added a simple Ubuntu container. The Ubuntu container came up just fine, so I added a logspout catalog stack and pointed it at my Ubuntu container (just to see if it would come up). My route53 container was still struggling, but I was surprised to see some slightly different info in the logs:
It was clearly able to connect to Route53 and get a list of hosted zones, and it found my target zone. (I changed it to X’s, but it was the correct ID.) There are also some healthcheck failures.
12/2/2015 12:27:34 PM time=“2015-12-02T18:27:34Z” level=error msg=“Healtcheck failed: unable to reach a provider”