Internal DNS issue - rancher-metadata not resolved; rancher-metadata.rancher.internal resolves fine

While working on “Trying to run Eureka (Netflix) under Rancher - resolving issues”, I was unable to reach the Rancher Metadata Service from within my container using the hostname “rancher-metadata”; it cannot be resolved. However, when I try “rancher-metadata.rancher.internal”, I can reach the Metadata Service just fine.

In short, all attempts to reach Rancher services using their simple hostnames (e.g. discovery) fail; all attempts against “*.rancher.internal” succeed.

None of the docs mention this. What’s wrong?

My Environment:

What does /etc/resolv.conf contain in your containers, and what application/language/library is trying to resolve the name? (click Execute Shell on one of the containers and cat /etc/resolv.conf)

rancher.internal should be in the search list.

You can also try something like curl http://rancher-metadata/ while in the shell, but may have to install curl or similar first.

/etc/resolv.conf (in container)

search localdomain rancher.internal
# nameserver 192.168.101.2
nameserver 169.254.169.250
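
To see which names a resolver would try for an unqualified hostname, the file above can be parsed with a few lines of Python. This is only a sketch: `parse_resolv_conf` and `candidates` are hypothetical helper names, and the sample text is simply the file pasted above.

```python
def parse_resolv_conf(text):
    """Return (search_domains, nameservers) from resolv.conf-style text."""
    search, nameservers = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # skip blank lines and comments
            continue
        fields = line.split()
        if fields[0] == "search":
            search = fields[1:]  # a later "search" line replaces an earlier one
        elif fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
    return search, nameservers


def candidates(name, search):
    """Names tried via the search list, in order, for an unqualified name."""
    return [f"{name}.{domain}" for domain in search]


SAMPLE = """\
search localdomain rancher.internal
# nameserver 192.168.101.2
nameserver 169.254.169.250
"""

search, nameservers = parse_resolv_conf(SAMPLE)
print(search)
print(candidates("rancher-metadata", search))
```

With this file, rancher-metadata.localdomain is tried before rancher-metadata.rancher.internal.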

From within container, http://rancher-metadata returns bad search results from version (my ISP provider)

http://rancher-metadata.rancher.internal returns correct JSON data (2015-07-25 and latest).

Dan

Update:
The issue seems to be that in the resolv.conf line “search localdomain rancher.internal”, localdomain comes first. When trying to resolve rancher-metadata, the resolver tries rancher-metadata.localdomain first.

So what normally happens is that your provider’s DNS would return NXDOMAIN since it has no record, and then the client would retry with the next suffix in the search list.

In this case your provider is “DNS hijacking” and returning the IP of their web server with ads and search results no matter what address you ask for, so the retry with the rancher suffix never happens.
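
The difference between the two behaviors can be illustrated with a small simulation. It does no real DNS traffic: `honest_dns` and `hijacking_dns` are hypothetical stand-ins for an upstream server, and both IP addresses are placeholders from the documentation ranges, not real Rancher or ISP addresses.

```python
# Placeholder addresses (documentation ranges), assumed for illustration only.
HIJACK_IP = "203.0.113.10"  # stands in for the ISP's ad/search web server
RECORDS = {"rancher-metadata.rancher.internal": "192.0.2.53"}


def honest_dns(fqdn):
    """Upstream that answers only for names it knows; None models NXDOMAIN."""
    return RECORDS.get(fqdn)


def hijacking_dns(fqdn):
    """Upstream that never says NXDOMAIN: unknown names get the ad server."""
    return RECORDS.get(fqdn, HIJACK_IP)


def resolve(name, search, lookup):
    """Try each search suffix in order and stop at the first answer."""
    for domain in search:
        fqdn = f"{name}.{domain}"
        address = lookup(fqdn)
        if address is not None:
            return fqdn, address
    return None


SEARCH = ["localdomain", "rancher.internal"]
print(resolve("rancher-metadata", SEARCH, honest_dns))
# ('rancher-metadata.rancher.internal', '192.0.2.53')
print(resolve("rancher-metadata", SEARCH, hijacking_dns))
# ('rancher-metadata.localdomain', '203.0.113.10')
```

Because the hijacking server answers for rancher-metadata.localdomain, the loop stops there and the rancher.internal suffix is never tried.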

This is basically evil and can be avoided by configuring your hosts to use something like OpenDNS or Google DNS and ignore their servers.
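
One rough way to check whether the DNS servers a host or container currently uses behave this way is to look up a random name that should not exist and see whether an address comes back. The sketch below assumes Python is available where you run it; `probe_hijack` is a hypothetical helper name, and the default suffix is just the first entry from the search list quoted earlier in this thread.

```python
import socket
import uuid


def probe_hijack(suffix="localdomain", resolve=socket.gethostbyname):
    """Return the address a made-up name resolves to, or None if it fails.

    A well-behaved resolver should fail for a random name; getting an
    address back suggests the upstream DNS is rewriting NXDOMAIN answers.
    """
    probe = f"nxdomain-probe-{uuid.uuid4().hex}.{suffix}"
    try:
        return resolve(probe)
    except OSError:  # socket.gaierror is a subclass: the name did not resolve
        return None
```

Calling `probe_hijack()` and getting `None` back is the expected result on a clean resolver. An IP address in the result is a hint, not proof, that something upstream is answering for names that do not exist.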

The reason we put rancher.internal last is that we don’t want to stomp on any other unqualified names that may be in use on your network.

That solved my issue.

Thanks and much appreciated!

The /etc/resolv.conf on my host is not the same as the one in my container.

This is the error I’m seeing:

curl -I http://rancher-metadata

HTTP/1.1 307 Temporarily Moved for Domain Name Expansion

rancher.internal is not first, but it’s in the resolv.conf search path.

@ebishop What version of Rancher and what do your host and container resolv.confs look like? That message isn’t coming from us, so it’s likely that something similar is happening and you’re getting some random IP back from your ISP which then returns that redirect.

I had updated all of my servers to Docker 1.11. While looking into a Gluster problem, I found in the forums that there are a few Docker 1.11 incompatibilities. I fell back to Docker 1.10.3 and this works now. So you can add this to the list of Docker 1.11 related problems.

I suspect that most of the problems I’ve had with Rancher in the past several weeks are due to upgrading to Docker 1.11. I’m sort of excited to retry my list of Rancher related issues. :)

Is there a place in your documentation or on your website where you clearly list the versions of Docker that Rancher is compatible with? I’d like to know when it’s OK to upgrade Docker.

In the installation and adding hosts pages in the docs, it lists what Docker version is supported. It doesn’t explicitly call out Docker 1.11, but there is no + after Docker 1.10.3.

http://docs.rancher.com/rancher/latest/en/installing-rancher/installing-server/
http://docs.rancher.com/rancher/latest/en/rancher-ui/infrastructure/hosts/

It’s also in the requirements in our README on GitHub.

Normally we would try not to be behind and to be forward-compatible in the first place, but there was more rewriting of how everything works than usual in 1.11 :)