Is deploying Rancher on an external node a good or a horrible idea?

We just recently discovered Rancher, and it is an absolutely awesome platform! Really great job to everyone behind this project!

I would really like to deploy Rancher in our production environment, and I have a question: where does the Rancher master / manager need to be deployed?

We manage multiple production clusters for different independent projects, and for simplicity and convenience’s sake we would really like to have a single point of login for managing all the different environments and projects. Is it possible to set up the management node remote from all the worker nodes?

I.e. we have a few projects that all have setup similar to this:
Project A - WordPress-based shopping site with:
2 web,
2 db,
1 cache server

Project B - Meteor-based shopping aggregator:
1 web
1 db + cache

Project C - Magento-based shopping site:
2 web
2 db + Redis cache instances

All 3 projects are on different dedicated servers in different hosting companies and all 3 are completely independent from each other.

Is it possible to have the management node set up (let’s say) on our office server and manage all the clusters independently of each other over the public internet? Or will that cause problems with things like public / private networking within each cluster?

Yes, it’s possible, and from what I understand, that’s kind of the promise behind Rancher. They even have options to automagically create hosts on various hosting “cloud” platforms (DigitalOcean, etc.).

I myself have a Rancher server in the US and have hosts across the pond in the UK, as well as some in the same data center as the Rancher server.

Yes; what you’ve referred to as a project is called an Environment in Rancher (though it’s actually called a project in the API :smile:). Each environment has a set of hosts with their own private network between them. The hosts connect to the master (rancher/server) for management, but all other network communication goes directly between the hosts in that environment.

Does it self-discover the network interface that it needs to use for internal communication?
I.e., each server has a public NIC and a local NIC for internal use, such that the local one is used to access the DB and the public one is used to serve website requests, etc.?

If that’s the case, what am I doing wrong here? I just deployed the sample WordPress app from Rancher’s catalog and the response time is pretty “crappy”, to say the least. Here’s an example ping with timestamps, run from within the WordPress container against the MySQL container that was installed as part of the same catalog template:

Mon Mar 14 23:37:58 UTC 2016
PING mysql (10.42.165.161): 56 data bytes
64 bytes from 10.42.165.161: icmp_seq=0 ttl=64 time=0.106 ms
64 bytes from 10.42.165.161: icmp_seq=1 ttl=64 time=0.057 ms
64 bytes from 10.42.165.161: icmp_seq=2 ttl=64 time=0.082 ms
64 bytes from 10.42.165.161: icmp_seq=3 ttl=64 time=0.057 ms
64 bytes from 10.42.165.161: icmp_seq=4 ttl=64 time=0.054 ms
--- mysql ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.054/0.071/0.106/0.000 ms
Mon Mar 14 23:38:26 UTC 2016

Notice how DNS resolution took over 20 seconds for something that is local to this node.

This install was done by following these steps:

  1. installed clean CentOS 7 on an i7 machine with 32 cores, 128 GB RAM, SSD RAID 0 disks
  2. installed the latest Docker
  3. created a new environment in Rancher
  4. installed the Rancher agent
  5. switched to the environment
  6. installed an app from the catalog -> WordPress

=> it deployed 2 containers: 1) web, 2) db (MariaDB/MySQL)
=> it exposed port 80 for connecting to the website (which is great)
=> any web request is mega slow (a 30-second to one-minute delay before it opens any page)
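For reference, step 4 above normally amounts to a single `docker run` on the host. The agent version tag, server address, and registration token below are placeholders; copy the exact command that your own Rancher UI shows under Add Host:

```shell
# Register a host with the Rancher server (Rancher 1.x-era agent).
# YOUR-RANCHER-SERVER and REGISTRATION-TOKEN are placeholders -- use the
# exact command generated by your server's Add Host screen.
sudo docker run -d --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock \
  rancher/agent:v0.8.2 \
  http://YOUR-RANCHER-SERVER:8080/v1/scripts/REGISTRATION-TOKEN
```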

What did I miss here?

Also, I just noticed that when I run docker inspect (not sure if it is supposed to be this way, or if something is off) it’s showing:

        "Dns": [
            "169.254.169.250"
        ],

and inside the docker container:

more /etc/resolv.conf
===== Yields this: ==========
search aam-wp.rancher.internal wordpress.aam-wp.rancher.internal rancher.internal
nameserver 169.254.169.250

That doesn’t seem right, or does it? Or do I need to install some other service on top first for managing DNS somehow?

Each host runs a Network Agent container that runs a DNS server which provides the answer when you try to resolve e.g. mysql. 169.254.169.250 is a link-local IP that the network agent listens on, so that it’s always the same on any host. Anything that doesn’t match a rancher.internal record is recursed to the DNS servers set on the container, or the ones set on the docker daemon/host.
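Given that setup, one way to tell whether the problem is the Rancher DNS itself or its upstream recursion is to query 169.254.169.250 directly from inside a container that has dig available (the record name below matches the search domains shown earlier in this thread; substitute your own):

```shell
# Internal name: answered directly by the network agent's DNS server.
# Should return a 10.42.x.x address almost instantly.
dig +short @169.254.169.250 mysql.aam-wp.rancher.internal

# External name: forces the network agent to recurse to its upstream
# resolvers. If this hangs or times out, recursion is what is broken.
dig +short +time=2 @169.254.169.250 google.com
```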

Hosts use the public IP that is shown in the UI to set up the IPsec tunnels they use to communicate with each other. You need UDP ports 500 and 4500 open between all the hosts for that. But that part is apparently working, since pings come back; DNS requests don’t cross the network at all.
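On a stock CentOS 7 host, firewalld may be blocking those ports; assuming firewalld is the active firewall, opening them looks like this:

```shell
# Allow IPsec (IKE on 500/udp, NAT-T on 4500/udp) between Rancher hosts.
# Run on every host in the environment; assumes firewalld is in use.
sudo firewall-cmd --permanent --add-port=500/udp
sudo firewall-cmd --permanent --add-port=4500/udp
sudo firewall-cmd --reload
```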

Clearly ~23 seconds to resolve mysql. is bad; it should be on the order of a millisecond. If it’s reproducible, can you do a packet capture? Something like tcpdump -w dns.pcap port 53

It appears that no domains are resolving correctly from either of the containers, as though DNS is simply not available. Here’s some sample terminal output showing that they can’t ping any domain, and even apt-get update fails the same way. So no luck with tcpdump either, since I can’t install it anywhere:

CONTAINER ** r-aam-wp_db_1

root@3e758af2b067:/# apt-get update
0% [Connecting to httpredir.debian.org] [Connecting to security.debian.org] [Connecting to ftp.osuosl.org]^C
root@3e758af2b067:/# ping google.com

CONTAINER ** r-aam-wp_wordpress_1

root@e99b5546b3f8:/var/www/html# tcpdump -w dns.pcap port 53
bash: tcpdump: command not found
root@e99b5546b3f8:/var/www/html# apt-get install -y tcpdump
Reading package lists… Done
Building dependency tree
Reading state information… Done
E: Unable to locate package tcpdump
root@e99b5546b3f8:/var/www/html# ping google.com
^C
root@e99b5546b3f8:/var/www/html# ping microsoft.com
ping: unknown host
root@e99b5546b3f8:/var/www/html# apt-get update
Err http://httpredir.debian.org jessie InRelease

Err http://httpredir.debian.org jessie-updates InRelease

W: Some index files failed to download. They have been ignored, or old ones used instead.

Any suggestions on what I’m doing wrong and why DNS isn’t resolving within the expected amount of time?

@virtuman It seems like your Network Agent containers probably just can’t reach the configured recursive resolvers. They default to the ones in /etc/resolv.conf on the host.

In v0.63 there is no answer for the FQDN mysql., so it would try to recursively resolve that before returning an error. Normally this would be fairly quick: public DNS would say “I dunno who mysql. is”, we’d send that back, and your client would then re-try with one of the other search paths, eventually getting to mysql.stack.rancher.internal., where we’d find an answer. But in your case recursion is apparently not working, so the recursive check in between takes forever, eventually times out, and only then continues.
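The retry behaviour described above comes from the client’s resolver search list (the `search` line in /etc/resolv.conf shown earlier). A simplified Python model of glibc-style search expansion — not Rancher’s actual code, and the exact order also depends on the resolver’s ndots option:

```python
# Simplified model of glibc-style search-list expansion (resolv.conf
# "search" + "ndots"). Illustrative only, not Rancher's implementation.

def candidate_names(name, search, ndots=1):
    """Return the absolute names a stub resolver tries, in order."""
    if name.endswith("."):
        return [name]                       # already fully qualified
    absolute = name + "."
    expanded = [name + "." + s + "." for s in search]
    # A name with fewer dots than ndots tries the search list first.
    if name.count(".") < ndots:
        return expanded + [absolute]
    return [absolute] + expanded

# The search list from the resolv.conf shown earlier in the thread:
search = ["aam-wp.rancher.internal",
          "wordpress.aam-wp.rancher.internal",
          "rancher.internal"]

for fqdn in candidate_names("mysql", search):
    print(fqdn)
# -> mysql.aam-wp.rancher.internal.
#    mysql.wordpress.aam-wp.rancher.internal.
#    mysql.rancher.internal.
#    mysql.
```

Every candidate that misses the rancher.internal records triggers an upstream recursion attempt, which is why broken recursion turns a local lookup into a multi-second stall.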

Starting in v1.0.0-rc1 we attempt to do the search path resolution in the DNS server, so a request for mysql. should get a match without ever trying to recurse. You can give v1.0.0-rc1 a try now if you like.

But there is still likely something wrong with your setup causing recursion to not work, and this will probably not fix asking for something that is actually external.