Rancher Cross AWS VPC Management

Hello, I'm currently evaluating Rancher for a production deployment and struggling to find a combination of VPC setup and security groups that makes me comfortable. Looking for advice here on getting the ports locked down the way I want.

My Rancher master is currently set up in a Scenario 2 VPC (http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Scenario2.html) with the ALB in the public subnet and the EC2 instance(s) in the private subnet. This master then needs to manage Rancher Host clusters in different VPCs (one cluster per VPC).

This is simple to accomplish by using the rancher-machine security group and placing my Rancher Host instances in the public subnets of the various VPCs. Unfortunately this opens ports 22, 2376, 4500, and 500 to the world, which I am uncomfortable with.

  • Are people running Rancher clusters in prod with all of these ports exposed?

Regardless, I have created a security group that is close to providing the security I want while allowing all Rancher functionality to continue working. However, IPsec remains broken. The security group allows the local VPC subnet (something like 10.x.x.x/20), but the cluster hosts address each other using their public IPs, e.g. 54.x.x.x.

  • Is there a way to get Rancher hosts to refer to each other by private IP, while the Rancher server refers to them by public IP?
  • Is there a way to get rancher hosts AND server to use a DNS entry to find hosts instead of an IP?

If the answer to either question above is yes, that would fix my issue. And finally…

  • Does anyone have standard patterns for setting up a rancher master that manages clusters in various VPCs? (clusters are not cross VPC but the managing server will most likely be in a different VPC than the cluster)

  • Port 22 is needed by docker-machine only during setup, from the server container, to SSH into the host and install Docker/start the rancher agent.
  • 2376 is not needed at all, but docker-machine insists on adding it.
  • 500/udp and 4500/udp are only needed between hosts in the environment.

So none of those actually have to be open to the internet.
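
If it helps, this is roughly what those rules look like expressed with the AWS CLI. It's only a sketch: the group ID and the server-side CIDR are placeholders you'd swap for your own host security group and the subnet your Rancher server lives in.

    # Placeholder: the security group attached to the Rancher hosts.
    HOST_SG=sg-<host-security-group-id>

    # IPsec (500/udp and 4500/udp) only between hosts in the same environment:
    # let the host security group talk to itself.
    aws ec2 authorize-security-group-ingress --group-id "$HOST_SG" \
      --protocol udp --port 500 --source-group "$HOST_SG"
    aws ec2 authorize-security-group-ingress --group-id "$HOST_SG" \
      --protocol udp --port 4500 --source-group "$HOST_SG"

    # SSH only from wherever the Rancher server container runs (placeholder
    # CIDR), and only if you use docker-machine provisioning; custom hosts
    # don't need this rule at all.
    aws ec2 authorize-security-group-ingress --group-id "$HOST_SG" \
      --protocol tcp --port 22 --cidr <rancher-server-subnet-cidr>

    # Nothing is opened to 0.0.0.0/0, and 2376 is left out entirely.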

Hosts communicate with each other using the public IP they are registered with (which shows up in the UI on each host). If all the hosts in a single environment can reach each other with their private IPs, then you can register them with those instead (https://docs.rancher.com/rancher/v1.5/en/faqs/agents/#how-does-the-host-determine-ip-address-and-how-can-i-change-it-what-do-i-do-if-the-ip-of-my-host-has-changed-due-to-reboot).
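
Concretely, the override from that FAQ is the CATTLE_AGENT_IP environment variable, passed to the agent container when the host registers. A minimal sketch (the agent version, URL, and token are placeholders; copy the real registration command from the Add Host screen for your environment):

    # Register this host under its private address instead of the
    # auto-detected public one. Note that -e must come before the image name.
    sudo docker run -d --privileged \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v /var/lib/rancher:/var/lib/rancher \
      -e CATTLE_AGENT_IP=<PRIVATE_IP> \
      rancher/agent:v1.2.0 \
      https://<MY_RANCHER_HOST>/v1/scripts/<registration-token>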

The agent opens the connection to the server for management. The server does not need to be able to directly reach the registered IP of the host.

You can (and should) set the host registration URL in settings to a hostname.

Thanks for the info!

So it seems like I need to add custom hosts instead of using the built-in provisioning tools? And since I'll be using custom hosts, port 22 won't need to be open to the master server?

I can work with that. Quick question: how stable is the command below? I've F5'ed a few times and it hasn't changed.

sudo docker run -d --privileged -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/rancher:/var/lib/rancher rancher/agent:v1.2.0 https://<MY_RANCHER_HOST>/v1/scripts/<some hex value here>:<number>:<stuff here. maybe a token>

That number looks suspiciously like a Unix timestamp, so I'm guessing I can't save this command in a provisioning script? It would need to be input manually?

EDIT:

Slipping in one more question. Can I use the normal EC2 provisioning process and then adjust the environment variables of the rancher-agent after it starts? If I could do that I could:

  1. Standard EC2 provisioning to public subnet with restrictive Security Group
  2. Run a one-time script to change CATTLE_AGENT_IP of the rancher agent to use the private IP

Maybe there’s a hook in the provisioning process I could use to override this value?

The registration URL is good “forever” (with the default config), but only for that specific environment. If you have a limited number of environments, you'd typically create an auto scaling group which runs the registration command from cloud-config and picks the right IP at the same time.
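
As a rough shape of that pattern (assuming Docker is already baked into the AMI, and with the registration URL as a placeholder), the launch configuration's user data can pull the private IP from instance metadata and hand it to the agent:

    #!/bin/bash
    # Sketch of ASG user data: register this instance into the environment
    # using its private IP, so the restrictive security group keeps working.
    REGISTRATION_URL="https://<MY_RANCHER_HOST>/v1/scripts/<registration-token>"

    # The instance's private IP, from EC2 instance metadata.
    PRIVATE_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)

    docker run -d --privileged \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v /var/lib/rancher:/var/lib/rancher \
      -e CATTLE_AGENT_IP="$PRIVATE_IP" \
      rancher/agent:v1.2.0 \
      "$REGISTRATION_URL"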

Ah, Thank You. That will work perfectly.

  1. Terraform can provision an ASG that leverages userdata to auto register EC2 instances into the environment.
  2. Use CATTLE_AGENT_IP to register the private IPs allowing the use of restrictive security groups
  3. EC2 instances can go into private subnets with security groups only allowing other members of the VPC to access 500/4500
  4. ELBs in public subnets provisioned through terraform and configured by http://rancher.com/inside-the-external-elb-catalog-template/ to give access to my applications.

Thanks man! You really put the last pieces into place.

Hey Vincent,

Having issues getting the CATTLE_AGENT_IP to work correctly. If you have another moment to help I would appreciate it.

I can use the rancher master to see how the container was started (Click container -> Choose Command Tab at the bottom):

https://<RANCHER_MASTER>/v1/scripts/<STUFF>,-e,CATTLE_AGENT_IP=10.150.128.811

However, the host is added to the rancher master with a public IP. If I check the logs for the container I see this line:

`3/29/2017 9:19:29 PM INFO: ENV: DETECTED_CATTLE_AGENT_IP=34.xx.xx.xx`

I feel like I’m missing something silly. Anything obvious I can try?

EDIT:

Figured it out. I had the params in the wrong order.
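
For anyone who hits the same thing: in docker run, anything after the image name is passed to the container as arguments rather than parsed as docker options, so a trailing -e never makes it into the container's environment and the agent falls back to auto-detecting the public IP. Roughly (placeholders as before):

    # Wrong: -e comes after the image, so it's just an argument to the agent.
    sudo docker run -d --privileged \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v /var/lib/rancher:/var/lib/rancher \
      rancher/agent:v1.2.0 \
      https://<MY_RANCHER_HOST>/v1/scripts/<registration-token> \
      -e CATTLE_AGENT_IP=<PRIVATE_IP>

    # Right: -e comes before the image, so the agent container actually sees it.
    sudo docker run -d --privileged \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v /var/lib/rancher:/var/lib/rancher \
      -e CATTLE_AGENT_IP=<PRIVATE_IP> \
      rancher/agent:v1.2.0 \
      https://<MY_RANCHER_HOST>/v1/scripts/<registration-token>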