RancherOS in AWS Custom VPC

Hi,

Complete newbie here so apologies if I’ve missed something obvious.

When I launch the RancherOS AMI in an AWS default VPC, everything works well. However, when I launch the same AMI into a public subnet within a custom VPC, the Rancher instance is unable to pull down the docker image. In fact, it’s unable to even run a wget on www.google.com

I’ve checked the following:
Non RancherOS instances in the same subnet can access internet ok.
No NACLs are blocking traffic.
IGW is attached.
ping to 8.8.8.8 works ok
nslookup for www.google.com fails

I’ve gone as far as rebuilding the VPC but same results persist.

Any ideas?
Thanks!
Steve

if you can ping external addresses, but not access via hostnames, then it sounds like you may (just) have a dns config issue.

see what your dns cfg is (cat /etc/resolv.conf)

perhaps https://docs.rancher.com/os/configuration/#setting-values will help you :slight_smile:

Hi Sven, thanks for the response.

resolv.conf is controlled by AWS, and looks the same as other nodes which are working. I can’t wget a site using an IP address either so it’s not just name resolution. I did also set the name servers using ros config but issue persists.

The fact I can ping external but nothing else is strange. Do you know if/how I can install netcat or tcpdump? This may help me troubleshoot.

Any other ideas highly appreciated!

Thank you
Steve

There’s no tcpdump or netcat, but telnet is available.

It very much looks like a firewall/ACL issue to me, or perhaps routing, although the fact you can ping 8.8.8.8 suggests not.

Have you compared the default gateway/route between a working and non-working host?
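
To separate a name-resolution failure from a plain TCP failure without netcat or tcpdump, something like the following rough sketch could help. It assumes a Python 3 interpreter is available on a host in the same subnet, which may not be true of the RancherOS console itself. The function names `check_dns`, `check_tcp` and `diagnose` are made up for this illustration.

```python
import socket


def check_dns(hostname):
    """Return the resolved IPv4 addresses, or an empty list if resolution fails."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def check_tcp(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def diagnose(hostname, ip, port=80):
    """Run both checks and report them separately (hypothetical helper)."""
    return {
        "dns_ok": bool(check_dns(hostname)),
        "tcp_by_ip_ok": check_tcp(ip, port),
    }
```

If `dns_ok` is false but `tcp_by_ip_ok` is true, the resolver settings are the likely culprit. If both are false while ping still works, that points more towards filtering or routing of TCP/UDP traffic than towards DNS alone.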

you can try setting cloud-config in user_data to set the dns servers

e.g.

#cloud-config
ssh_authorized_keys:
  - $key

rancher:
  network:
    dns:
      nameservers:
      - <server1>
      - <server2>

Note that you might also have to set ssh_authorized_keys in the user_data cloud-config. I haven’t tested this on AWS, but on OpenStack that’s the case: setting cloud-config in user_data causes the selected public key not to be injected by nova (cloud-init). In OpenStack you also have to set the config_drive option to true for cloud-init or user_data to work. I haven’t tried the AWS case.

As for wget not working, that would depend on how your VPC and network are configured.

  1. The network ACL needs to allow the high ports (1024 and above) as well as ports 80 and 443
  2. The security group needs to allow outgoing traffic
  3. If you are routing through NAT instances, the NAT instance needs to allow ports 1024 and above plus ports 80 and 443 for both incoming and outgoing traffic
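
The reason the high ports matter in point 1 is that network ACLs are stateless: the reply to an outgoing HTTP request comes back to an ephemeral port on the instance and needs its own inbound allow rule. Here is a minimal sketch of first-match rule evaluation, lowest rule number first, with an implicit deny at the end. The `Rule` class and the rule sets are hypothetical, and the model ignores protocol and CIDR matching for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    number: int      # rules are evaluated in ascending rule-number order
    port_from: int
    port_to: int
    allow: bool


def nacl_allows(rules, port):
    """Return the decision of the first rule whose port range contains `port`."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False  # nothing matched: implicit deny


# Hypothetical inbound rule sets for the subnet.
inbound_web_only = [Rule(100, 80, 80, True), Rule(110, 443, 443, True)]
inbound_with_high_ports = inbound_web_only + [Rule(120, 1024, 65535, True)]

# Example ephemeral port the instance might use as the source of a wget request.
reply_port = 49152

print("web-only rules allow the reply:", nacl_allows(inbound_web_only, reply_port))
print("with high ports allowed:", nacl_allows(inbound_with_high_ports, reply_port))
```

With only ports 80 and 443 allowed inbound, the reply to the instance's own request is dropped, which is why the high port range has to be allowed as well. Security groups are stateful, so they do not need the equivalent rule.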

Hi, Thanks for your input!

Due to time constraints, I decided against using RancherOS and just installed Rancher on an Ubuntu instance, which is working well.

Thanks
Steve