Ping Local Host on Rancher Server Fails

Hi,

I am experiencing some weird behaviour. When I SSH into the rancher-server machine (hosted on AWS) I cannot ping anything, not even localhost; the command just hangs. I tried restarting the rancher-server machine, but the behaviour is exactly the same.

Everything seems to be configured correctly:

/etc/hosts

127.0.0.1    localhost
::1    localhost ip6-localhost ip6-loopback
fe00::0    ip6-localnet
ff00::0    ip6-mcastprefix
ff02::1    ip6-allnodes
ff02::2    ip6-allrouters

ifconfig

docker-sys Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          inet addr:172.18.42.1  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::ec2a:e2ff:fecc:f39/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:4 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:304 (304.0 B)  TX bytes:578 (578.0 B)

docker0   Link encap:Ethernet  HWaddr 02:42:27:37:50:E5
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:27ff:fe37:50e5/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1228453 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1437855 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:690426426 (658.4 MiB)  TX bytes:1008415581 (961.6 MiB)

eth0      Link encap:Ethernet  HWaddr 02:02:2C:E4:6D:39
          inet addr:172.31.32.134  Bcast:172.31.32.255  Mask:255.255.255.0
          inet6 addr: fe80::2:2cff:fee4:6d39/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2207837 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1394985 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1656352673 (1.5 GiB)  TX bytes:723929168 (690.3 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:4 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:336 (336.0 B)  TX bytes:336 (336.0 B)

vetha4e49ff Link encap:Ethernet  HWaddr C2:C9:5A:82:F9:B9
          inet6 addr: fe80::c0c9:5aff:fe82:f9b9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:710215 errors:0 dropped:0 overruns:0 frame:0
          TX packets:824405 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:439716817 (419.3 MiB)  TX bytes:535368015 (510.5 MiB)
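
The interfaces themselves look fine to me, so one thing that could explain a hang like this is a firewall rule filtering traffic on the loopback interface. These are the checks I am planning to run next (just a sketch; the 8080 port is an assumption on my part about where rancher/server is published):

sudo iptables -L INPUT -v -n
sudo iptables -L OUTPUT -v -n
ip route show table local
ping -c 1 127.0.0.1
curl -v http://127.0.0.1:8080/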

Is this normal? Has anyone experienced this?

Thanks

Mark

Just want to double check: since you are hosting on AWS, are your security groups set up correctly?
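
For example, you can review them from anywhere with the AWS CLI configured (the group ID here is just a placeholder):

aws ec2 describe-security-groups --group-ids sg-xxxxxxxx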

Pretty sure they are. In fact, it was working, and everything got skewed after the upgrade. We have since installed rancher-server on Ubuntu and it has been a much more stable experience. Have you had a similar experience? Would you recommend RancherOS for a production server at this point in time?

Oh, so something failed after upgrade. Were you upgrading Rancher or RancherOS?

We test primarily on Ubuntu, as Rancher was created before RancherOS, so at this time Rancher on Ubuntu will definitely be more stable than on RancherOS.

RancherOS and Rancher are still in beta, so I’ll let you make the decision on what you would use for production. :slight_smile:

No, everything went haywire after a service upgrade. We have a Tomcat service running, load balanced through the HAProxy you provide. Upon upgrading, we could not ping anything internally or across the network; we could not even ping localhost from within the same container. On Ubuntu we still experience problems when we use HAProxy: sometimes the service is up and running, but the LB cannot ‘see’ it. If we leave both running, the system auto-recovers after some time. Is this something you have experienced? Could it be that the LB is using the Network Agents to check whether there are any services of ‘this’ kind running, and that we are hitting the 4-second issue you discuss in your video? (A sketch of the check I have in mind is below.)
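
For completeness, this is the kind of check I mean: exec into the Network Agent and hit the Tomcat backend directly over the managed network (the container name, IP and port are placeholders, and this assumes curl or wget is available in the agent container):

docker ps | grep -i network-agent
docker exec -it <network-agent-container> curl -sv http://<tomcat-managed-ip>:8080/

If that returns a response while the LB still marks the service as down, it would point at the health check / config reload rather than the overlay network itself.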

The load balancer issue that you are hitting (especially if it's after you upgrade a service) sounds like these issues:

and ultimately, it sounds like it’s this:

We are hoping to have this fixed in an upcoming release.