Hi,
I have an issue with RancherOS 0.4.4 running in an OpenStack environment.
I’m running a Docker Registry v2 (Nexus 3) in another Rancher environment (the details aren’t important).
A RancherOS instance deployed on the same compute node works fine with MTU 1500.
On another compute node I’ve deployed both an Ubuntu and a RancherOS instance. Ubuntu shows MTU 1400 and RancherOS shows 1500 on eth0. To get RancherOS working with my registry I have to set MTU 1400 on eth0.
I’m not sure if this is an OpenStack or a RancherOS issue.
The Ubuntu image has probably been pre-configured at that value.
The MTU value you need depends very much on how the OpenStack networking has been configured (VLAN, GRE, VXLAN, etc.), so I can’t imagine it’s automatic in any distribution.
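To make the dependence on network type concrete, here is a rough sketch of the arithmetic. The overhead figures are the commonly cited ones for IPv4 tunnels and are assumptions here, not values read from this deployment; overhead_for and instance_mtu are hypothetical helper names.

```shell
# Commonly cited IPv4 encapsulation overheads in bytes. These are assumptions
# for illustration; check what your Neutron deployment actually uses.
overhead_for() {
  case "$1" in
    vlan)  echo 0 ;;   # the 802.1Q tag normally sits outside the IP MTU
    gre)   echo 42 ;;  # outer IP 20 + GRE with key 8 + inner Ethernet 14
    vxlan) echo 50 ;;  # outer IP 20 + UDP 8 + VXLAN 8 + inner Ethernet 14
    *)     echo "unknown network type: $1" >&2; return 1 ;;
  esac
}

# instance_mtu PHYSICAL_MTU TYPE: largest MTU an instance can use without
# the encapsulated packet exceeding the physical MTU.
instance_mtu() {
  overhead=$(overhead_for "$2") || return 1
  echo $(( $1 - overhead ))
}
```

With a 1500-byte physical network, instance_mtu 1500 gre gives 1458 and instance_mtu 1500 vxlan gives 1450. The 1454 configured in this thread sits just under the GRE figure, and 1400 leaves room for either tunnel type.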
You can of course test by pinging between Rancher nodes on different compute nodes with 1400-byte packets (ping -s 1400 x.x.x.x) and increasing the packet size until you hit a failure.
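Rather than stepping the size up by hand, the same test can be sketched as a binary search. This assumes the iputils ping, where -M do sets the Don't Fragment bit; BusyBox ping does not have -M, so the probe function would need adapting there. probe and max_payload are hypothetical names.

```shell
# probe SIZE HOST: succeed if one ICMP echo with SIZE payload bytes gets
# through without fragmentation. Assumes iputils ping (-M do, -W seconds).
probe() {
  ping -c 1 -W 2 -M do -s "$1" "$2" >/dev/null 2>&1
}

# max_payload HOST LOW HIGH: print the largest payload size between LOW and
# HIGH for which probe succeeds. Assumes LOW itself gets through.
max_payload() {
  host=$1
  lo=$2
  hi=$3
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi + 1) / 2 ))
    if probe "$mid" "$host"; then
      lo=$mid          # mid fits, search above it
    else
      hi=$(( mid - 1 )) # mid is too big, search below it
    fi
  done
  echo "$lo"
}
```

Usage would be something like max_payload x.x.x.x 1300 1500. Note that -s sets the ICMP payload size, so add 28 bytes (20 for the IPv4 header, 8 for ICMP) to the result to get the path MTU.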
Either way, it’s a single, simple cloud-config parameter to change next time you build a host.
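For reference, the parameter in question looks roughly like the YAML printed below. The key layout (rancher, network, interfaces, then the interface name with an mtu key) is written from memory of the RancherOS networking docs, so treat it as an assumption and check it against your version. mtu_cloud_config is a hypothetical helper that only prints the fragment.

```shell
# mtu_cloud_config IFACE MTU: print a cloud-config fragment that pins the MTU
# on one interface. The key names are assumed from the RancherOS docs.
mtu_cloud_config() {
  cat <<EOF
#cloud-config
rancher:
  network:
    interfaces:
      $1:
        dhcp: true
        mtu: $2
EOF
}
```

For example, mtu_cloud_config eth0 1400 > cloud-config.yml. Be aware that a later post in this thread reports the static MTU not taking effect when dhcp: true is also set.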
Yes, I’ve been testing with MTUs and know that tunnels have an MTU impact, but since I never had this issue before (RancherOS < 0.4.3), I was afraid the MTU setting from DHCP is being dropped by RancherOS.
I will have a closer look at my OpenStack config to see if something has changed (especially Neutron/DHCP/MTU).
This command displays the lease information and might help you: dhcpcd -U ethX
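If the lease dump is long, a small filter can pull out just the MTU. This is a sketch that assumes the dump is a list of name=value lines and that the MTU appears as interface_mtu (the option name used in dhcpcd.conf), with or without single quotes around the value. lease_mtu is a hypothetical name.

```shell
# lease_mtu: read a dhcpcd lease dump on stdin and print the value of
# interface_mtu, if the lease carried one. Prints nothing otherwise.
lease_mtu() {
  sed -n 's/^interface_mtu=//p' | tr -d "'"
}
```

Usage: dhcpcd -U eth0 | lease_mtu. Empty output would suggest the DHCP server never offered an MTU, which points at Neutron rather than RancherOS.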
I’ll see if anything is different in v0.4.3 shortly…
So, the command and parameters used are the same between v0.4.3 and v0.4.4. The dhcpcd software goes from v6.9.3 to v6.10.1. I don’t think that’s what’s at fault.
OK, so Ubuntu has no fixed MTU but honors the network MTU. I’ve been playing around with dhcp-option-force in dnsmasq-neutron.conf, and when I change this value, the MTU on the Ubuntu instance changes too.
But RancherOS still uses 1500.
dhcp_dump is not available on RancherOS 0.4.4 so I cannot check.
I tried booting a 0.4.2 instance and it correctly puts 1454 (as configured in Neutron) on the eth0 interface.
The only difference between the 0.4.3 and 0.4.4 dhcpcd.conf is:
0.4.4:
option interface_mtu
Hmmm. Can you tell me if there is a 10-mtu file available in this directory on v0.4.2: /libexec/dhcpcd-hooks/10-mtu? I checked, there is.
There isn’t one on v0.4.3 or v0.4.4 (in the new location: /lib/dhcpcd/dhcpcd-hooks). This seems to relate to some changes made around MTU handling per route/destination. I suspect that if you drop this file into /lib/dhcpcd/dhcpcd-hooks and comment out option interface_mtu on 0.4.3 or 0.4.4, things may work.
If the issue does end up being due to RancherOS, the fix should be pretty easy. It sounds like DHCP is overwriting the MTU value set in the cloud config. The argument --nohook mtu should be passed to dhcpcd if the user sets an MTU value.
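If you don't have the v0.4.2 file to copy, a minimal stand-in might look like the sketch below. This is not the upstream 10-mtu script, just an illustration of the idea. It relies on dhcpcd-run-hooks sourcing hook files with interface, reason and new_interface_mtu set; MTU_CMD is a hypothetical override added here only so the function can be dry-run.

```shell
# Hypothetical stand-in for a 10-mtu hook, not the upstream file.
# dhcpcd-run-hooks sources this with $interface, $reason and the new_*
# lease variables already set.
apply_lease_mtu() {
  # Nothing to do unless the lease carried an MTU.
  [ -n "${new_interface_mtu:-}" ] || return 0
  case "${reason:-}" in
    BOUND|RENEW|REBIND|REBOOT|INFORM)
      # MTU_CMD is an assumption for dry runs; it defaults to ip.
      ${MTU_CMD:-ip} link set dev "$interface" mtu "$new_interface_mtu"
      ;;
  esac
}

apply_lease_mtu
```

Running it by hand with MTU_CMD=echo and the three variables set shows the command it would execute without touching the interface.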
I think RancherOS should handle setting the argument in the background. That seems like a better experience than a user having to manually edit dhcpcd.conf. We do something similar for DNS and hostname settings.
Based on the commands you tried, it does seem to me like the problem is in RancherOS.
However, if I’m not mistaken, Docker 1.10 and 1.11 do not handle MTU on veth interfaces the way 1.9.1 (and maybe earlier versions) did. This happens on all images I’ve tested. I need to pass --mtu 1454 to the Docker daemon or else traffic gets stuck (in particular SSL termination on the Rancher LB, it seems).
I’m not sure if Docker plans to do something about it, but for now my only solution is to set the MTU in the cloud config.
I did some testing on this today and it seems that setting the MTU for an interface in the cloud config only fails if dhcp: true is also set. Once I find a way for dhcpcd to ignore MTU, this should be an easy fix.
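One candidate is dhcpcd's -O / --nooption flag, which according to the man page removes the named option from the DHCP message before it is processed. The sketch below only shows the decision logic; whether this is enough on RancherOS is untested, and the real invocation carries more arguments than shown. dhcpcd_cmdline is a hypothetical name.

```shell
# dhcpcd_cmdline IFACE [STATIC_MTU]: print the dhcpcd invocation to use.
# When a static MTU was configured, ask dhcpcd to drop the MTU option from
# the lease so it cannot override that value.
dhcpcd_cmdline() {
  iface=$1
  static_mtu=${2:-}
  if [ -n "$static_mtu" ]; then
    echo "dhcpcd --nooption interface_mtu $iface"
  else
    echo "dhcpcd $iface"
  fi
}
```

The equivalent in dhcpcd.conf should be a nooption interface_mtu line, but handling it in the generated command line keeps users from having to edit that file.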