After upgrading from SLES11SP4, routing is broken.

Hi.

Here’s the original setup: an OES2015SP1 cluster (that it’s OES is not important, but it explains why it’s set up this way). All nodes have two interfaces, eth0 and eth1, where eth1 is set to start manually. In the cluster, there is one resource which starts eth1 via ifup. The two interfaces are in different IP networks. IPv4 routing is enabled.

IP setup:
eth0 10.1.11.1/16
eth1 100.2.10.1/24

/etc/sysconfig/network/routes:

100.2.10.0 10.1.255.248 255.255.255.0 eth0
0.0.0.0 10.1.255.248 0.0.0.0 eth0
default 100.2.10.248 - eth1

So, what this does is:

While eth1 is down, 100.2.10.0 is reached from eth0 via the gateway at 10.1.255.248, and of course 10.1.0.0 is local. Works fine. Also, the internet is reachable via the gateway (not really important).

Now if eth1 is brought up, the server becomes reachable at 100.2.10.1 too, both from the internet (via the gateway at 100.2.10.248) and from the local 10.1 network.

A minor weirdness of course is that traffic sent from 10.1.X.X to 100.2.10.1 will go to the router at 10.1.255.248, but the reply will come from the server’s eth0 interface (but with the proper source IP), because the server itself has routing enabled. But nobody cares about this, it works.

Now, after the upgrade to SLES12SP3 (OES2018SP1, but again that plays no role), eth1 in fact becomes mostly useless. After it comes up, it is reachable neither from the internet nor from the 10.1 network. The only traffic it answers is local traffic from the same network, 100.2.10.X. All other traffic (confirmed by tcpdump) reaches the server on the proper interface but receives no answer whatsoever, from either interface. The server simply doesn’t reply at all to any packet that reaches 100.2.10.1 on eth1 and doesn’t come from that same IP subnet, which is distinctly different from how SLES11 behaves.

The output of route or ip route looks identical between SLES11 and SLES12. Looking at /etc/sysconfig/network, the update has moved the routing entries from “routes” to ifroute-eth0 and ifroute-eth1 instead, while “routes” is now empty. Removing those and putting “routes” back in place makes no difference.

Firewall, of course, is totally disabled.

I understand SLES12SP3 now uses wicked vs. networkmanager, and must assume that’s the culprit somehow.

Any ideas how to attack this? Why does eth1/100.2.10.1 not respond at all to any packet coming from a (to it) remote address, despite it even having the default gateway attached to it?

Oh, one more interesting piece of information: when I bring down eth0, eth1 immediately starts to behave as it should, and becomes reachable from everywhere.

CU,
Massimo

Ok, I dumbed this down for a test, and it still fails. I have just a default gateway set on the eth1 network, no other routes. I still absolutely can’t reach the IP of eth1 from a client in the network connected to eth0, unless I bring eth0 down; then it starts working immediately. Obviously, my SLES12SP3 (OES2018SP1) box totally refuses to reply to packets coming in to eth1 from an IP in the network eth0 is connected to, neither via local routing (which is what SLES11 does) nor via the gateway connecting the two networks.

My assumption now is some sysctl / security setting, but so far I can’t find anything.

Ideas?

CU,
Massimo

Found it.

sysctl -w net.ipv4.conf.all.rp_filter=0
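
For anyone hitting the same thing, a rough sketch of how to check what is actually in effect. As far as the kernel's ip-sysctl documentation goes, the value used for source validation on an interface is the higher of conf.all.rp_filter and conf.<interface>.rp_filter (0 = off, 1 = strict, 2 = loose), so setting "all" to 0 only helps if the per-interface value is 0 as well. Strict mode would also fit the symptoms above: the reply route for a 10.1.X.X sender points out of eth0, so a packet from that sender arriving on eth1 fails the check, and once eth0 is down the only route back is via eth1. The helper name effective_rp_filter is made up for this example, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: prints the rp_filter value the kernel would apply,
# assuming the documented "max of conf.all and conf.<iface>" rule.
# $1 = conf.all.rp_filter, $2 = conf.<iface>.rp_filter
effective_rp_filter() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

conf=/proc/sys/net/ipv4/conf
if [ -r "$conf/all/rp_filter" ]; then
    all=$(cat "$conf/all/rp_filter")
    for dir in "$conf"/*/; do
        iface=$(basename "$dir")
        # "all" is the global knob itself, not an interface
        [ "$iface" = all ] && continue
        [ -r "${dir}rp_filter" ] || continue
        own=$(cat "${dir}rp_filter")
        echo "$iface: own=$own all=$all effective=$(effective_rp_filter "$all" "$own")"
    done
fi
```

Note that sysctl -w only changes the running kernel; to survive a reboot the setting would also have to go into the sysctl configuration (for example /etc/sysctl.conf).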

Thanks for listening. :wink:

CU,
Massimo

Oh, and I just need to say this:

It is not SUSE’s business to set such a parameter by default (you’re not a firewall admin, and it’s not up to you to decide if and what I want to route), and especially not to change it in a version upgrade.

Just sayin’