Hi.
Here’s the original setup: an OES2015SP1 cluster (that it’s OES is not important, but it explains why it’s set up this way). All nodes have two interfaces, eth0 and eth1, where eth1 is set to start manually. In the cluster, there is one resource which starts eth1 via ifup. The two interfaces are in different IP networks. IPv4 routing is enabled.
IP setup:
eth0 10.1.11.1/16
eth1 100.2.10.1/24
/etc/sysconfig/network/routes:
100.2.10.0 10.1.255.248 255.255.255.0 eth0
0.0.0.0 10.1.255.248 0.0.0.0 eth0
default 100.2.10.248 - eth1
So, what this does is:
While eth1 is down, 100.2.10.0/24 is reached from eth0 via the gateway at 10.1.255.248, and of course 10.1.0.0/16 is local. This works fine. The internet is also reachable via that gateway (not really important).
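To make the intended behaviour with eth1 down explicit, here is a minimal sketch of the lookup I expect, assuming plain longest-prefix matching over the entries from the routes file plus the connected route for eth0. The test addresses 10.1.20.30, 100.2.10.50 and 8.8.8.8 are only examples, and this models the table, not the kernel itself.

```python
import ipaddress

# Routing table as it looks while eth1 is down: (destination, gateway, interface).
# The 10.1.0.0/16 entry is the connected route that exists for eth0 itself.
ROUTES = [
    ("10.1.0.0/16", None, "eth0"),
    ("100.2.10.0/24", "10.1.255.248", "eth0"),
    ("0.0.0.0/0", "10.1.255.248", "eth0"),
]

def lookup(dest):
    """Return (gateway, interface) of the most specific route matching dest."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(net), gw, dev)
        for net, gw, dev in ROUTES
        if addr in ipaddress.ip_network(net)
    ]
    # The default route always matches, so matches is never empty here.
    net, gw, dev = max(matches, key=lambda m: m[0].prefixlen)
    return gw, dev

# Example (hypothetical) destinations: local, the 100.2.10.0/24 net, the internet.
for dest in ("10.1.20.30", "100.2.10.50", "8.8.8.8"):
    print(dest, lookup(dest))
```

Everything leaves via eth0 in this state, and only the 10.1 network is reached without a gateway.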
Now if eth1 is brought up, the server becomes reachable at 100.2.10.1 too, both from the internet (via the gateway at 100.2.10.248) and from the local 10.1 network.
A minor weirdness, of course, is that traffic sent from 10.1.X.X to 100.2.10.1 goes to the router at 10.1.255.248, but the reply comes from the server’s eth0 interface (with the proper source IP, though), because the server itself has routing enabled. But nobody cares about this; it works.
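The asymmetric path described above can be sketched roughly like this, assuming the reply is routed purely by its destination and that only the two connected networks matter for a 10.1.X.X client. The client address 10.1.20.30 is a made-up example.

```python
import ipaddress

# Connected networks once eth1 is up. The static and default routes are left
# out because they do not change the result for a client inside 10.1.0.0/16.
CONNECTED = {
    "eth0": ipaddress.ip_network("10.1.0.0/16"),
    "eth1": ipaddress.ip_network("100.2.10.0/24"),
}

def reply_interface(client):
    """Interface whose connected network contains the client, or None."""
    addr = ipaddress.ip_address(client)
    for dev, net in CONNECTED.items():
        if addr in net:
            return dev
    return None

# The request comes in via the router at 10.1.255.248 and arrives on eth1.
request = {"src": "10.1.20.30", "dst": "100.2.10.1", "arrived_on": "eth1"}

# The reply swaps source and destination and is routed by destination only.
reply = {
    "src": request["dst"],
    "dst": request["src"],
    "leaves_on": reply_interface(request["src"]),
}
print(reply)
```

So the request arrives on eth1 but the reply leaves on eth0 with source 100.2.10.1, which is the behaviour I see on SLES11.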
Now, after the upgrade to SLES12SP3 (OES2018SP1, but again that plays no role), eth1 in fact becomes mostly useless. After it comes up, it is reachable neither from the internet nor from the 10.1 network. The only traffic it answers is local traffic from the same network, 100.2.10.X. All other traffic (confirmed by tcpdump) receives no answer whatsoever, from either interface. It reaches the server on the proper interface, and that’s that. The server simply doesn’t reply at all to any packet that reaches 100.2.10.1 on eth1 unless it comes from that same IP subnet, which is distinctly different from how SLES11 behaves.
The output of route and ip route looks identical between SLES11 and SLES12. Looking at /etc/sysconfig/network, the update has moved the routing entries from “routes” to ifroute-eth0 and ifroute-eth1, while “routes” is now empty. Removing those and putting “routes” back in place makes no difference.
Firewall, of course, is totally disabled.
I understand SLES12SP3 now uses wicked instead of NetworkManager, and I have to assume that’s the culprit somehow.
Any ideas how to attack this? Why does eth1/100.2.10.1 not respond at all to any packet coming from a (to it) remote address, despite it even having the default gateway attached to it?
Oh, one more interesting piece of information: when I bring down eth0, eth1 immediately starts to behave as it should and becomes reachable from everywhere.
CU,
Massimo