DRBD IP address issue in SLES 11 SP2

Hello,

I have been working with SLES, DRBD, and Xen since SLES 10. My normal practice when configuring DRBD is to use a dedicated NIC on each of the two servers: each server has two network cards, one for normal network visibility/functionality and the other solely for DRBD connectivity over a direct link between the two servers.

This worked fine right up to SLES 11 SP1 + HAE. However, on SLES 11 SP2 + HAE the DRBD partitions on the two servers will not see each other (they get stuck in the WFConnection state), with each server reporting the peer as unknown in cat /proc/drbd. This happens whenever I use an IP address other than the 'live' one I used when I built the server, i.e. the one with visibility to the network in general. I have swapped the roles of the two NICs, making the dedicated one the live one and vice versa, and re-configured the IP addresses accordingly. The result was the same as before: only the original live IP address allowed DRBD to connect to the other node. It seems that the working behaviour follows the IP address rather than the network card in use.
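For reference, these are the sort of checks I mean on each node (the resource name, interface, and peer address below are just examples, not my actual values):

```
# connection state as reported by the kernel module
cat /proc/drbd

# per-resource connection state (r0 is a placeholder resource name)
drbdadm cstate r0

# basic reachability over the dedicated link
# (eth1 and 192.168.100.2 are placeholders for the replication NIC and peer address)
ping -I eth1 192.168.100.2
```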

The firewall is completely disabled, so I now find it very hard to understand where the issue is. The servers are set up to run the Xen kernel and DRBD is configured for this setup.

I never saw this issue with previous versions of SLES: whenever I needed to alter the IP address used by DRBD, I simply changed it in the drbd.conf file, making sure the new addresses were configured and that the servers could ping each other over them.
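To illustrate what I mean, the relevant part of the resource definition is the address line in each on-host section; something like this, where the hostnames, devices, port, and addresses are only examples:

```
resource r0 {
  on node1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.100.1:7789;   # example replication address on the dedicated NIC
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.100.2:7789;   # example replication address on the dedicated NIC
    meta-disk internal;
  }
}
```

After changing the address lines, running drbdadm adjust r0 on both nodes (or restarting DRBD) should pick up the new addresses.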

Has anybody got any insight into why this is happening? The servers have been updated online.

John

RESOLVED - this issue is fixed. I found that there were network card issues on one of the nodes. I have 3 NICs in each node: one for DRBD replication, one for HA and the last for normal LAN connectivity. By removing all the cards, enabling them one at a time and re-installing SLES 11 SP2, I was able to identify a faulty NIC. It did appear to operate OK when I first set up the server (pings appeared fine etc.). However, when I re-installed, the faulty card was detected by the OS but the OS stated that it could not configure it. Using another NIC did not give this issue, and after re-configuring DRBD everything connected and synced fine over the non-live, directly connected NICs.
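For anyone debugging something similar, checking the link state and error counters of the replication interface on each node can help spot an intermittently failing card (eth1 is just a placeholder interface name):

```
# driver and link status for the replication interface (eth1 is a placeholder)
ethtool eth1

# interface statistics - look for rising RX/TX errors or drops
ip -s link show eth1
```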

Unfortunately, the NIC was failing intermittently while I was performing the initial setups, which is what caused this problem. I hope this may help others who see the same issue.

John

[QUOTE=6529034;4603]RESOLVED - this issue is fixed. I found that there were network card issues on one of the nodes. I have 3 NICs in each node: one for DRBD replication, one for HA and the last for normal LAN connectivity. By removing all the cards, enabling them one at a time and re-installing SLES 11 SP2, I was able to identify a faulty NIC. It did appear to operate OK when I first set up the server (pings appeared fine etc.). However, when I re-installed, the faulty card was detected by the OS but the OS stated that it could not configure it. Using another NIC did not give this issue, and after re-configuring DRBD everything connected and synced fine over the non-live, directly connected NICs.

Unfortunately, the NIC was failing intermittently while I was performing the initial setups, which is what caused this problem. I hope this may help others who see the same issue.

John[/QUOTE]

Thanks for feeding it back John!

Interesting to see what happened and I'm sure it will help others along the way :)

Cheers,
Willem