SBD fails to fence node if 1 of 2 SBD devices unreachable

Hi all

Inherited a 2-node cluster running SLES 11 SP1 with the HA Extension. The servers are attached to two HP P2000 arrays, with one SBD device served by each array, and both devices are visible through multipath. My problem is that if I test a scenario where I power off one server AND one of the disk arrays simultaneously, SBD attempts to fence the powered-off node, but instead of acknowledging the fence as successful after the SBD msgwait timeout it gets back a return code of 1, and the cluster remains in an unclean state. There is also a fair amount of failed I/O being logged during this test.

My initial investigations suggest a multipath issue: I/O appears to be queueing on the SBD paths, even though I had already disabled the queue_if_no_path feature for the SBD LUNs.
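In case it's useful to anyone looking at this, something like the following shows whether queueing is actually in effect on a given map (mpath_sbd_a is just a placeholder for the real SBD map name):

    # Show path topology and state for all multipath maps
    multipath -ll

    # The live device-mapper table shows whether queue_if_no_path is set
    dmsetup table mpath_sbd_a

    # Queueing can also be switched off on a running map without a reload
    dmsetup message mpath_sbd_a 0 "fail_if_no_path"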

Here are the relevant timeouts (all in seconds):

SBD watchdog timeout: 120
SBD msgwait timeout: 240
Cluster stonith-timeout: 300
Multipath polling interval: 5
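For anyone wanting to cross-check the same values, the watchdog and msgwait timeouts are stored in the SBD device header and the stonith timeout is a cluster property; /dev/mapper/sbd_a below is a placeholder for one of the SBD LUNs:

    # Dump the timeouts written into the SBD device header
    sbd -d /dev/mapper/sbd_a dump

    # Show the cluster-wide stonith timeout property
    crm configure show | grep stonith-timeout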

I guess what I’m looking for initially is any tips or tricks for dealing with SBD on multipath, as whatever configuration I add to multipath.conf regarding queue_if_no_path or no_path_retry seems to be ignored in favour of the controller settings.
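As a starting point, this is the kind of per-LUN override I have in mind for multipath.conf; as I understand it, the multipaths section takes precedence over both the defaults section and the built-in device entry for the array. The WWIDs and aliases below are placeholders:

    multipaths {
        multipath {
            # Placeholder - use the real WWID from "multipath -ll"
            wwid            36000000000000000000000000000000a
            alias           sbd_a
            # Fail I/O immediately when all paths are gone instead of
            # queueing it ("fail" is the same as 0 retries)
            no_path_retry   fail
        }
        multipath {
            # Placeholder - use the real WWID from "multipath -ll"
            wwid            36000000000000000000000000000000b
            alias           sbd_b
            no_path_retry   fail
        }
    }

After a reconfigure (e.g. multipathd -k"reconfigure"), running dmsetup table against the SBD maps should confirm whether queue_if_no_path is still being set.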

Thanks all

Mike
