SBD fails to fence node if 1 of 2 SBD devices unreachable

Hi all

Inherited a 2-node cluster running SLES 11 SP1 with the HA Extension. The servers are attached to two HP P2000 arrays, with one SBD device served by each array, and both devices are visible through multipath. My problem is that if I test a scenario where I power off one server AND one of the disk arrays simultaneously, SBD attempts to fence the powered-off node, but instead of acknowledging the fence as successful after the SBD msgwait timeout it gets back a return code of 1, and the cluster remains in an unclean state. There is also a fair amount of failed I/O being logged during this test.

My initial investigations suggest a multipath issue: I/O appears to be queueing on the SBD paths, even though I had already disabled the queue_if_no_path feature for the SBD LUNs.
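In case it's useful to anyone looking at this, something like the following shows whether queueing is actually in effect on a given map (mpath_sbd_a is just a placeholder for the real SBD map name):

    # Show path topology and state for all multipath maps
    multipath -ll

    # The live device-mapper table shows whether queue_if_no_path is set
    dmsetup table mpath_sbd_a

    # Queueing can also be switched off on a running map without a reload
    dmsetup message mpath_sbd_a 0 "fail_if_no_path"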

Here are the relevant timeouts (all in seconds):

SBD watchdog timeout: 120
SBD msgwait timeout: 240
Cluster stonith-timeout: 300
Multipath polling interval: 5
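For anyone wanting to cross-check the same values, the watchdog and msgwait timeouts are stored in the SBD device header and the stonith timeout is a cluster property; /dev/mapper/sbd_a below is a placeholder for one of the SBD LUNs:

    # Dump the timeouts written into the SBD device header
    sbd -d /dev/mapper/sbd_a dump

    # Show the cluster-wide stonith timeout property
    crm configure show | grep stonith-timeout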

I guess what I’m looking for initially is any tips or tricks for dealing with SBD on multipath, as whatever configuration I add to multipath.conf regarding queue_if_no_path or no_path_retry seems to be ignored in favour of the controller settings.
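As a starting point, this is the kind of per-LUN override I have in mind for multipath.conf; as I understand it, the multipaths section takes precedence over both the defaults section and the built-in device entry for the array. The WWIDs and aliases below are placeholders:

    multipaths {
        multipath {
            # Placeholder - use the real WWID from "multipath -ll"
            wwid            36000000000000000000000000000000a
            alias           sbd_a
            # Fail I/O immediately when all paths are gone instead of
            # queueing it ("fail" is the same as 0 retries)
            no_path_retry   fail
        }
        multipath {
            # Placeholder - use the real WWID from "multipath -ll"
            wwid            36000000000000000000000000000000b
            alias           sbd_b
            no_path_retry   fail
        }
    }

After a reconfigure (e.g. multipathd -k"reconfigure"), running dmsetup table against the SBD maps should confirm whether queue_if_no_path is still being set.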

Thanks all

Mike
