SUSE HAE - standby node disconnect from network

supphakorn · September 17, 2012, 5:54am

My environment has two nodes, and all resources run on node 1. When I disconnect all network connections on node 2 (the standby node), node 2 activates the resources while they are still active on node 1. When I plug node 2 back into the network, the cluster status shows the resources running multiple times and the cluster restarts all of them.

Do you have a solution for this environment?

Jens-U · September 17, 2012, 4:15pm

Hi supphakorn,

I’m not sure I fully understand what you’re trying to achieve: When you disconnect all networking at node 2, it’s not in standby, but offline.

Independent of standby/offline, it’s a matter of resource stickiness to avoid redistribution of resources after bringing up a new active cluster node.

OTOH, you say that you see resources running multiple times after reconnecting the second node… are we maybe talking about a split-brain situation here?

1. Two nodes are active, and all resources are active on node 1.
2. You disconnect node 2 from the network without putting it into standby first.
3. Node 2 no longer sees node 1 (“split brain”) and so decides it must activate all resources on node 2. But on node 1, all these resources are still active at the same time.
4. Once you reconnect node 2 to the network, you can see that the resources are active on node 2, too…
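
The sequence above can be sketched as a toy model. This is only an illustration under stated assumptions, not Pacemaker's actual decision logic: the function names, the resource names, and the simplification that an unplugged node cannot reach any fencing device are all hypothetical.

```python
def resources_to_start(running_here, all_resources, sees_peer,
                       ignore_quorum, stonith_enabled, fence_succeeded=False):
    """Toy decision for one node of a two-node cluster (not Pacemaker's real logic)."""
    if sees_peer:
        return set()  # membership is intact, nothing needs recovering
    if not ignore_quorum:
        return set()  # one vote out of two is no majority, so nothing is started
    if stonith_enabled and not fence_succeeded:
        return set()  # recovery waits until the lost peer is confirmed fenced
    return set(all_resources) - set(running_here)


def simulate_unplug(stonith_enabled):
    """Unplug node 2 and return the resources that end up active on both nodes."""
    resources = {"ip", "filesystem", "database"}  # hypothetical resource names
    node1_running = set(resources)  # everything is active on node 1
    node2_running = set()
    # Assumption: node 2 has no network, so it can neither see nor fence node 1.
    node2_running |= resources_to_start(
        node2_running, resources, sees_peer=False,
        ignore_quorum=True, stonith_enabled=stonith_enabled,
        fence_succeeded=False)
    # Node 1 also loses sight of node 2, but it already runs everything.
    node1_running |= resources_to_start(
        node1_running, resources, sees_peer=False,
        ignore_quorum=True, stonith_enabled=stonith_enabled,
        fence_succeeded=False)
    return node1_running & node2_running


print(sorted(simulate_unplug(stonith_enabled=False)))  # all three run twice
print(sorted(simulate_unplug(stonith_enabled=True)))   # nothing runs twice
```

In this model, the duplicated resources only appear when quorum is ignored and nothing forces a node to confirm that its peer is really gone before taking over.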

This is a typical problem with two-node clusters and needs to be circumvented. There are many ways to skin that cat; the proper search terms are “two-node clusters” and “split brain”…

Regards,
Jens

LarsMB · September 25, 2012, 11:53pm

Hi there,

I think you have set no-quorum-policy=ignore (so that a two-node cluster basically pretends to have quorum even with only one node), but you have also explicitly disabled I/O fencing (STONITH). You need to re-enable STONITH and configure it properly for your environment to avoid the split brain and the resulting concurrency violation.
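
As a rough illustration of that combination, here is a minimal sketch that checks a set of cluster properties for it. The function name and the dict input are hypothetical; only the property names `stonith-enabled` and `no-quorum-policy` and their defaults (`true` and `stop`) come from Pacemaker. How you export the properties from your cluster is left open here.

```python
def split_brain_risks(properties):
    """Return warnings for a dict mapping cluster property names to string values."""
    warnings = []
    # Pacemaker defaults: STONITH enabled, resources stopped when quorum is lost.
    stonith = properties.get("stonith-enabled", "true").strip().lower()
    quorum_policy = properties.get("no-quorum-policy", "stop").strip().lower()
    stonith_off = stonith in ("false", "no", "off", "0", "n")
    if stonith_off:
        warnings.append("stonith-enabled is off: a lost peer is never fenced")
        if quorum_policy == "ignore":
            warnings.append(
                "no-quorum-policy=ignore without STONITH: an isolated node "
                "may start resources that are still active elsewhere")
    return warnings


# Hypothetical property set matching the situation described in this thread.
for warning in split_brain_risks({"stonith-enabled": "false",
                                  "no-quorum-policy": "ignore"}):
    print(warning)
```

Ignoring quorum on its own is a common choice for two-node clusters; the sketch only flags it when fencing is switched off as well.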

Best,
Lars
