CLVMD fails to start when bringing up a single HA node

Dear Expert,

I am trying to build a two-node cluster; the HA version is SLES 12 SP2 HAE. I am able to build the cluster successfully, but during HA testing, when both nodes (node1 & node2) are down and I try to bring up node1 alone, “clvmd” always fails to start. Kindly help.

Error:

2017-07-12T06:16:30.076663-05:00 nfsnode1 dlm_controld[2343]: 107 fence work wait for quorum
2017-07-12T06:16:34.083064-05:00 nfsnode1 dlm_controld[2343]: 111 clvmd wait for quorum

Cluster configuration:

root@nfsnode1:/root>crm configure show
node 1084808202: nfsnode1
node 1084808203: nfsnode2
primitive admin_addr IPaddr2 \
params ip=192.168.220.13 \
op monitor interval=10 timeout=20
primitive clvm ocf:lvm2:clvmd \
params daemon_timeout=30 \
op start timeout=90s interval=0 \
op stop timeout=100s interval=0
primitive dlm ocf:pacemaker:controld \
op start timeout=90s interval=0 \
op stop timeout=100s interval=0
primitive fs_sap Filesystem \
params device="/dev/vgappdata/lvusrsap" directory="/usr/sap/XE1/" fstype=ext4 \
meta target-role=Started \
op start interval=0 timeout=60s \
op stop interval=0 timeout=60s \
op monitor interval=20s timeout=40s
primitive stonith-sbd stonith:external/sbd \
params pcmk_delay_max=30s
primitive vg1 LVM \
params volgrpname=vgappdata \
op start timeout=60s interval=0 \
op stop timeout=60s interval=0 \
op monitor interval=30s timeout=60s
primitive vip_sap IPaddr2 \
params ip=192.168.220.16 cidr_netmask=24 \
op start interval=0 timeout=20 \
op stop interval=0 timeout=20 \
op monitor interval=10 timeout=20
group g-clvm dlm clvm vg1
group g_sap fs_sap vip_sap \
meta target-role=Started
clone c-clvm g-clvm \
meta interleave=true ordered=true
order g-constraint Optional: c-clvm:start g_sap:start
property cib-bootstrap-options: \
have-watchdog=true \
dc-version=1.1.15-21.1-e174ec8 \
cluster-infrastructure=corosync \
cluster-name=hacluster \
stonith-enabled=true \
placement-strategy=balanced \
no-quorum-policy=ignore \
stonith-action=reboot \
stonith-timeout=150s
rsc_defaults rsc-options: \
resource-stickiness=1000 \
migration-threshold=5000
op_defaults op-options: \
timeout=600 \
record-pending=true
root@nfsnode1:/root>

Best regards,
Arunabha

Hi Arunabha,

Could you please post the output of “dlm_tool dump_config”? Just to verify its quorum settings.

Regards,
J

Here it is …

root@nfsnode1:/root>dlm_tool dump_config
daemon_debug=0
foreground=0
log_debug=0
timewarn=0
protocol=detect
debug_logfile=0
enable_fscontrol=0
enable_plock=1
plock_debug=0
plock_rate_limit=0
plock_ownership=0
drop_resources_time=10000
drop_resources_count=10
drop_resources_age=10000
post_join_delay=30
enable_fencing=1
enable_concurrent_fencing=0
enable_startup_fencing=0
enable_quorum_fencing=1
enable_quorum_lockspace=1
help=-1
version=-1

root@nfsnode1:/root>

I don’t have a 12 SP2 HAE installation at hand - what catches the eye is

[QUOTE]enable_quorum_fencing=1
enable_quorum_lockspace=1[/QUOTE]

which I would have expected to be “0” in your two-node setup.
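
If you want to experiment before next week, here is a minimal sketch of what I would try - assuming 12 SP2’s dlm_controld reads /etc/dlm/dlm.conf (the option names are taken straight from your dump_config output, so please verify against dlm.conf(5) and dlm_controld(8) before relying on this):

# /etc/dlm/dlm.conf - untested sketch, same file on both nodes
# don't require quorum before fencing or before joining a lockspace
enable_quorum_fencing=0
enable_quorum_lockspace=0

dlm_controld (i.e. the dlm/clvm clone) would have to be restarted to pick that up. There may also be a matching command line switch you could pass via the “args” parameter of your dlm primitive, but I can’t check the exact spelling from here.

While you’re at it, it might also be worth a look at the quorum section of /etc/corosync/corosync.conf. If it contains “two_node: 1”, then wait_for_all is switched on automatically, and a single node coming back after a complete cluster shutdown will wait for the other node before it becomes quorate - which would match your log messages. Purely from memory, something along these lines:

quorum {
        provider: corosync_votequorum
        expected_votes: 2
        two_node: 1
        wait_for_all: 0
}

Be aware that wait_for_all: 0 trades away some startup safety (possible fence race when both nodes boot independently), so treat it as something to test, not as a recommendation.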

Unfortunately, “my hands are tied” (or rather, I’m sort of “blind”) until I am back at my office next week - if anyone else would like to chime in here, you’re welcome :wink:

Regards,
J