stonith resource is down in a cluster

Hi,

In a Dell cluster, a few days ago, local support updated the machine firmware/BIOS on only one node. Since then we have had a strange error on the SUSE cluster: Operation start failed for resource srvhanaprg2_stonith on node srvhanaprg1: call-id=57, rc-code=generic error (1), exit-reason=none. The resource is of type “stonith:fence_drac5”.

Stack: corosync
Current DC: srvhanaprg1 (version 1.1.16-6.23.1-77ea74d) - partition with quorum
Last change: Mon Jun 21 09:54:36 2021 by root via crm_attribute on srvhanaprg1

2 nodes configured
8 resources configured

Online: [ srvhanaprg1 srvhanaprg2 ]

Active resources:

Master/Slave Set: msl_SAPHana_PRG_HDB00 [rsc_SAPHana_PRG_HDB00]
Masters: [ srvhanaprg1 ]
Slaves: [ srvhanaprg2 ]
stonith-sbd (stonith:external/sbd): Started srvhanaprg1
srvhanaprg1_stonith (stonith:fence_drac5): Started srvhanaprg2
rsc_ip_PRG_HDB00 (ocf::heartbeat:IPaddr2): Started srvhanaprg1
Clone Set: cln_SAPHanaTopology_PRG_HDB00 [rsc_SAPHanaTopology_PRG_HDB00]
Started: [ srvhanaprg1 srvhanaprg2 ]

**Failed Actions:

  • srvhanaprg2_stonith_start_0 on srvhanaprg1 'unknown error' (1): call=57, status=Error, exitreason='',
    last-rc-change='Sat Feb 20 17:30:00 2021', queued=0ms, exec=18187ms**

System info srvhanaprg1:
Corosync Cluster Engine, version '2.3.6'
Copyright (c) 2006-2009 Red Hat, Inc.
crmsh: 3.0.4+git.1604284154.deeba556
resource-agents: # Build version: $Format:%H$
CRM Version: 1.1.16-6.23.1 (77ea74d)
/usr/share/crmsh/hb_report/hb_report: line 764: booth: command not found
pacemaker 1.1.16-6.23.1 - SUSE Linux Enterprise 12 x86_64
libpacemaker3 1.1.16-6.23.1 - SUSE Linux Enterprise 12 x86_64
corosync 2.3.6-9.13.1 - SUSE Linux Enterprise 12 x86_64
libcorosync4 2.3.6-9.13.1 - SUSE Linux Enterprise 12 x86_64
resource-agents 4.0.1+git.1495055229.643177f1-2.53.1 - SUSE Linux Enterprise 12 x86_64
cluster-glue 1.0.12+v1.git.1485976882.03d61cd1-3.11.1 - SUSE Linux Enterprise 12 x86_64
libglue2 1.0.12+v1.git.1485976882.03d61cd1-3.11.1 - SUSE Linux Enterprise 12 x86_64
ldirectord 4.0.1+git.1495055229.643177f1-2.53.1 - SUSE Linux Enterprise 12 x86_64
libqb0 1.0.3+20171226.6d62b64-4.3.1 - SUSE Linux Enterprise 12 x86_64
ocfs2-tools 1.8.5-3.6.1 - SUSE Linux Enterprise 12 x86_64
ocfs2-kmp-default 4.4.180-94.103.1 - SUSE Linux Enterprise 12 x86_64
ocfs2-kmp-default 4.4.180-94.135.1 - SUSE Linux Enterprise 12 x86_64
drbd 9.0.14+git.62f906cf-3.25.1 - SUSE Linux Enterprise 12 x86_64
drbd-kmp-default 9.0.14+git.62f906cf_k4.4.180_94.130-3.25.1 - SUSE Linux Enterprise 12 x86_64
drbd-utils 9.4.0-2.17.1 - SUSE Linux Enterprise 12 x86_64
lvm2 2.02.120-78.16.1 - SUSE Linux Enterprise 12 x86_64
lvm2-clvm 2.02.120-78.16.1 - SUSE Linux Enterprise 12 x86_64
libdlm 4.0.7-3.3.2 - SUSE Linux Enterprise 12 x86_64
libdlm3 4.0.7-3.3.2 - SUSE Linux Enterprise 12 x86_64
ruby 2.1-1.6 - SUSE Linux Enterprise 12 x86_64
kernel-default 4.4.180-94.103.1 - SUSE Linux Enterprise 12 x86_64
kernel-default 4.4.180-94.135.1 - SUSE Linux Enterprise 12 x86_64
glibc 2.22-113.4 - SUSE Linux Enterprise 12 x86_64
Platform: Linux
Kernel release: 4.4.180-94.135-default
Architecture: x86_64
Distribution: Description: SUSE Linux Enterprise Server 12 SP3
System info srvhanaprg2:
Corosync Cluster Engine, version '2.3.6'
Copyright (c) 2006-2009 Red Hat, Inc.
crmsh: 3.0.4+git.1604284154.deeba556
resource-agents: # Build version: $Format:%H$
CRM Version: 1.1.16-6.23.1 (77ea74d)
/usr/share/crmsh/hb_report/hb_report: line 764: booth: command not found
pacemaker 1.1.16-6.23.1 - SUSE Linux Enterprise 12 x86_64
libpacemaker3 1.1.16-6.23.1 - SUSE Linux Enterprise 12 x86_64
corosync 2.3.6-9.13.1 - SUSE Linux Enterprise 12 x86_64
libcorosync4 2.3.6-9.13.1 - SUSE Linux Enterprise 12 x86_64
resource-agents 4.0.1+git.1495055229.643177f1-2.53.1 - SUSE Linux Enterprise 12 x86_64
cluster-glue 1.0.12+v1.git.1485976882.03d61cd1-3.11.1 - SUSE Linux Enterprise 12 x86_64
libglue2 1.0.12+v1.git.1485976882.03d61cd1-3.11.1 - SUSE Linux Enterprise 12 x86_64
ldirectord 4.0.1+git.1495055229.643177f1-2.53.1 - SUSE Linux Enterprise 12 x86_64
libqb0 1.0.3+20171226.6d62b64-4.3.1 - SUSE Linux Enterprise 12 x86_64
ocfs2-tools 1.8.5-3.6.1 - SUSE Linux Enterprise 12 x86_64
ocfs2-kmp-default 4.4.180-94.103.1 - SUSE Linux Enterprise 12 x86_64
ocfs2-kmp-default 4.4.180-94.135.1 - SUSE Linux Enterprise 12 x86_64
drbd 9.0.14+git.62f906cf-3.25.1 - SUSE Linux Enterprise 12 x86_64
drbd-kmp-default 9.0.14+git.62f906cf_k4.4.180_94.130-3.25.1 - SUSE Linux Enterprise 12 x86_64
drbd-utils 9.4.0-2.17.1 - SUSE Linux Enterprise 12 x86_64
lvm2 2.02.120-78.16.1 - SUSE Linux Enterprise 12 x86_64
lvm2-clvm 2.02.120-78.16.1 - SUSE Linux Enterprise 12 x86_64
libdlm 4.0.7-3.3.2 - SUSE Linux Enterprise 12 x86_64
libdlm3 4.0.7-3.3.2 - SUSE Linux Enterprise 12 x86_64
ruby 2.1-1.6 - SUSE Linux Enterprise 12 x86_64
kernel-default 4.4.180-94.103.1 - SUSE Linux Enterprise 12 x86_64
kernel-default 4.4.180-94.135.1 - SUSE Linux Enterprise 12 x86_64
glibc 2.22-113.4 - SUSE Linux Enterprise 12 x86_64
Platform: Linux
Kernel release: 4.4.180-94.135-default
Architecture: x86_64
Distribution: Description: SUSE Linux Enterprise Server 12 SP3

Have you got any advice?

It seems to be related to the iDRAC release: local Dell support tried rolling back the release, and now the stonith resource is up and running.

Bye.

That doesn’t necessarily mean that the resource will work fine after a certain release.