I tried to create a ClusterMon resource:
ha-idg-1:~ # crm configure show SNMP
primitive SNMP ocf:pacemaker:ClusterMon \
params user=root \
params update=5000 \
params extra_options="-S vm49093-4.scidom.de -C idg-ha" \
params htmlfile="/srv/www/hawk/public/crm_mon.html" \
op start timeout=20 interval=0 \
op stop timeout=20 interval=0 \
op monitor interval=30 timeout=20
ClusterMon uses crm_mon, but the resource always fails: first crm_mon segfaults inside /usr/lib64/libnetsnmp.so.30.0.3, and immediately afterwards it dumps core.
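For reference, the kernel's segfault line for crm_mon[29187] in the log below can be decoded with plain shell arithmetic. The sketch only reuses numbers from that line. The meaning of the x86 page-fault error bits is standard. Reading the address 0x6c as a NULL-pointer dereference is an inference, not a confirmed diagnosis.

```shell
#!/bin/sh
# Decode the kernel segfault line for crm_mon[29187] from the log.
# All values are copied from that line; nothing here talks to the cluster.
ip=0x7fc2ff4d928d      # instruction pointer at the time of the fault
base=0x7fc2ff49e000    # start of the libnetsnmp.so.30.0.3 mapping ("[base+size]")
fault_addr=0x6c        # "segfault at 6c": the address that was accessed
err=4                  # "error 4": x86 page-fault error code

# Offset of the faulting instruction inside the library mapping.
printf 'offset in libnetsnmp: 0x%x\n' $(($ip - $base))

# Error code bits: 1 = protection violation (else page not present),
# 2 = write access (else read), 4 = fault raised in user mode.
if [ $(($err & 1)) -eq 0 ]; then echo "page not present"; else echo "protection violation"; fi
if [ $(($err & 2)) -eq 0 ]; then echo "read access"; else echo "write access"; fi
if [ $(($err & 4)) -ne 0 ]; then echo "user mode"; else echo "kernel mode"; fi

# An address this close to 0 usually means a NULL pointer plus a small
# struct-member offset was dereferenced.
if [ $(($fault_addr)) -lt 4096 ]; then echo "near-NULL dereference"; fi
```

With the libnetsnmp debuginfo package installed, the printed offset (0x3b28d) could be passed to `addr2line -f -e /usr/lib64/libnetsnmp.so.30.0.3` to name the crashing function; that step is untested here.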
This is the typical sequence:
2019-01-16T14:12:35.921439+01:00 ha-idg-1 pengine[5690]: warning: Processing failed monitor of SNMP:0 on ha-idg-1: not running
2019-01-16T14:12:35.924387+01:00 ha-idg-1 pengine[5690]: warning: Processing failed monitor of SNMP:1 on ha-idg-2: not running
2019-01-16T14:12:35.925833+01:00 ha-idg-1 pengine[5690]: notice: * Recover SNMP:0 ( ha-idg-1 )
2019-01-16T14:12:35.926191+01:00 ha-idg-1 pengine[5690]: notice: Calculated transition 191, saving inputs in /var/lib/pacemaker/pengine/pe-input-2406.bz2
2019-01-16T14:12:35.944837+01:00 ha-idg-1 pengine[5690]: warning: Processing failed monitor of SNMP:0 on ha-idg-1: not running
2019-01-16T14:12:35.945743+01:00 ha-idg-1 pengine[5690]: warning: Processing failed monitor of SNMP:1 on ha-idg-2: not running
2019-01-16T14:12:35.949082+01:00 ha-idg-1 pengine[5690]: notice: * Recover SNMP:0 ( ha-idg-1 )
2019-01-16T14:12:35.949952+01:00 ha-idg-1 pengine[5690]: notice: Calculated transition 192, saving inputs in /var/lib/pacemaker/pengine/pe-input-2407.bz2
2019-01-16T14:12:35.950240+01:00 ha-idg-1 crmd[5691]: notice: Processing graph 192 (ref=pe_calc-dc-1547644355-512) derived from /var/lib/pacemaker/pengine/pe-input-2407.bz2
2019-01-16T14:12:35.950463+01:00 ha-idg-1 crmd[5691]: notice: Initiating stop operation SNMP_stop_0 locally on ha-idg-1
2019-01-16T14:12:35.951522+01:00 ha-idg-1 lrmd[5687]: notice: executing - rsc:SNMP action:stop call_id:242
2019-01-16T14:12:35.967848+01:00 ha-idg-1 lrmd[5687]: notice: SNMP_stop_0:29153:stderr [ /usr/lib/ocf/resource.d/pacemaker/ClusterMon: line 147: kill: (28105) - No such process ]
2019-01-16T14:12:35.968248+01:00 ha-idg-1 lrmd[5687]: notice: finished - rsc:SNMP action:stop call_id:242 pid:29153 exit-code:0 exec-time:17ms queue-time:0ms
2019-01-16T14:12:35.968657+01:00 ha-idg-1 crmd[5691]: notice: Result of stop operation for SNMP on ha-idg-1: 0 (ok)
2019-01-16T14:12:35.971903+01:00 ha-idg-1 crmd[5691]: notice: Initiating start operation SNMP_start_0 locally on ha-idg-1
2019-01-16T14:12:35.972624+01:00 ha-idg-1 lrmd[5687]: notice: executing - rsc:SNMP action:start call_id:243
2019-01-16T14:12:35.989012+01:00 ha-idg-1 su: pam_unix(su-l:session): session opened for user root by (uid=0)
2019-01-16T14:12:35.991876+01:00 ha-idg-1 systemd[1]: Started Session c4 of user root.
2019-01-16T14:12:36.046399+01:00 ha-idg-1 su: pam_unix(su-l:session): session closed for user root
2019-01-16T14:12:36.049003+01:00 ha-idg-1 lrmd[5687]: notice: finished - rsc:SNMP action:start call_id:243 pid:29158 exit-code:0 exec-time:76ms queue-time:1ms
2019-01-16T14:12:36.049729+01:00 ha-idg-1 crmd[5691]: notice: Result of start operation for SNMP on ha-idg-1: 0 (ok)
2019-01-16T14:12:36.055968+01:00 ha-idg-1 crmd[5691]: notice: Initiating monitor operation SNMP_monitor_30000 locally on ha-idg-1
2019-01-16T14:12:36.062611+01:00 ha-idg-1 crmd[5691]: notice: Transition 192 aborted by operation SNMP_monitor_30000 'modify' on ha-idg-2: Old event
2019-01-16T14:12:36.098341+01:00 ha-idg-1 crmd[5691]: notice: Transition 192 (Complete=8, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2407.bz2): Complete
2019-01-16T14:12:36.107834+01:00 ha-idg-1 kernel: [157578.314958] crm_mon[29187]: segfault at 6c ip 00007fc2ff4d928d sp 00007fff6e231800 error 4 in libnetsnmp.so.30.0.3[7fc2ff49e000+c8000]
2019-01-16T14:12:36.123148+01:00 ha-idg-1 pengine[5690]: warning: Processing failed monitor of SNMP:0 on ha-idg-1: not running
2019-01-16T14:12:36.124655+01:00 ha-idg-1 pengine[5690]: warning: Processing failed monitor of SNMP:1 on ha-idg-2: not running
2019-01-16T14:12:36.127945+01:00 ha-idg-1 pengine[5690]: notice: * Recover SNMP:1 ( ha-idg-2 )
2019-01-16T14:12:36.129056+01:00 ha-idg-1 pengine[5690]: notice: Calculated transition 193, saving inputs in /var/lib/pacemaker/pengine/pe-input-2408.bz2
2019-01-16T14:12:36.129795+01:00 ha-idg-1 crmd[5691]: notice: Processing graph 193 (ref=pe_calc-dc-1547644356-516) derived from /var/lib/pacemaker/pengine/pe-input-2408.bz2
2019-01-16T14:12:36.130047+01:00 ha-idg-1 crmd[5691]: notice: Initiating stop operation SNMP_stop_0 on ha-idg-2
2019-01-16T14:12:36.153502+01:00 ha-idg-1 crmd[5691]: notice: Initiating start operation SNMP_start_0 on ha-idg-2
2019-01-16T14:12:36.244619+01:00 ha-idg-1 crmd[5691]: notice: Initiating monitor operation SNMP_monitor_30000 on ha-idg-2
2019-01-16T14:12:36.288010+01:00 ha-idg-1 crmd[5691]: notice: Transition 193 (Complete=8, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2408.bz2): Complete
2019-01-16T14:12:36.288350+01:00 ha-idg-1 crmd[5691]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
2019-01-16T14:12:37.371629+01:00 ha-idg-1 systemd-coredump[29197]: Process 29187 (crm_mon) of user 0 dumped core.
It’s always the same:
the cluster recognizes that resource SNMP isn’t running, stops it and starts it again. crm_mon then segfaults inside libnetsnmp and terminates with a core dump. The next monitor operation for SNMP detects that it isn’t running, and the cycle starts over, every 30 seconds.
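The stderr line from the stop operation ("kill: (28105) - No such process") suggests that the agent tracks crm_mon through a pidfile. The following minimal sketch of a pidfile check shows why a crashed crm_mon is reported as "not running" on the next monitor. It assumes ClusterMon's monitor works roughly this way; pidfile_monitor is a hypothetical name, not the agent's code.

```shell
#!/bin/sh
# Hypothetical sketch of a pidfile-based monitor check. It is NOT the
# ClusterMon source; it assumes the agent behaves roughly like this.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7   # "not running" return code from the OCF RA API

# pidfile_monitor FILE: succeed if the PID stored in FILE still exists.
pidfile_monitor() {
    pidfile=$1
    # Missing or empty pidfile: treat the resource as stopped.
    [ -s "$pidfile" ] || return "$OCF_NOT_RUNNING"
    pid=$(cat "$pidfile")
    # kill -0 delivers no signal; it only checks that the PID exists
    # and that we are allowed to signal it.
    if kill -0 "$pid" 2>/dev/null; then
        return "$OCF_SUCCESS"
    fi
    # Stale pidfile: the process (e.g. a crashed crm_mon) is gone.
    return "$OCF_NOT_RUNNING"
}

# Demo: a live PID (this shell) versus the PID of a child that has exited.
tmp=$(mktemp)
echo "$$" > "$tmp"
rc=0; pidfile_monitor "$tmp" || rc=$?
echo "live PID: rc=$rc"

sh -c 'exit 0' &
dead=$!
wait "$dead"
echo "$dead" > "$tmp"
rc=0; pidfile_monitor "$tmp" || rc=$?
echo "exited PID: rc=$rc"
rm -f "$tmp"
```

The first call should report 0 and the second 7, which corresponds to the "not running" result the pengine logs for SNMP:0 and SNMP:1. The stop action's kill fails for the same reason: the PID in the pidfile no longer exists.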
Any ideas?
Which version of SUSE Linux Enterprise Server (SLES) and the High Availability Extension are you using?
Running crm_mon manually with the same options should fail with a segfault, just like the invocation done by the cluster resource. The only difference from the resource script should be omitting the “-d” option, which keeps the process in the foreground.
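Here is a minimal sketch of how such a foreground command line could be assembled from the resource parameters. It assumes that ClusterMon divides update (milliseconds) by 1000 for crm_mon's -i option (seconds) and passes extra_options and htmlfile through unchanged. build_crm_mon_cmd is a hypothetical helper, not part of Pacemaker.

```shell
#!/bin/sh
# Hypothetical helper: rebuild a foreground crm_mon command line from the
# SNMP resource's parameters. Assumptions (not verified against the agent):
# - "update" is in milliseconds and crm_mon's -i expects seconds;
# - extra_options and htmlfile are passed through unchanged;
# - -d (daemonize) is left out so crm_mon stays in the foreground.
build_crm_mon_cmd() {
    update_ms=$1
    extra_options=$2
    htmlfile=$3
    interval=$(($update_ms / 1000))
    printf 'crm_mon -i %s %s -h %s\n' "$interval" "$extra_options" "$htmlfile"
}

build_crm_mon_cmd 5000 "-S vm49093-4.scidom.de -C idg-ha" /srv/www/hawk/public/crm_mon.html
```

Running the printed command as root, ideally with a scratch HTML path instead of HAWK's file, should reproduce the crash in the foreground. A shell typically reports exit status 139 (128 + SIGSEGV) for a process killed by a segfault.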
Do you see any additional output on stdout/stderr that might hint at the root cause?
Regards,
J
PS: I’m not at a matching system right now. What are the -S and -C options for?