Hi experts,
I’m running SLES 12 SP2 for testing purposes, and I’m also testing SLES High Availability.
To get fence_kdump working on my cluster, I modified /etc/sysconfig/kdump as described in the user documentation (https://www.suse.com/documentation/sle-ha-12/book_sleha/data/sec_ha_fencing_config.html#ex_ha_fencing_kdump).
According to that document, for the fence_kdump resource to work properly, kdump should be configured as below:
cat /etc/sysconfig/kdump
<…snip>
KDUMP_POSTSCRIPT="/usr/lib64/fence_kdump_send -i 1 -p 7410 node02"
However, the fence_kdump_send script doesn’t exist in /usr/lib64. It’s in /usr/lib, so I changed the setting as below:
KDUMP_POSTSCRIPT="/usr/lib/fence_kdump_send -i 1 -p 7410 node02"
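A quick way to rule out a simple path typo in this setting is to check, on the running system, that the program named in KDUMP_POSTSCRIPT exists and is executable. The following is only a sketch: the function name check_postscript is made up for illustration, and it assumes the setting is written on one line in the KEY="value" form shown above. It checks the installed system only and says nothing about the environment kdump runs in after a crash.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not part of kdump):
# print the program named in KDUMP_POSTSCRIPT and report whether it
# exists and is executable on the running system.
check_postscript() {
    conf="${1:-/etc/sysconfig/kdump}"
    # Take the last KDUMP_POSTSCRIPT="..." line and strip key and quotes.
    value=$(sed -n 's/^KDUMP_POSTSCRIPT="\(.*\)"$/\1/p' "$conf" | tail -n 1)
    prog=${value%% *}   # the first word is the program path
    if [ -z "$prog" ]; then
        echo "KDUMP_POSTSCRIPT is empty or not set in $conf"
        return 2
    fi
    if [ -x "$prog" ]; then
        echo "found: $prog"
    else
        echo "missing: $prog"
        return 1
    fi
}
```

Calling check_postscript with no argument reads /etc/sysconfig/kdump; a different file can be passed as the first argument.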
Then I crashed the kernel with the command “echo c > /proc/sysrq-trigger”.
The kernel crashed as expected, but kdump didn’t send the message to node02 because of the error messages below:
…
Generating README Finished.
Copying System.map Finished.
Copying Kernel Finished.
Running /usr/lib/fence_kdump_send -i 1 -p 7410 node02
/lib/kdump/save_dump.sh: line 234: /usr/lib/fence_kdump_send: No such file or directory
Last command failed (127)
…
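Since the failing script is /lib/kdump/save_dump.sh, the post script is probably being run from inside the kdump initrd rather than from the installed root filesystem, so one thing worth checking is whether fence_kdump_send is present in that initrd at all. The sketch below rests on assumptions: the helper name listing_has is made up, and the usage comment assumes that lsinitrd (from dracut) is installed and that the kdump initrd lives at /boot/initrd-$(uname -r)-kdump.

```shell
#!/bin/sh
# Hypothetical helper: read a file listing on stdin (for example the
# output of lsinitrd) and report whether any line ends in /<name>.
listing_has() {
    name="$1"
    if grep -q -- "/${name}\$"; then
        echo "present: $name"
    else
        echo "absent: $name"
        return 1
    fi
}

# Assumed usage on the affected node (the initrd path is an assumption):
#   lsinitrd "/boot/initrd-$(uname -r)-kdump" | listing_has fence_kdump_send
```

If the binary turns out to be absent from the listing, that would be consistent with the "No such file or directory" error above even though the file exists on the installed system.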
If the system has little memory and the dump finishes quickly, this doesn’t matter, because fence_kdump on node02 waits 60 seconds by default. But if the system has a lot of memory and needs more time to finish the dump, fence_kdump has to receive the message from node01 for the vmcore to be complete.
When a kernel panic occurs and fence_kdump fails, the second fence method (usually a power-based fence device) is executed. In that case, the vmcore will not be written completely.
Any ideas would be appreciated.
Thank you