SLES 11 kdumptool - error loading shared libraries

ashbyj · May 17, 2012, 4:51pm

Hi,
We’ve been having issues with SLES 11 servers freezing/crashing/hanging intermittently. This has happened for almost a year now, and continues even though we’ve upgraded to SP1 and now to SP2.

Anyway, I’ve enabled Magic SysRq on all SLES 11 servers and configured kdump. The goal is to capture crash dumps via the console on unresponsive systems for further analysis.
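For reference, the current Magic SysRq setting can be read from /proc/sys/kernel/sysrq. The sketch below only interprets that value; the function name describe_sysrq is made up for illustration, and it assumes a POSIX shell.

```shell
#!/bin/sh
# Interpret the value found in /proc/sys/kernel/sysrq.
# describe_sysrq is an illustrative helper, not part of any SLES tool.
describe_sysrq() {
    case $1 in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "restricted (bitmask $1)" ;;
    esac
}

# Usage on a live system:
#   describe_sysrq "$(cat /proc/sys/kernel/sysrq)"
#
# For testing kdump without the keyboard combination, root can trigger the
# same crash as Alt+SysRq+c. This crashes the system immediately:
#   echo c > /proc/sysrq-trigger
```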

I’m currently testing the crash dump combo on a SLES 11 SP2 VM. The console responds to Alt+SysRq+c, but then stops at a bash prompt - see screenshot at bottom of my post. The main errors I can see are:

/sbin/resume: error while loading shared libraries: libgcrypt.so.11: cannot open shared object file: No such file or directory
...
kdumptool: error while loading shared libraries: libelf.so.0: cannot open shared object file: No such file or directory

However, both shared libs exist in library directories on the root partition, so I don’t know why resume/kdumptool can’t find them. It must be something about the “resume” kernel option that I do not understand.

[CODE]## libgcrypt
myhost:~ # whereis libgcrypt.so.11
libgcrypt.so: /lib/libgcrypt.so.11 /lib64/libgcrypt.so.11 /usr/lib64/libgcrypt.so /usr/local/lib/libgcrypt.so.11 /usr/local/lib/libgcrypt.so

myhost:~ # dir /lib/libgcrypt.so*
lrwxrwxrwx 1 root root 19 Feb 29 11:17 /lib/libgcrypt.so.11 -> libgcrypt.so.11.7.0
-rwxr-xr-x 1 root root 545124 Jan 13 13:00 /lib/libgcrypt.so.11.7.0

## libelf
myhost:~ # whereis libelf.so.0
libelf.so: /usr/lib/libelf.so.1 /usr/lib/libelf.so.0 /usr/lib64/libelf.so.1 /usr/lib64/libelf.so.0 /usr/local/lib/libelf.so.0 /usr/local/lib/libelf.so

myhost:~ # dir /usr/lib/libelf.so*
lrwxrwxrwx 1 root root 16 Mar 9 2011 /usr/lib/libelf.so.0 -> libelf.so.0.8.12
-rwxr-xr-x 1 root root 88312 May 5 2010 /usr/lib/libelf.so.0.8.12
lrwxrwxrwx 1 root root 25 Mar 9 2011 /usr/lib/libelf.so.1 -> /usr/lib/libelf.so.0.8.12[/CODE]

Any idea what the issue is here? There may be more informative messages higher up in the console, but I can’t see them, and I’m not sure how to slow down the messages to capture them or how to dump them to a file. My VM is running on VMware.

Screenshot:

Willem1 · May 17, 2012, 5:14pm

Don’t know about those library errors… it could be that you need to check and fix the paths in /etc/ld.so.conf and rerun ldconfig. I have not needed to mess with that on SLES 11, though.

Some questions to get a better feel of the environment SLES is running in/as:

Which VMware version are you running there (version/build), and which VMware Tools version (also, was this a tar or an rpm install)?

I’m also curious to know how these SLES servers have been set up (which version is it? The VMware release, and 32 or 64-bit?) and whether you added any special parameters to the boot options.

Lastly, what are these servers doing? Which software is running, and has any custom tuning been done?

-Willem

Willem1 · May 17, 2012, 5:21pm

One more thing to add to this: run ldconfig with the -v switch and check if the libraries kdump is fussing about appear in that list.

It could also be worth a try to see if running “SuSEconfig --verbose” mentions anything particular.

Cheers,
Willem
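The ldconfig check suggested above can be scripted. This is a minimal sketch, assuming a POSIX shell and that `ldconfig -p` prints the cache in its usual "name (arch) => path" form; the function name missing_libs is made up for illustration. Note that it only inspects the linker cache of the running system, which is not necessarily what the kdump environment sees.

```shell
#!/bin/sh
# Print every soname given as an argument that does NOT appear in
# `ldconfig -p`-style output supplied on stdin.
# missing_libs is an illustrative helper name.
missing_libs() {
    cache=$(cat)
    for lib in "$@"; do
        # Cache entries look like:
        #   <TAB>libelf.so.0 (libc6,x86-64) => /usr/lib64/libelf.so.0
        if ! printf '%s\n' "$cache" | grep -qF -- "$lib ("; then
            printf '%s\n' "$lib"
        fi
    done
}

# Usage on a live system:
#   ldconfig -p | missing_libs libgcrypt.so.11 libelf.so.0
```

No output means both libraries are known to the dynamic linker on that system.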

ashbyj · May 17, 2012, 5:26pm

Hi Willem,

The SLES 11 VM is a guest on a VMware host running ESXi 5.0.0, 623860.

SLES 11 SP2 64-bit. The ISO I used to install it was from the Novell downloads, and not VMware-specific.

myhost:~ # uname -a
Linux goat 3.0.13-0.27-default #1 SMP Wed Feb 15 13:33:49 UTC 2012 (d73692b) x86_64 x86_64 x86_64 GNU/Linux

Actually, the server was installed fresh with SLES 11 SP1, then upgraded in place to SP2 via YaST > Patch CD Upgrade. Here are my boot options:

myhost:~ # cat /boot/grub/menu.lst
# Modified by YaST2. Last modification on Wed Feb 29 11:31:23 EST 2012
default 0
timeout 8
##YaST - generic_mbr
gfxmenu (hd0,0)/boot/message
##YaST - activate

###Don't change this comment - YaST2 identifier: Original name: linux###
title SUSE Linux Enterprise Server 11 SP2 - 3.0.13-0.27 (default)
    root (hd0,0)
    kernel /boot/vmlinuz-3.0.13-0.27-default root=/dev/sda1 insmod=qla4xxx resume=/dev/sdb1 splash=silent crashkernel=256M-:128M showopts
    initrd /boot/initrd-3.0.13-0.27-default

###Don't change this comment - YaST2 identifier: Original name: failsafe###
title Failsafe -- SUSE Linux Enterprise Server 11 SP2 - 3.0.13-0.27
    root (hd0,0)
    kernel /boot/vmlinuz-3.0.13-0.27-default root=/dev/sda1 showopts ide=nodma apm=off noresume edd=off powersaved=off nohz=off highres=off processor.max_cstate=1 nomodeset x11failsafe
    initrd /boot/initrd-3.0.13-0.27-default

###Don't change this comment - YaST2 identifier: Original name: linux###
title Trace -- SUSE Linux Enterprise Server 11 SP2 - 3.0.13-0.27
    root (hd0,0)
    kernel /boot/vmlinuz-3.0.13-0.27-trace root=/dev/sda1 insmod=qla4xxx resume=/dev/sdb1 splash=silent crashkernel=256M-:128M showopts
    initrd /boot/initrd-3.0.13-0.27-trace

###Don't change this comment - YaST2 identifier: Original name: floppy###
title Floppy
    rootnoverify (fd0)
    chainloader +1

This VM is used as a testing ground for proposed software in our SLES 11 environment, but it is currently not running any “extra” software. That said, I’ve seen two other VMs that are actively used by our developers crash, and the same issue occurs on them when attempting a dump via Magic SysRq. Those two VMs run OpenLDAP and Shibboleth, but nothing else.

malcolmlewis · May 17, 2012, 5:27pm

Hi
I would guess the filesystem containing the libraries (/dev/sda1) has
not been mounted at that point, hence it can’t find them…

When you’re at that prompt, what is mounted? (Just run the mount command.)
The dmesg command may offer further information. In your VM, if you can
get to tty10 (Ctrl+Alt+F10), it has the kernel messages.

I would guess that /dev/sdb1 is your swap partition (and used for resume),
so why would it be resuming? Are you sure the systems are not going into a
power management mode (and only seem to be frozen)? I would have thought
anything related to power saving would be disabled.

–
Cheers Malcolm °¿° (Linux Counter #276890)
openSUSE 12.1 (x86_64) Kernel 3.1.10-1.9-desktop
up 5 days 18:26, 4 users, load average: 0.05, 0.04, 0.05
CPU Intel i5 CPU M520@2.40GHz | Intel Arrandale GPU

ashbyj · May 17, 2012, 6:06pm

This is what the mount command returns at that bash prompt (after Alt+SysRq+c is pressed):

bash-3.2# mount
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
udev on /dev type tmpfs (rw,mode=0755,nr_inodes=0)
tmpfs on /dev/shm type tmpfs (rw,mod=1777)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
/dev/sda1 on /root type ext3 (rw,acl,user_xattr)
bash-3.2#

Cool, I did not know about tty10. I think it’s just Alt+F# in VMware to switch terminals, though. I was unable to get to tty10 while at the bash prompt. After doing “exit” it seems to finish booting OK and then I’m at the normal login prompt. I’m not sure whether it puts me in a kexec/kdump kernel, or which kernel the login prompt belongs to. Anyway, I’m able to get to tty10 at that point, but by then it’s past the shared lib error messages I was interested in.

Correct, /dev/sdb1 is swap and the setting for resume. Ya know, I’ve never questioned the “resume” kernel option. If it’s for hibernating or suspending a system for power-saving reasons, I guess I don’t need it? It’s a server, hence the reason we run SLES.

malcolmlewis · May 17, 2012, 6:46pm

Hi
So /dev/sda1 is mounted at /root (the root user’s home directory), not
at /, hence it can’t find the libraries. Unless there are directories
under /root?

What does the mount command say when everything is back up and running?

Maybe a browse through the system BIOS will turn up some additional
information on any power saving features that are enabled?

AFAIK anything power related should be disabled; check in /etc/pm
and /etc/pm-profiler if they exist.
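To compare where the root device is mounted at the emergency prompt and after a normal boot, the mount output can be filtered. A minimal sketch, assuming awk is available (it may not be in a stripped-down rescue shell); the function name mount_point_of is made up for illustration.

```shell
#!/bin/sh
# Print the mount point(s) of a device, given `mount`-style output on stdin.
# Lines have the form: <device> on <mountpoint> type <fstype> (<options>)
# mount_point_of is an illustrative helper name.
mount_point_of() {
    awk -v dev="$1" '$1 == dev && $2 == "on" { print $3 }'
}

# Usage on a live system:
#   mount | mount_point_of /dev/sda1
```

With the output posted from the bash prompt this prints /root; on a normally booted system the root device would be expected at /.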


ashbyj · May 17, 2012, 9:15pm

[QUOTE=malcolmlewis;4698]Hi
So /dev/sda1 is mounted as /root (the user) not / hence it can’t find
the libraries. Unless there are directories under /root?

What does the mount command say when all back and running?
[/QUOTE]

OK I didn’t notice that it was mounted as /root, but now that makes sense that it can’t find the libs.

After issuing “exit” at the bash prompt, it boots to what looks like a healthy state and this is what mount says:

myhost:~ # mount
/dev/sda1 on / type ext3 (rw,acl,user_xattr)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devtmpfs on /dev type devtmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,mode=1777)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
securityfs on /sys/kernel/security type securityfs (rw)
nfs1:/vol/local.sles11 on /usr/local type nfs (rw,nolock,addr=172.xxx.xxx.xxx)
nfs1:/vol/source.sles11 on /usr/local/src type nfs (rw,nolock,addr=172.xxx.xxx.xxx)
nfs1:/vol/misc on /usr/misc type nfs (rw,nolock,addr=172.xxx.xxx.xxx)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
none on /var/lib/ntp/proc type proc (ro,nosuid,nodev)

(I masked out my IPs in the output.) I’m not concerned about the power-saving modules at this point, unless you think they’re affecting the ability to invoke Magic SysRq commands and kdump.

ashbyj · May 17, 2012, 10:07pm

On a related note, I can’t seem to get the proper crashkernel setting for kdump.

myhost:~ # /etc/init.d/boot.kdump restart
Loading kdump
Then try loading kdump kernel
Memory for crashkernel is not reserved
Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
failed

You can see my menu.lst in an earlier post. I’m using the default, which is crashkernel=256M-:128M, and I have also tried various other settings with and without the @YM offset. I’m clueless as to what this should be set to.
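In the range syntax, crashkernel=256M-:128M reads as "on systems with at least 256 MB of RAM, reserve 128 MB". The reservation is made only at boot, so one thing worth ruling out is that the currently running kernel was started without the parameter at all. A minimal sketch for checking that, assuming a POSIX shell; the function name crashkernel_param is made up for illustration.

```shell
#!/bin/sh
# Extract the value of the crashkernel= option from a kernel command line
# supplied on stdin. Prints nothing if the option is absent.
# crashkernel_param is an illustrative helper name.
crashkernel_param() {
    tr ' ' '\n' | sed -n 's/^crashkernel=//p'
}

# Usage on a live system:
#   crashkernel_param < /proc/cmdline
#
# On kernels that provide it, the size actually reserved can be read from
# sysfs (0 means nothing was reserved):
#   cat /sys/kernel/kexec_crash_size
```

If the first command prints nothing, the running kernel was booted without the option, and restarting boot.kdump cannot succeed until the system is rebooted with it.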

Willem1 · May 18, 2012, 8:57am

I’m curious why the VM has that insmod entry for the qla4xxx… are you using iSCSI within the VM? Unless it is also needed for the OS disk, that entry is not needed in the grub section AFAIK.

Cheers,
Willem

ashbyj · May 18, 2012, 2:49pm

[QUOTE=Magic31;4712]I’m curious why the VM has that insmod entry for the qla4xxx… are you using iSCSI within the VM? And unless it would also be for the OS disk, that line is not needed in the grub section AFAIK.

Cheers,
Willem[/QUOTE]

True, that shouldn’t be in there. This may have carried over from a PXE network install, where we have this set in the kernel options for our physical servers that use iSCSI. I’m going to remove it on my VMs.

ashbyj · May 22, 2012, 4:17pm

[QUOTE=ashbyj;4699]OK I didn’t notice that it was mounted as /root, but now that makes sense that it can’t find the libs.
[/QUOTE]

Any idea why this would be mounting as /root instead of /?

malcolmlewis · May 22, 2012, 6:13pm

Hi
I would guess that, because it’s in single user mode as the root user,
it’s creating a temporary /root for maintenance mode.

So there is no filesystem under /root, e.g. /root/etc?

It’s almost as though it’s adding an init=/bin/bash when booting…


ashbyj · May 24, 2012, 4:27pm

Yes, there appears to be some sort of filesystem under /root. Here are some screenshots of file listings under root and etc/, and also the help command, although you can’t see everything and the “more” command is not available.
