SLES11 SP2: 32bit app crashes with glibc-2.11.3-17-39.1

Hi,

We see reproducible crashes on SLES11 SP2 with glibc-2.11.3-17-39.1. The crashing app is a 32bit x86 multithreaded binary, and the crash only occurs with glibc-2.11.3-17-39.1.

The crash always occurs when the app is stopped, most probably in some cleanup code of glibc.

When the app links to glibc-2.11.3-17-35.4 there are no crashes.

64bit apps do not appear to be affected at all.

After the crash the call stack is always (output of gdb where):
#0 0xffffe430 in __kernel_vsyscall ()
#1 0xf737ee30 in raise () from /lib/libc.so.6
#2 0xf7380765 in abort () from /lib/libc.so.6
#3 0xf7377db8 in __assert_fail () from /lib/libc.so.6
#4 0xf74cd5c5 in __pthread_mutex_cond_lock () from /lib/libpthread.so.0
#5 0xf74c7a6d in __condvar_tw_cleanup () from /lib/libpthread.so.0
#6 0xf752e2e8 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

I have not yet been able to provide a test program, but since only the latest 32bit glibc version seems to be affected I’m pretty sure it’s a glibc bug, most probably in some nptl patch added (or not) between glibc-2.11.3-17-35.4 and glibc-2.11.3-17-39.1.

Thanks and best regards,

Michael Kwasigroch, Intercope GmbH, Hamburg

The glibc update you’re referring to implements a fix in pthread_cond_timedwait(). Previously, on x86 only, the passed timeout argument was interpreted as relative, and it has been fixed to absolute in this update, as is the case on every other architecture and as documented by POSIX. Which framework or application do you use?

Are you using pthread_cond_timedwait() in your application? It would be good if you could get in contact with support in order to get a PTF for this problem.

[QUOTE=dirkmueller;6474]The glibc update you’re referring to implements a fix in pthread_cond_timedwait(). Previously, on x86 only, the passed timeout argument was interpreted as relative, and it has been fixed to absolute in this update, as is the case on every other architecture and as documented by POSIX. Which framework or application do you use?

Are you using pthread_cond_timedwait() in your application? It would be good if you could get in contact with support in order to get a PTF for this problem.[/QUOTE]

Hi,

thanks for your reply! I greatly appreciate this.

I’ve noticed the fix you mention in the release notes, and it’s possible that this or some other change causes “our” problem. We don’t use any framework or app; it’s coded in plain Jane C. It is highly portable code that runs on AIX, Solaris, Linux and z/OS UNIX in 32bit or 64bit. Since glibc moved to NPTL we have not had any such issues on Linux.

We heavily use pthread_cond_timedwait(), pthread_cond_wait(), pthread_mutex_lock() etc. in our app. The crashes always occur during app shutdown, where some threads are killed via pthread_cancel(). Judging by the stack frames in the core dumps, I have the feeling that the glibc condvar cleanup code does something bad that triggers the abort().

Since we are an ISV, I’d appreciate it if you could give me a hint on how to get in touch with support.
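
To illustrate the pattern in question, a minimal sketch of a thread that is cancelled while it waits on a condition variable. This is not our production code and it does not reproduce the crash; all names (worker, unlock_mutex, cancel_waiting_worker, cleanup_runs) are made up for the example. It relies on the POSIX rule that the mutex is re-acquired before cleanup handlers run when a thread is cancelled inside pthread_cond_wait(), so the handler has to unlock it.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;         /* predicate guarded by mtx, never set here */
static int cleanup_runs = 0;  /* counts invocations of the cleanup handler */

/* Cleanup handler: runs with the mutex held when the wait is cancelled. */
static void unlock_mutex(void *arg)
{
    cleanup_runs++;
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&mtx);
    pthread_cleanup_push(unlock_mutex, &mtx);
    while (!ready)
        pthread_cond_wait(&cond, &mtx);  /* cancellation point */
    pthread_cleanup_pop(1);              /* normal path: run handler, unlock */
    return NULL;
}

/* Starts a worker, cancels it and joins it.
   Returns 1 if the worker ended by cancellation, 0 if not, -1 on error. */
static int cancel_waiting_worker(void)
{
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    /* Cancellation is deferred by default, so it takes effect when the
       worker reaches (or is already blocked in) pthread_cond_wait(). */
    pthread_cancel(tid);
    if (pthread_join(tid, &result) != 0)
        return -1;
    return result == PTHREAD_CANCELED;
}
```

After cancel_waiting_worker() returns 1, the handler has run once and the mutex is free again. The frames #4 and #5 in the backtrace above are the glibc side of this step, i.e. re-locking the mutex during condvar cleanup.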

We are highly interested in getting this fixed!

Thanks and best regards,

  • Michael Kwasigroch, Intercope Hamburg, Germany

Hi again,

I opened a software defect at Novell’s portal. Let’s see what this will do.

Cheers,

  • Michael

Yes, I’ve done some research and it is indeed an unrelated regression introduced by the code change. We’ll fix it shortly. In the meantime, please let me know about customers who are affected so that they can get a PTF right away.

Thanks!

[QUOTE=dirkmueller;6511]Yes, I’ve done some research and it is indeed an unrelated regression introduced by the code change. We’ll fix it shortly. In the meantime, please let me know about customers who are affected so that they can get a PTF right away.

Thanks![/QUOTE]

Hi!

This is great news! I will not disclose the names of our customers (I’m sure you understand), but if you give me the PTF I’ll test it right away!

Thanks and best regards,

  • Michael

Any idea when the fix or PTF will be ready? It’s been a while now… I would even test a pre-release of the fix if that would be of any help.

Thanks and best regards!

With the latest glibc version (glibc-2.11.3-17.43.1) I no longer see these crashes, so I consider this issue resolved.