SLES 10 SP4: can someone help me understand a trace?

Hi,

One of my old servers logged a kernel trace yesterday. Since then it keeps running, but it has lost its network connection.
Here is the trace:

Dec 13 20:45:01 pc60182 /usr/sbin/cron[13143]: (root) CMD ( /opt/hp/hp-health/bin/check-for-restart-requests)
Dec 13 20:45:05 pc60182 syslog-ng[3590]: STATS: dropped 0
Dec 13 20:47:10 pc60182 syslog-ng[3590]: Changing permissions on special file /dev/xconsole
Dec 13 20:47:10 pc60182 syslog-ng[3590]: Changing permissions on special file /dev/tty10
Dec 13 20:47:10 pc60182 kernel: irq 113: nobody cared (try booting with the "irqpoll" option)
Dec 13 20:47:10 pc60182 kernel:
Dec 13 20:47:10 pc60182 kernel: Call Trace: <IRQ> <ffffffff8015fc50>{__report_bad_irq+48}
Dec 13 20:47:10 pc60182 kernel: <ffffffff8015fe59>{note_interrupt+444} <ffffffff8015f72c>{__do_IRQ+191}
Dec 13 20:47:10 pc60182 kernel: <ffffffff8010d569>{do_IRQ+59} <ffffffff80109e85>{mwait_idle+0}
Dec 13 20:47:10 pc60182 kernel: <ffffffff8010b352>{ret_from_intr+0} <EOI> <ffffffff80109e85>{mwait_idle+0}
Dec 13 20:47:10 pc60182 kernel: <ffffffff80109ec6>{mwait_idle+65} <ffffffff80109e57>{cpu_idle+160}
Dec 13 20:47:10 pc60182 kernel: <ffffffff80119b1a>{start_secondary+1333}
Dec 13 20:47:10 pc60182 kernel: handlers:
Dec 13 20:47:10 pc60182 kernel: [<ffffffff882086fb>] (e1000_intr+0x0/0x28d [e1000])
Dec 13 20:47:10 pc60182 kernel: Disabling IRQ #113
Dec 13 20:50:01 pc60182 /usr/sbin/cron[13199]: (root) CMD ( /opt/hp/hp-health/bin/check-for-restart-requests)

Is my module or my network adapter broken? Should I try booting with irqpoll as recommended?

Thanks.

Bernd

Are you using the latest SLES 10 SP4 patches? Do you happen to have an LTSS
repository set up for extended support?

Which kernel version, exactly, do you get from uname?

uname -a

Did this just start, or has this been going on for a while? If it just
started, have you changed anything on the box? I would not guess patches
(as SLES 10 is really old) but maybe hardware changes of some sort, or
other configuration changes?

I can find some similar reports for Red Hat, though I am not sure whether
the different IRQ number matters much. The symptom sounded similar, and
they had kernel fixes to help with it until they ran into problems getting
fixes from the motherboard vendor. What brand and model of motherboard do
you have? Do you have other systems using the same hardware and software,
and do they behave the same way?

Booting with irqpoll did help others in that other thread I found, but
that may also depend on your exact kernel version.
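
If you do test irqpoll, it is worth confirming after the reboot that the
option really reached the kernel. On SLES 10 the boot options normally sit
on the kernel line in /boot/grub/menu.lst (GRUB legacy); please verify that
on your box. The sketch below only checks the running kernel's command
line. The function name has_boot_option is made up for illustration, and
its optional second argument exists only so the check can be tried against
a file other than /proc/cmdline.

```shell
# Return 0 if the given option appears as a whole word on the kernel
# command line. has_boot_option is a hypothetical helper name.
# $1 = option to look for, $2 = file to read (defaults to /proc/cmdline).
has_boot_option() {
    opt=$1
    file=${2:-/proc/cmdline}
    # Put each option on its own line, then require an exact line match
    # so that e.g. "noirqpoll" would not count as "irqpoll".
    tr ' ' '\n' < "$file" | grep -qx -- "$opt"
}

if has_boot_option irqpoll; then
    echo "irqpoll is active for this boot"
else
    echo "irqpoll is not active for this boot"
fi
```

If it reports "not active" after you edited the boot entry, the change most
likely went into a different entry than the one that was booted.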

The best solution may be to try to upgrade, if that is an option for you.
SLES 12 SP3 is out and while it is a huge jump from SLES 10 SP4, depending
on what you are doing with your server it may be a trivial change as well.
Building a new box (VM or otherwise) and moving over services is pretty
simple in most cases, but of course it depends on your workload.


Good luck.

If you find this post helpful and are logged into the web interface,
show your appreciation and click on the star below.

If you want to send me a private message, please let me know in the
forum as I do not use the web interface often.

For reference, this is the thread I found with a similar-sounding issue:
https://bugzilla.redhat.com/show_bug.cgi?id=755956

Comments 2, 3, 6, and 9 look useful for getting more information from
your system to continue troubleshooting, assuming you are patched.
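
Independent of what those comments ask for, one simple thing to collect is
the /proc/interrupts row for the IRQ named in your trace (113), since it
shows the per-CPU counts and which devices are registered on that line. A
minimal sketch follows; irq_row is a made-up helper name, and the optional
second argument is only there so it can be run against a saved copy of the
file.

```shell
# Print the /proc/interrupts row for one IRQ number.
# irq_row is a hypothetical helper name.
# $1 = IRQ number, $2 = file to read (defaults to /proc/interrupts).
irq_row() {
    irq=$1
    file=${2:-/proc/interrupts}
    # The first field of each row is the IRQ number followed by a colon.
    awk -v irq="$irq" '$1 == irq ":" { print }' "$file"
}

# IRQ 113 is the one reported as "nobody cared" in the trace above.
irq_row 113
```

If more than one device shows up on that row, the interrupt is shared, and
that would be worth mentioning in your reply along with the uname output.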

