General protection fault

Hello everyone! Some time ago our server hung, and I don’t understand what caused it. Memtest and an HDD benchmark (both run from the Inquisitor hardware testing platform) didn’t show any errors.
Here is part of the log file:

Feb 21 03:00:10 node501 kernel: [1657361.355785] general protection fault: 0000 [#5] SMP
Feb 21 03:00:10 node501 kernel: [1657361.355801] last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/host8/rport-8:0-1/target8:0:1/8:0:1:4/state
Feb 21 03:00:10 node501 kernel: [1657361.355809] CPU 25
Feb 21 03:00:10 node501 kernel: [1657361.355813] Modules linked in: nls_utf8 oracleacfs(PX) oracleadvm(PX) oracleoks(PX) af_packet oracleasm(X) nfs lockd fscache nfs_acl auth_rpcgss sunrpc bonding cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq ib_ipoib ib_cm ipv6 ib_usa(N) ib_sa ib_uverbs ib_umad iw_nes crc32c libcrc32c iw_cxgb3 cxgb3 kcopy(N) mlx4_ib mlx4_core ib_mthca microcode fuse loop rds_tcp(N) rds(N) sr_mod cdrom ib_qib(N) tpm_tis qla2xxx ses shpchp ib_mad tpm tpm_bios serio_raw igb pcspkr ib_core pci_hotplug dca enclosure scsi_transport_fc usb_storage joydev sg scsi_tgt rtc_cmos rtc_core rtc_lib container button usbhid hid scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc dm_round_robin dm_multipath scsi_dh ehci_hcd usbcore sd_mod crc_t10dif dm_snapshot dm_mod edd ext3 mbcache jbd fan processor aacraid(N) ahci libata scsi_mod thermal thermal_sys hwmon
Feb 21 03:00:10 node501 kernel: [1657361.355921] Supported: No, Unsupported modules are loaded
Feb 21 03:00:10 node501 kernel: [1657361.355929] Pid: 17729, comm: emagent Tainted: P M D NX 2.6.32.12-0.7-default #1 X9DRW
Feb 21 03:00:10 node501 kernel: [1657361.355936] RIP: 0010:[<ffffffff81313548>] [<ffffffff81313548>] netlink_autobind+0x78/0xf0
Feb 21 03:00:10 node501 kernel: [1657361.355954] RSP: 0018:ffff88042391be68 EFLAGS: 00010086
Feb 21 03:00:10 node501 kernel: [1657361.355959] RAX: 90ff882039401000 RBX: ffffffff81cd2940 RCX: 00000000767248b7
Feb 21 03:00:10 node501 kernel: [1657361.355965] RDX: 00000000fc0ec4e4 RSI: 00000000f97134e7 RDI: ffff8820396a3000
Feb 21 03:00:10 node501 kernel: [1657361.355971] RBP: 0000000000002b49 R08: 0000000062ef2729 R09: ffff882035d3bcf0
Feb 21 03:00:10 node501 kernel: [1657361.355977] R10: 0000000000000a84 R11: ffffffff811a51c0 R12: ffff8820396a3000
Feb 21 03:00:10 node501 kernel: [1657361.355983] R13: ffff882034584800 R14: 00000000ffffefff R15: 0000000000000000
Feb 21 03:00:10 node501 kernel: [1657361.355989] FS: 00007fe20636e820(0000) GS:ffff8810b8920000(0000) knlGS:0000000000000000
Feb 21 03:00:10 node501 kernel: [1657361.355995] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 21 03:00:10 node501 kernel: [1657361.356001] CR2: 00007f8c84006718 CR3: 00000014cb1e6000 CR4: 00000000000406e0
Feb 21 03:00:10 node501 kernel: [1657361.356007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 21 03:00:10 node501 kernel: [1657361.356013] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 21 03:00:10 node501 kernel: [1657361.356019] Process emagent (pid: 17729, threadinfo ffff88042391a000, task ffff88042b7e6280)
Feb 21 03:00:10 node501 kernel: [1657361.356025] Stack:
Feb 21 03:00:10 node501 kernel: [1657361.356028] ffff8814a014bc00 ffff882034584800 ffffffff81cd2940 ffff88042391bec8
Feb 21 03:00:10 node501 kernel: [1657361.356035] <0> 00007fe2085a05e0 ffffffff813139fd 0000000000000000 ffff8814a014bc00
Feb 21 03:00:10 node501 kernel: [1657361.356043] <0> 000000000000000c 00007fe20636d640 00007fe2080026cd ffffffff812e236f
Feb 21 03:00:10 node501 kernel: [1657361.356053] Call Trace:
Feb 21 03:00:10 node501 kernel: [1657361.356071] [<ffffffff813139fd>] netlink_bind+0x7d/0x1f0
Feb 21 03:00:10 node501 kernel: [1657361.356083] [<ffffffff812e236f>] sys_bind+0xdf/0xf0
Feb 21 03:00:10 node501 kernel: [1657361.356097] [<ffffffff81002f7b>] system_call_fastpath+0x16/0x1b
Feb 21 03:00:10 node501 kernel: [1657361.356109] [<00007fe20cba0c07>] 0x7fe20cba0c07
Feb 21 03:00:10 node501 kernel: [1657361.356114] Code: e8 7e e7 ff ff 89 ee 4c 89 e7 e8 c4 d2 ff ff 48 8b 00 48 85 c0 75 14 eb 4a 66 2e 0f 1f 84 00 00 00 00 00 48 85 d2 74 3b 48 89 d0 <48> 3b 58 38 48 8b 10 0f 18 0a 75 ec 3b a8 50 02 00 00 75 e4 8b
Feb 21 03:00:10 node501 kernel: [1657361.356161] RIP [<ffffffff81313548>] netlink_autobind+0x78/0xf0

Could anybody say what’s the problem? Thanks in advance.
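
One way to read the register dump in this log: the bytes marked `<48> 3b 58 38` in the Code: line decode to a compare of RBX against memory at RAX+0x38, and RAX holds 90ff882039401000. On x86-64 with 48-bit virtual addresses, bits 63 to 47 of a pointer must all be equal; an access through a pointer that breaks this rule raises a general protection fault rather than a page fault. The sketch below checks a few registers from the dump against that rule. The helper name `is_canonical` is made up for illustration, and the check says nothing about why the pointer was corrupted.

```python
def is_canonical(addr, va_bits=48):
    """Return True if bits 63..(va_bits - 1) of a 64-bit address are all equal."""
    top = addr >> (va_bits - 1)               # bits 63..47 when va_bits is 48
    all_ones = (1 << (64 - va_bits + 1)) - 1  # 17 one-bits when va_bits is 48
    return top == 0 or top == all_ones


# Register values copied from the oops above.
registers = {
    "RAX": 0x90FF882039401000,
    "RDI": 0xFFFF8820396A3000,
    "R13": 0xFFFF882034584800,
}

for name, value in registers.items():
    state = "canonical" if is_canonical(value) else "NON-canonical"
    print(f"{name} = {value:#018x}  {state}")
```

Only RAX fails the check; the neighbouring kernel pointers in RDI and R13 start with ffff88, as a valid pointer in that range would.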

ArtRet wrote:
[color=blue]
Hello everyone! Some time ago our server hung. I don’t really understand what’s the problem. Memtest and HDD benchmark didn’t show any error (Inquisitor hardware testing platform).
Could anybody say what’s the problem? Thanks in advance.[/color]

Which version of SUSE Linux Enterprise Server are you using? What does “cat /etc/*release” produce?
How up to date are you with patches?
Which kernel version are you using? What does “rpm -qa | grep kernel” produce?

Is this server hosting Oracle? If so, which version?

HTH.

Simon
SUSE Knowledge Partner

Sorry for the delay.
cat /etc/*release:

LSB_VERSION="core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-x86_64:core-3.2-x86_64:core-4.0-x86_64"
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 1

rpm -qa | grep kernel:

kernel-default-devel-2.6.32.12-0.7.1
kernel-mft-2.7.1-2.6.32.12_0.7_default
kernel-source-2.6.32.12-0.7.1
kernel-ib-1.5.4.1-2.6.32.12_0.7_default
kernel-default-base-2.6.32.12-0.7.1
kernel-ib-devel-1.5.4.1-2.6.32.12_0.7_default
linux-kernel-headers-2.6.32-1.4.13
kernel-default-2.6.32.12-0.7.1

Yes, this server is hosting Oracle Database Standard Edition 11.2.0.3.

I don’t really know whether the patches are up to date.

It seems the problem was the number of huge pages.
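
For anyone who wants to check the same thing on their own machine, here is a minimal sketch that pulls the huge page counters out of /proc/meminfo. The thread does not say which value was wrong or what it was changed to, so this only reports the current numbers; the function names `parse_hugepages` and `reserved_bytes` are made up for illustration.

```python
import re


def parse_hugepages(meminfo_text):
    """Extract the HugePages_* counters and Hugepagesize (in kB) from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        # Matches e.g. "HugePages_Total:    1024" and "Hugepagesize:    2048 kB";
        # "AnonHugePages" is skipped because it does not start with "HugePages_".
        m = re.match(r"(HugePages_\w+|Hugepagesize):\s+(\d+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def reserved_bytes(fields):
    """Memory set aside for the huge page pool: page count times page size."""
    return fields.get("HugePages_Total", 0) * fields.get("Hugepagesize", 0) * 1024


def read_hugepages(path="/proc/meminfo"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_hugepages(f.read())
```

Calling `read_hugepages()` on the server and passing the result to `reserved_bytes()` shows how much RAM the pool takes away from everything else, which can then be compared with the installed memory and with what the database is configured to use.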