SLES11 SP2: multipath: Kernel panic after path failure

Hi,

I have a problem with multipath on some servers:

HW setup:
Dell R720 with 64 GB RAM.
2x qla2xxx FC controllers with 2 ports each
2x Dell MD3620f Storage

Installed drivers:
qla2xxx (from Dell)
scsi_dh_rdac (from Dell)

After a path failure I get the following errors:

[ 2513.720533] rport-10:0-12: blocked FC remote port time out: removing rport
[ 2513.720542] rport-10:0-13: blocked FC remote port time out: removing rport
[ 2513.720547] rport-10:0-11: blocked FC remote port time out: removing rport
[ 2514.104142] rport-9:0-11: blocked FC remote port time out: removing rport
[ 2514.110747] rport-11:0-11: blocked FC remote port time out: removing rport
[ 2514.619173] rport-8:0-11: blocked FC remote port time out: removing rport
[ 2571.599640] device-mapper: multipath: Failing path 70:48.
[ 2571.599952] device-mapper: multipath: Failing path 70:80.
[ 2571.600215] device-mapper: multipath: Failing path 70:112.
[ 2571.600479] device-mapper: multipath: Failing path 70:128.
[ 2571.600737] device-mapper: multipath: Failing path 70:144.
[ 2571.601067] device-mapper: multipath: Failing path 70:160.
[ 2571.601325] device-mapper: multipath: Failing path 70:192.
[ 2571.601716] device-mapper: multipath: Failing path 70:240.
[ 2571.601978] device-mapper: multipath: Failing path 71:16.
[ 2571.602237] device-mapper: multipath: Failing path 71:48.
[ 2571.602502] device-mapper: multipath: Failing path 71:64.
[ 2571.602750] device-mapper: multipath: Failing path 71:80.
[ 2571.603007] device-mapper: multipath: Failing path 71:96.
[ 2571.603265] device-mapper: multipath: Failing path 71:128.
[ 2571.603543] device-mapper: multipath: Failing path 71:176.
[ 2571.603786] device-mapper: multipath: Failing path 71:208.
[ 2571.604036] device-mapper: multipath: Failing path 71:240.
[ 2571.604282] device-mapper: multipath: Failing path 128:0.
[ 2571.604527] device-mapper: multipath: Failing path 128:32.
[ 2571.604869] device-mapper: multipath: Failing path 128:64.
[ 2571.605131] device-mapper: multipath: Failing path 128:16.
[ 2571.605395] device-mapper: multipath: Failing path 128:112.
[ 2571.605724] device-mapper: multipath: Failing path 128:176.
[ 2571.605977] device-mapper: multipath: Failing path 128:144.
[ 2571.606227] device-mapper: multipath: Failing path 128:192.
[ 2571.606484] device-mapper: multipath: Failing path 128:224.
[ 2571.606730] device-mapper: multipath: Failing path 128:208.
[ 2571.607052] device-mapper: multipath: Failing path 129:0.
[ 2603.690284] rport-10:0-7: blocked FC remote port time out: removing target and saving binding
[ 2603.690307] rport-10:0-1: blocked FC remote port time out: removing target and saving binding
[ 2603.690317] rport-10:0-2: blocked FC remote port time out: removing target and saving binding
[ 2603.690325] rport-10:0-3: blocked FC remote port time out: removing target and saving binding
[ 2603.690334] rport-10:0-8: blocked FC remote port time out: removing target and saving binding
[ 2603.691765] rport-10:0-4: blocked FC remote port time out: removing target and saving binding
[ 2603.691781] rport-10:0-5: blocked FC remote port time out: removing target and saving binding
[ 2603.691790] rport-10:0-6: blocked FC remote port time out: removing target and saving binding
[ 2603.692404] sd 10:0:6:0: rdac: Detached
[ 2603.703358] sd 10:0:6:0: [sdee] Synchronizing SCSI cache
[ 2603.703399] sd 10:0:6:0: [sdee] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2603.704404] sd 10:0:6:1: rdac: Detached
[ 2603.706457] sd 10:0:6:1: [sdef] Synchronizing SCSI cache
[ 2603.706504] sd 10:0:6:1: [sdef] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2603.707020] sd 10:0:6:2: rdac: Detached
[ 2603.719430] sd 10:0:6:2: [sdeg] Synchronizing SCSI cache
[ 2603.719479] sd 10:0:6:2: [sdeg] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2603.720086] sd 10:0:6:3: rdac: Detached
[ 2603.726895] sd 10:0:6:3: [sdeh] Synchronizing SCSI cache
[ 2603.726985] sd 10:0:6:3: [sdeh] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2603.727822] sd 10:0:6:4: rdac: Detached
[ 2603.735006] sd 10:0:6:4: [sdei] Synchronizing SCSI cache
[ 2603.735053] sd 10:0:6:4: [sdei] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2603.735617] sd 10:0:6:5: rdac: Detached
[ 2603.741101] sd 10:0:6:5: [sdej] Synchronizing SCSI cache
[ 2603.741149] sd 10:0:6:5: [sdej] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2628.151416] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:3:2154]
[ 2628.151419] Modules linked in: ipmi_si iptable_filter ip_tables x_tables xfs binfmt_misc edd mpt2sas scsi_transport_sas raid_class mptctl mptbase ipmi_devintf ipmi_msghandler dell_rbu(X) bonding mperf microcode fuse nls_utf8 loop pciehp qla2xxx(X) joydev usbhid hid usb_storage ixgbe dca scsi_transport_fc scsi_tgt tg3 shpchp pci_hotplug mdio sg sr_mod cdrom ipv6 ipv6_lib wmi dcdbas(X) pcspkr acpi_power_meter acpi_pad button iTCO_wdt iTCO_vendor_support rtc_cmos dm_round_robin ehci_hcd usbcore usb_common sd_mod crc_t10dif processor thermal_sys hwmon scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw dm_snapshot dm_multipath dm_mod scsi_dh_rdac scsi_dh ext3 mbcache jbd ahci libahci libata megaraid_sas scsi_mod [last unloaded: ipmi_si]
[ 2628.151467] Supported: Yes
[ 2628.151468] CPU 0
[ 2628.151469] Modules linked in: ipmi_si iptable_filter ip_tables x_tables xfs binfmt_misc edd mpt2sas scsi_transport_sas raid_class mptctl mptbase ipmi_devintf ipmi_msghandler dell_rbu(X) bonding mperf microcode fuse nls_utf8 loop pciehp qla2xxx(X) joydev usbhid hid usb_storage ixgbe dca scsi_transport_fc scsi_tgt tg3 shpchp pci_hotplug mdio sg sr_mod cdrom ipv6 ipv6_lib wmi dcdbas(X) pcspkr acpi_power_meter acpi_pad button iTCO_wdt iTCO_vendor_support rtc_cmos dm_round_robin ehci_hcd usbcore usb_common sd_mod crc_t10dif processor thermal_sys hwmon scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw dm_snapshot dm_multipath dm_mod scsi_dh_rdac scsi_dh ext3 mbcache jbd ahci libahci libata megaraid_sas scsi_mod [last unloaded: ipmi_si]
[ 2628.151498] Supported: Yes
[ 2628.151500]
[ 2628.151502] Pid: 2154, comm: kworker/0:3 Tainted: G X 3.0.42-0.7-default #1 Dell Inc. PowerEdge R720/0VWT90
[ 2628.151506] RIP: 0010:[] [] _raw_spin_unlock_irqrestore+0x8/0x10
[ 2628.151514] RSP: 0018:ffff8807f65e3df8 EFLAGS: 00000202
[ 2628.151516] RAX: ffff881000375000 RBX: 0000000000001654 RCX: 000000000000948c
[ 2628.151518] RDX: 000000000000948c RSI: 0000000000000202 RDI: 0000000000000202
[ 2628.151519] RBP: ffff8807f7d22060 R08: ffffc900060e6000 R09: 0000000000000594
[ 2628.151521] R10: 0000000000001654 R11: 00000000fffffffc R12: ffffffff8144b66e
[ 2628.151523] R13: ffff8807f7d22060 R14: ffffffff8144b66e R15: ffff8807f5696800
[ 2628.151525] FS: 0000000000000000(0000) GS:ffff88082fc00000(0000) knlGS:0000000000000000
[ 2628.151527] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2628.151529] CR2: 00007f1cea4af214 CR3: 00000007fd5c9000 CR4: 00000000000406f0
[ 2628.151530] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2628.151532] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2628.151534] Process kworker/0:3 (pid: 2154, threadinfo ffff8807f65e2000, task ffff8807fdc560c0)
[ 2628.151536] Stack:
[ 2628.151539] ffffffffa000f46d ffff8807f7d223c8 0000000000000000 ffff8807f7d223c8
[ 2628.151544] ffff8807fdfabb40 ffff88082fc0cf80 ffffffff8107426c ffffe8f7ffa02e00
[ 2628.151547] 0000000000000000 ffffe8f7ffc0b300 ffff8807fdfabb40 ffff88082fc0cf80
[ 2628.151550] Call Trace:
[ 2628.151568] [] scsi_remove_target+0xbd/0xf0 [scsi_mod]
[ 2628.151587] [] process_one_work+0x16c/0x350
[ 2628.151593] [] worker_thread+0x17a/0x410
[ 2628.151597] [] kthread+0x96/0xa0
[ 2628.151602] [] kernel_thread_helper+0x4/0x10
[ 2628.151605] Code: 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 0f b7 17 eb f5 c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 ff 07 48 89 f7 57 9d
[ 2628.151622] 66 90 66 90 c3 66 90 b8 ff ff ff ff f0 0f c1 07 83 e8 01 ba
[ 2628.151629] Call Trace:
[ 2628.151637] [] scsi_remove_target+0xbd/0xf0 [scsi_mod]
[ 2628.151650] [] process_one_work+0x16c/0x350
[ 2628.151654] [] worker_thread+0x17a/0x410
[ 2628.151657] [] kthread+0x96/0xa0
[ 2628.151661] [] kernel_thread_helper+0x4/0x10
[ 2656.107213] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:3:2154]
[ 2656.107215] Modules linked in: ipmi_si iptable_filter ip_tables x_tables xfs binfmt_misc edd mpt2sas scsi_transport_sas raid_class mptctl mptbase ipmi_devintf ipmi_msghandler dell_rbu(X) bonding mperf microcode fuse nls_utf8 loop pciehp qla2xxx(X) joydev usbhid hid usb_storage ixgbe dca scsi_transport_fc scsi_tgt tg3 shpchp pci_hotplug mdio sg sr_mod cdrom ipv6 ipv6_lib wmi dcdbas(X) pcspkr acpi_power_meter acpi_pad button iTCO_wdt iTCO_vendor_support rtc_cmos dm_round_robin ehci_hcd usbcore usb_common sd_mod crc_t10dif processor thermal_sys hwmon scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw dm_snapshot dm_multipath dm_mod scsi_dh_rdac scsi_dh ext3 mbcache jbd ahci libahci libata megaraid_sas scsi_mod [last unloaded: ipmi_si]
[ 2656.107264] Supported: Yes
[ 2656.107265] CPU 0
[ 2656.107266] Modules linked in: ipmi_si iptable_filter ip_tables x_tables xfs binfmt_misc edd mpt2sas scsi_transport_sas raid_class mptctl mptbase ipmi_devintf ipmi_msghandler dell_rbu(X) bonding mperf microcode fuse nls_utf8 loop pciehp qla2xxx(X) joydev usbhid hid usb_storage ixgbe dca scsi_transport_fc scsi_tgt tg3 shpchp pci_hotplug mdio sg sr_mod cdrom ipv6 ipv6_lib wmi dcdbas(X) pcspkr acpi_power_meter acpi_pad button iTCO_wdt iTCO_vendor_support rtc_cmos dm_round_robin ehci_hcd usbcore usb_common sd_mod crc_t10dif processor thermal_sys hwmon scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw dm_snapshot dm_multipath dm_mod scsi_dh_rdac scsi_dh ext3 mbcache jbd ahci libahci libata megaraid_sas scsi_mod [last unloaded: ipmi_si]
[ 2656.107295] Supported: Yes
[ 2656.107296]
[ 2656.107298] Pid: 2154, comm: kworker/0:3 Tainted: G X 3.0.42-0.7-default #1 Dell Inc. PowerEdge R720/0VWT90
[ 2656.107301] RIP: 0010:[] [] _raw_spin_unlock_irqrestore+0x8/0x10
[ 2656.107310] RSP: 0018:ffff8807f65e3df8 EFLAGS: 00000202
[ 2656.107312] RAX: ffff881000375000 RBX: 000000000000674f RCX: 000000000000c76b
[ 2656.107313] RDX: 000000000000c76b RSI: 0000000000000202 RDI: 0000000000000202
[ 2656.107315] RBP: ffff8807f7d22060 R08: ffffc900060e6000 R09: 0000000000000594
[ 2656.107317] R10: 0000000000001654 R11: 00000000fffffffc R12: ffffffff8144b66e
[ 2656.107319] R13: 0000000000000594 R14: 0000000000001654 R15: 00000000fffffffc
[ 2656.107321] FS: 0000000000000000(0000) GS:ffff88082fc00000(0000) knlGS:0000000000000000
[ 2656.107323] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2656.107324] CR2: 00007f1cea4af214 CR3: 00000007fd5c9000 CR4: 00000000000406f0
[ 2656.107326] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2656.107328] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2656.107330] Process kworker/0:3 (pid: 2154, threadinfo ffff8807f65e2000, task ffff8807fdc560c0)
[ 2656.107331] Stack:
[ 2656.107335] ffffffffa000f46d ffff8807f7d223c8 0000000000000000 ffff8807f7d223c8
.
.
.
.

I tried booting both with and without transparent_hugepage=never; it made no difference.
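
When repeating this test, it can help to confirm which transparent hugepage mode the running kernel actually ended up with. The following is only a minimal sketch in Python, assuming the kernel exposes the usual sysfs file /sys/kernel/mm/transparent_hugepage/enabled; the function names are made up for illustration.

```python
import re
from pathlib import Path

# Assumed location of the THP switch; not present on kernels built without THP.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text):
    """Return the bracketed (active) mode from text like 'always madvise [never]'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def current_thp_mode(path=THP_ENABLED):
    """Read the sysfs file and return the active mode, or None if it is missing."""
    if not path.exists():
        return None
    return active_thp_mode(path.read_text())
```

Calling current_thp_mode() after a boot with transparent_hugepage=never on the kernel command line should report "never" if the parameter took effect.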

I found the problem:

The following patch must be applied to the kernel:

https://patchwork.kernel.org/patch/1549081/

Currently, every storage path failure results in a kernel panic.
Is there a chance that this patch will make it into the next update?
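
To check whether a given boot hit the same failure (for example before and after applying the patch), the kernel log can be scanned for the three signatures seen in the output above: FC remote port timeouts, multipath path failures and soft lockups. This is a rough sketch in Python, assuming the message wording matches the log in this post exactly; the function name summarize and the result keys are hypothetical.

```python
import re
from collections import Counter

# Patterns copied from the message formats in the log above.
RPORT_TIMEOUT = re.compile(r"(rport-[\d:]+-\d+): blocked FC remote port time out")
FAILING_PATH = re.compile(r"device-mapper: multipath: Failing path (\d+:\d+)\.")
SOFT_LOCKUP = re.compile(r"BUG: soft lockup - CPU#\d+ stuck for \d+s! \[([^\]]+)\]")


def summarize(log_text):
    """Collect timed-out rports, failed paths and soft lockups from kernel log text."""
    rports = set()
    failed_paths = []
    lockups = Counter()  # task name -> number of soft lockup reports
    for line in log_text.splitlines():
        m = RPORT_TIMEOUT.search(line)
        if m:
            rports.add(m.group(1))
            continue
        m = FAILING_PATH.search(line)
        if m:
            failed_paths.append(m.group(1))
            continue
        m = SOFT_LOCKUP.search(line)
        if m:
            lockups[m.group(1)] += 1
    return {
        "timed_out_rports": sorted(rports),
        "failed_paths": failed_paths,
        "soft_lockups": dict(lockups),
    }
```

The function takes the log as a string, so it can be fed the saved output of dmesg or the relevant part of /var/log/messages. An empty "soft_lockups" entry after a path failure would suggest the lockup no longer occurs.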

Kind Regards,

Christian Bronk

Hi
I’ve asked my SUSE contacts about this and am waiting for the info on
what to do next.


Cheers Malcolm °¿° (Linux Counter #276890)