SLES 11.1 issues with mptspi - LSI 53C1030 scsi card

We are losing communication with the SCSI card that is attached to
our Qualstar tape library, and I am not sure where to go to resolve
the problem. The hardware is an HP ProLiant DL380 G6 and the card is
an LSI Symbios Logic 53C1030. The mptspi version is 4.22.00.00. Here
are the error messages in dmesg regarding the communication problem.

[ 876.333335] st0: Block limits 1 - 16777215 bytes.
[ 2660.514511] mptscsih: ioc0: attempting task abort! (sc=ffff88007be00e80)
[ 2660.514515] st 0:0:1:0: CDB: Write(6): 0a 00 00 80 00 00
[ 2660.642305] mptscsih: ioc0: task abort: FAILED (rv=2003) (sc=ffff88007be00e80) (sn=18575)
[ 2660.642312] mptscsih: ioc0: attempting target reset! (sc=ffff88007be00e80)
[ 2660.642315] st 0:0:1:0: CDB: Write(6): 0a 00 00 80 00 00
[ 2661.020839] mptscsih: ioc0: target reset: SUCCESS (sc=ffff88007be00e80)
[ 2661.039760] st0: Error 80000 (driver bt 0x0, host bt 0x8).

I have just applied the most current patches to the server and still no
luck. The problem started happening when I applied patches back on
September 1st. I’m not sure what I can do to determine the
kernel/config from before that day.

Any ideas??

Thanks!

Melissa


helgeson

View this thread: http://forums.novell.com/showthread.php?t=447420

On Thu, 27 Oct 2011 14:56:01 GMT
helgeson helgeson@no-mx.forums.novell.com wrote:
[color=blue]

[…]

[/color]
Hi
You can look back at what was installed and when in
the /var/log/zypp/history file.
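
For illustration, a minimal sketch of filtering that history file for kernel installs. It assumes the usual pipe-separated layout (timestamp|action|package|version|arch|...). The function name `kernel_installs` and the sample entries, including their version numbers and repository names, are made up for this example and are not taken from the poster's system.

```shell
#!/bin/sh
# List kernel package installs recorded in a zypp history file.
# Assumed layout: date|action|name|version|arch|...
kernel_installs() {
    # $1: path to the history file (default: /var/log/zypp/history)
    awk -F'|' '$2 == "install" && $3 ~ /^kernel-/ { print $1, $3, $4 }' \
        "${1:-/var/log/zypp/history}"
}

# Hypothetical sample data so the sketch runs anywhere.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
# 2011-08-01 09:00:00 hypothetical comment line
2011-08-01 09:00:05|install|kernel-default|2.6.32.1-0.1.1|x86_64||repo-a|0000
2011-09-01 10:15:42|install|kernel-default|2.6.32.2-0.2.1|x86_64||repo-b|0000
2011-09-01 10:16:03|install|zypper|1.3.14-0.1.1|x86_64||repo-b|0000
EOF

kernel_installs "$sample"
```

On the real server, calling `kernel_installs` without an argument reads /var/log/zypp/history, and the timestamps around September 1st should show which kernel version was in place before the patches.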


Cheers Malcolm °¿° (Linux Counter #276890)

I have the same problem with an LSI U320 card (Dell LSI203201E) and a Dell TL2000 PowerVault (w/ IBM 3580 drive).
As far as I know, it never worked.
The card detects both and seems to operate the tape drive okay (can’t be sure), but it definitely can’t communicate with the medium changer.

Oct 9 11:47:29 newctrp kernel: [81077.876044] mptbase: ioc0: LogInfo(0x11010400): F/W: bug! MID not found
Oct 9 11:47:30 newctrp kernel: [81078.400357] mptbase: ioc0: LogInfo(0x11010400): F/W: bug! MID not found
Oct 9 11:47:41 newctrp kernel: [81089.106260] mptscsih: ioc0: attempting task abort! (sc=ffff88028d1ffbc0)
Oct 9 11:47:41 newctrp kernel: [81089.106265] ch 3:0:4:1: [sg5] CDB: Read element status: b8 24 00 00 00 32 01 00 00 42 00 00
Oct 9 11:47:41 newctrp kernel: [81089.230119] mptscsih: ioc0: task abort: FAILED (rv=2003) (sc=ffff88028d1ffbc0) (sn=269)
Oct 9 11:47:41 newctrp kernel: [81089.230130] mptscsih: ioc0: attempting target reset! (sc=ffff88028d1ffbc0)
Oct 9 11:47:41 newctrp kernel: [81089.230134] ch 3:0:4:1: [sg5] CDB: Read element status: b8 24 00 00 00 32 01 00 00 42 00 00
Oct 9 11:47:41 newctrp kernel: [81089.607851] mptscsih: ioc0: target reset: SUCCESS (sc=ffff88028d1ffbc0)
Oct 9 11:59:25 newctrp kernel: [81791.896933] mptbase: ioc0: LogInfo(0x11010400): F/W: bug! MID not found
Oct 9 11:59:25 newctrp kernel: [81792.421250] mptbase: ioc0: LogInfo(0x11010400): F/W: bug! MID not found
Oct 9 12:02:51 newctrp kernel: [81997.846139] mptscsih: ioc0: attempting task abort! (sc=ffff8801faa37b80)
Oct 9 12:02:51 newctrp kernel: [81997.846145] ch 3:0:4:1: [sg5] CDB: Mode Sense(6): 1a 08 1d 00 18 00
Oct 9 12:02:51 newctrp kernel: [81997.973946] mptscsih: ioc0: task abort: FAILED (rv=2003) (sc=ffff8801faa37b80) (sn=298)
Oct 9 12:02:51 newctrp kernel: [81997.973958] mptscsih: ioc0: attempting target reset! (sc=ffff8801faa37b80)
Oct 9 12:02:51 newctrp kernel: [81997.973962] ch 3:0:4:1: [sg5] CDB: Mode Sense(6): 1a 08 1d 00 18 00
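
A rough way to summarize how often the driver's error handling fails or succeeds, sketched against a few of the log lines above. The function name `summarize_eh` is hypothetical; on a live system the same pattern could be pointed at a saved copy of the kernel log.

```shell
#!/bin/sh
# Count mptscsih task-abort / target-reset outcomes in a kernel log.
summarize_eh() {
    # $1: file containing kernel messages
    grep -Eo 'mptscsih: ioc[0-9]+: (task abort|target reset): [A-Z]+' "$1" \
        | sort | uniq -c
}

# Excerpt copied from the log in this post.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
Oct 9 11:47:41 newctrp kernel: [81089.106260] mptscsih: ioc0: attempting task abort! (sc=ffff88028d1ffbc0)
Oct 9 11:47:41 newctrp kernel: [81089.230119] mptscsih: ioc0: task abort: FAILED (rv=2003) (sc=ffff88028d1ffbc0) (sn=269)
Oct 9 11:47:41 newctrp kernel: [81089.230130] mptscsih: ioc0: attempting target reset! (sc=ffff88028d1ffbc0)
Oct 9 11:47:41 newctrp kernel: [81089.607851] mptscsih: ioc0: target reset: SUCCESS (sc=ffff88028d1ffbc0)
Oct 9 12:02:51 newctrp kernel: [81997.846139] mptscsih: ioc0: attempting task abort! (sc=ffff8801faa37b80)
Oct 9 12:02:51 newctrp kernel: [81997.973946] mptscsih: ioc0: task abort: FAILED (rv=2003) (sc=ffff8801faa37b80) (sn=298)
EOF

summarize_eh "$log"
```

The "attempting ..." lines are deliberately not counted; only lines that report an outcome match the pattern.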

Hi lpphiggp,

[QUOTE=lpphiggp;16957]I have the same problem with an LSI U320 card (Dell LSI203201E) and a Dell TL2000 PowerVault (w/ IBM 3580 drive).
As far as I know, it never worked.
The card detects both and seems to operate the tape drive okay (can’t be sure), but it definitely can’t communicate with the medium changer.

Oct 9 11:47:29 newctrp kernel: [81077.876044] mptbase: ioc0: LogInfo(0x11010400): F/W: bug! MID not found
Oct 9 11:47:30 newctrp kernel: [81078.400357] mptbase: ioc0: LogInfo(0x11010400): F/W: bug! MID not found
[…][/QUOTE]

looks to me as if the firmware has trouble communicating either with the changer or with the driver. Have you already contacted the (server/card) vendor to verify that you’re running the latest supported combination of those three (card, firmware, driver), and to ask whether there are known problems with the changer device in combination with that SCSI card/firmware?

Regards,
Jens

[QUOTE=jmozdzen;16986][…][/QUOTE]

Hi Jens, Sort of… not LSI particularly, but Dell, who it was bought through.
Novell/SuSE support (I finally opened a ticket) is suggesting an incompatibility.

Now, this card works just fine in Dell R-710s that are running SLES 10 and use the TL2000s, just not this one R-710 which is running SLES 11, which suggests SLES 11 is the issue - it’s the only variable that’s different.
We do, oddly, have an R-720 though running SLES 11 in which this card works fine, not sure why that is, but I’m not complaining!

SCSI is such a straightforward old-school protocol, and this card is by no means ancient; I just can’t believe it won’t work. That just seems like planned obsolescence for the sake of planned obsolescence.
We don’t have options, our budget is done for the year… no more purchases. I just migrated this server; if I can’t get it to work, 3 critical servers are going to go without backups.

I’m wondering if I can copy the SLES10 mptspi module over from a SLES 10 box and use that… not sure what dependent modules I’d need to grab too though.

Hi lpphiggp,

[QUOTE=lpphiggp;17010]Hi Jens, Sort of… not LSI particularly, but Dell, who it was bought through.
Novell/SuSE support (I finally opened a ticket) is suggesting an incompatibility.[/QUOTE]

opening the ticket is a good idea - is it already closed or is SUSE still looking into it to see if they can help? If you’d share the SR number (via private message) I’ll try to take a look at the details.

“SLES11” in this context relates especially to the kernel/driver that is used to access the SCSI card. Oh, and which SLES11 are we talking about, SP1 as the title suggests? SLES11 (“SP0”) and SP1 use a 2.6 kernel, while SP2 and SP3 use 3.0 kernels.

Does the working R-720 run the same kernel/driver as the non-working box? Is its SCSI card connected to the same library (or at least the same type of library)? You might want to look into the firmware versions of the libraries if one works and another one of the same make and model doesn’t.

The problem probably relates exactly to the fact that the card isn’t ancient - these cards carry an embedded system that operates as an independent computer system, initialized by firmware and handling all the bus traffic. The more recent the card, the more sophisticated these controllers get and the less time manufacturers have to spend on debugging their firmware and drivers - at least that’s what my experience tells me. Add a community-developed driver to that picture (relying on consistent and stable controller APIs across all “supported” card models and firmwares) and you’ll see why it sometimes needs only a minor card/firmware update to break things badly.

This is why I recommended contacting LSI and/or Dell in my first response - they do know which versions of driver and (Linux-shipped) firmware work, and sometimes vendors even provide their own updated drivers for Linux enterprise versions. As I see it, you bought all hardware components from a single vendor - you wrote that you contacted them, but did not write what they said. Is your SLES11 version a supported OS on that hardware platform?

[QUOTE=lpphiggp;17010]We don’t have options, our budget is done for the year… no more purchases. I just migrated this server, if I can’t get it to work, 3 critical servers are going to go without backups.

I’m wondering if I can copy the SLES10 mptspi module over from a SLES 10 box and use that… not sure what dependent modules I’d need to grab too though.[/QUOTE]

From what my “crystal ball” tells me, this is a software-related problem. But as we’re talking about kernel drivers here, simply copying over ancient driver files (SLES10 drivers on a SLES11SP3 system) won’t work and will break things completely - these drivers are kernel-version-specific. I am hoping for a firmware/driver recommendation from Dell/LSI or some Dell support pack for your SLES11 version, shipping updated drivers/firmware optimized and tested by the vendor.

It all depends on the vendor stating that your SLES11 is supported on your hardware. Did you get any feedback to that question?

Regards,
Jens

[QUOTE=jmozdzen;17018]Hi lpphiggp,

opening the ticket is a good idea - is it already closed or is SUSE still looking into it to see if they can help? If you’d share the SR number (via private message) I’ll try to take a look at the details.[/QUOTE]
The ticket is open but the response was pretty much, “we see no evidence of compatibility, tough luck”… I’ll send the SR#, thanks!

[QUOTE=jmozdzen;17018]
“SLES11” in this context is especially relating to the kernel/driver that is used to access the SCSI card. Oh, and which SLES11 are we talking about, SP1 as the title suggests? SLES11 (“SP0”) and SP1 use 2.6 kernel, while SP2&SP3 use 3.0 kernels.

Same kernel/driver in SLES as with the non-working box? Is that SCSI card connected to the same library (same lib or at least the same type of lib)? You might want to look into firmware versions of the library, if one works and another one of the same model and make doesn’t.[/QUOTE]

The problem server is SLES11 SP2.
But here’s an update: as a test last night, I set up another Dell R-710 with the exact same type of card, connected to another TL2000 (both TL2000s have the latest firmware, that was one thing Dell had me update) and installed SLES11 SP2… and it works just fine.
These setups are identical in nearly every respect, except I had run updates on the problem server.
I looked at the modules under /lib/modules/3.0.93-0.5-default/kernel/drivers/message/fusion: the dates on the problem box’s modules are 8/27/31, but on the working server just installed, they’re 2/15/2012.

Notably, the kernel is slightly older on the working server: 3.0.13-0.27-default. So yeah, something in the updated kernel (but not from the Service Pack itself) seems to have broken the card.
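
As a quick sanity check on which of the two kernel releases is the older one, here is a small sketch using version-aware sorting. It assumes a `sort` that supports `-V` (GNU coreutils does); the function name `older_kernel` is made up for illustration.

```shell
#!/bin/sh
# Print the older of two kernel release strings using version ordering.
older_kernel() {
    # $1, $2: release strings as printed by `uname -r`
    printf '%s\n' "$1" "$2" | sort -V | head -n 1
}

older_kernel 3.0.93-0.5-default 3.0.13-0.27-default
# prints: 3.0.13-0.27-default
```

Version sorting compares the numeric fields as numbers, so 3.0.13 sorts before 3.0.93, which matches the observation that the working server runs the older kernel.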

[QUOTE=jmozdzen;17018]
The problem probably relates exactly to the fact that the card isn’t ancient […]

This is why I recommended contacting LSI and/or Dell in my first response […] Is your SLES11 version a supported OS on that hardware platform?[/QUOTE]

Thanks, that’s interesting, enlightening… and depressing. I bought the cards back in March of 2012. (I still have the PO) Dell didn’t find anything for an update.
They pointed me to LSI’s website, I guess contacting them is my next step.

[QUOTE=jmozdzen;17018]
From what my “crystal ball” tells me, this is a software-related problem. But as we’re talking about kernel drivers here, simply copying over ancient driver files (SLES10 drivers on a SLES11SP3 system) won’t work and will break things completely - these drivers are kernel-version-specific. My hope goes for a firmware/driver recommendation from Dell/LSI or some Dell support pack for your SLES11 version, shipping updated drivers/firmware optimized and tested by the vendor.

It all depends on the vendor stating that your SLES11 is supported on your hardware. Did you get any feedback to that question?

Regards,
Jens[/QUOTE]

Well, I know now I wouldn’t have to copy the modules from SLES10; perhaps un-updated SLES11 SP2 modules would work, but there’s still the minor difference in kernel - 3.0.93-0.5-default vs 3.0.13-0.27-default. Not sure if that’s a problem or not.
If I tried it, how much damage could it do? It already doesn’t work… could I get a kernel panic that prevented me from booting the machine? (It shouldn’t affect the RAID array, that’s a megaraid SAS driver, altogether different.)
Which vendor? It’s kinda both Dell and LSI. Dell is stumped.
Contacting LSI is the next option, but they list no drivers… they consider this thing legacy already, last I checked.
I can check for newer firmware for the card at least.

I’m worried the catch-22 is probably going to be: too old to support, too new to work correctly.

Hi lpphiggp,

yeah, looking at Dell’s pages, I couldn’t find anything that made me believe that they would support SLES11 on that machine… the latest LSI card update I saw listed was for SLES9. (OTOH, my first conclusion was that it should “work out of the box” with newer versions!)

[QUOTE=lpphiggp;17024][…] installed SLES11sp2… and it works just fine.
These setups are identical in nearly every respect, except I had run updates on the problem server.
I looked at the modules under /lib/modules/3.0.93-0.5-default/kernel/drivers/message/fusion, the dates on the problem box modules are 8/27/31, but on the working server just installed, they’re 2/15/2012.

Notably, the kernel is slightly older on the working server: 3.0.13-0.27-default. So yeah, something in the updated kernel (but not from the Service Pack itself) seems to have broken the card.[/QUOTE]

Now that is an interesting piece of information, something that may even get your SR rolling: You see a regression - an update broke functionality. What you’ll need (and me too, in order to assist in case of questions) is the exact package names & versions on the working and non-working system… which is fortunately easy to get:

rpm -qf /lib/modules/3.0.93-0.5-default/kernel/drivers/message/fusion/mptbase.ko

Of course, on the other system, use the kernel-specific directory name. And the other files in that directory ought to belong to the same RPM, which probably is “kernel-3.0.93-0.5-default” on the first system.

Second thing you may think about: “downgrade” the kernel on the non-working system - this can be done via zypper or via YaST.

I will describe the YaST2 route via a text console (the non-X11 version): you go into software management, search for the kernel package, switch to the upper right panel and scroll down until the package (which has an “i”, like “installed”, in front of its name) is highlighted, then choose “view” - “package versions” (press Alt-V and another Alt-V). In the lower right window, you should see the available package version alternatives - select the one you have installed on the working system (tab to that window, move to the wanted version and press “space” to select it). YaST2 will then try to determine all packages that need to be changed because they depend on the installed package version… and if you’re lucky, everything will be available and you can complete the package installation as usual.

You’d still need to watch out for updates - if you don’t “lock” that RPM, it’ll get overwritten during the next update cycle. But I wouldn’t go that far yet.
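
The zypper route might look roughly like the sketch below. It is a dry run: by default it only prints the commands instead of executing them. The `--oldpackage` flag and the `name=version` syntax are assumptions about the installed zypper version and should be checked against `zypper help install` first; the helper names `run` and `downgrade_and_lock` are hypothetical. The package name and version-release are the ones `rpm -qf` later reports for the working system in this thread.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 (as root) to actually run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

downgrade_and_lock() {
    # $1: package name, $2: version-release known to work
    run zypper install --oldpackage "$1=$2"   # assumed flag, see lead-in
    run zypper addlock "$1"                   # keep updates from replacing it
    run zypper locks                          # list the active locks
}

downgrade_and_lock kernel-default 3.0.13-0.27.1
```

The lock step is optional and can be undone later with `zypper removelock kernel-default`; related packages such as kernel-default-base may need the same version.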

After installation and a (hopefully successful) reboot (go for a power cycle, not just a simple restart), test your SCSI connection - if it really was the software, you should have a running system now. The next thing then is to follow up on that service request, tell them what you did (especially the exact package and version you selected to downgrade to) and get helped :)

If this didn’t help, you might also check the versions of the “kernel-firmware” RPM:

rpm -q kernel-firmware

Maybe we can get you up and running that easily - I hope so!

Regards,
Jens

[QUOTE=jmozdzen;17028]Hi lpphiggp,

yeah, looking at Dell’s pages, I couldn’t find anything that made me believe that they would support SLES11 on that machine… the latest LSI card update I saw listed was for SLES9. (OTOH, my first conclusion was that it should “work out of the box” with newer versions!)[/QUOTE]

Actually I saw, on LSI’s site, drivers for SLES10 and RHEL5…

[QUOTE=jmozdzen;17028]
Now that is an interesting piece of information, something that may even get your SR rolling: You see a regression - an update broke functionality. What you’ll need (and me too, in order to assist in case of questions) is the exact package names & versions on the working and non-working system… which is fortunately easy to get:

rpm -qf /lib/modules/3.0.93-0.5-default/kernel/drivers/message/fusion/mptbase.ko

Of course, on the other system, use the kernel-specific directory name. And the other files in that directory ought to belong to the same RPM, which probably is “kernel-3.0.93-0.5-default” on the first system.
[/QUOTE]

Problem server:

# rpm -qf /lib/modules/3.0.93-0.5-default/kernel/drivers/message/fusion/mptbase.ko
kernel-default-3.0.93-0.5.1

Working server:

# rpm -qf /lib/modules/3.0.13-0.27-default/kernel/drivers/message/fusion/mptbase.ko
kernel-default-3.0.13-0.27.1

[QUOTE=jmozdzen;17028]
Second thing you may think about: “downgrade” the kernel on the non-working system - this can be done via zypper or via YaST.

[…][/QUOTE]

I found it a lot easier to use the X version, yast2… A search on kernel-default-3.0.93-0.5.1 didn’t come up, but just typing in kernel-default worked.
Then I saw “kernel-default-base”, which showed the current version 3.0.93-0.5.1; then in the bottom right window was a tab for Versions, and there I saw several, including the one on the working system (3.0.13-0.27.1). Ah hah!

I’m keeping my eye on that for now, but first, there’s also a change log tab, I’m going through that to see if they specifically mention anything about the fusion drivers.

Thanks Jens!

Hi lpphiggp,

[QUOTE=lpphiggp]I’m keeping my eye on that for now, but first, there’s also a change log tab, I’m going through that to see if they specifically mention anything about the fusion drivers.[/QUOTE]

another bit of information can be gotten via “modinfo mptbase.ko” (or any other module, of course) - it will list the “version” of the module, which is independent of the version of kernel it is compiled against. Comparing the two systems, you can speculate how close these two drivers are “apart”.
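
To make such a comparison less error-prone, the fields could be pulled out of saved `modinfo` output, as in this minimal sketch. The helper name `modinfo_field` is hypothetical, and the sample text is the output posted later in this thread for the problem server. As far as I understand it, `srcversion` is a checksum over the module's source files, so two modules with the same `version` string but different `srcversion` values were probably built from different code.

```shell
#!/bin/sh
# Extract one field from saved `modinfo` output.
modinfo_field() {
    # $1: field name (e.g. srcversion), $2: file holding modinfo output
    awk -F': *' -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Sample: modinfo output as posted for the problem server in this thread.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
filename: /lib/modules/3.0.93-0.5-default/kernel/drivers/message/fusion/mptbase.ko
version: 4.28.00.00suse
license: GPL
description: Fusion MPT base driver
author: LSI Corporation
srcversion: 52E0AD7917DF41317427A05
depends:
supported: yes
vermagic: 3.0.93-0.5-default SMP mod_unload modversions
EOF

for field in version srcversion vermagic; do
    printf '%s=%s\n' "$field" "$(modinfo_field "$field" "$sample")"
done
```

On a live system, `modinfo -F srcversion mptbase` should return a single field directly. Note that `modinfo` accepts either a module name or a path to a file, so a bare "mptbase.ko" is only found when that file is in the current directory.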

And you may of course try to look up the driver source (SP3 reports GPL source for this driver :) ) and check the corresponding change logs as well.

Regards,
Jens

[QUOTE=jmozdzen;17049][…] another bit of information can be gotten via “modinfo mptbase.ko” […][/QUOTE]

Hi Jens,

Okay, that’s weird… Both test and problem server report
Version 4.28.00.00suse

So… no difference…? Except on the working test server, typing “modinfo mptbase.ko” returns an error, while dropping the “.ko” makes it return info.
Oh, now they both do that. *shrug*

One good thing… it says it’s supported.
But I still haven’t heard back from the Novell tech once I told him it’s working on other systems.

[INDENT]ProblemServer:~ # modinfo mptbase.ko
ERROR: modinfo: could not open mptbase.ko: No such file or directory
newctrp:~ #
newctrp:~ # modinfo mptbase
filename: /lib/modules/3.0.93-0.5-default/kernel/drivers/message/fusion/mptbase.ko
version: 4.28.00.00suse
license: GPL
description: Fusion MPT base driver
author: LSI Corporation
srcversion: 52E0AD7917DF41317427A05
depends:
supported: yes
vermagic: 3.0.93-0.5-default SMP mod_unload modversions
parm: mpt_msi_enable_spi: Enable MSI Support for SPI controllers (default=0) (int)
parm: mpt_msi_enable_fc: Enable MSI Support for FC controllers (default=0) (int)
parm: mpt_msi_enable_sas: Enable MSI Support for SAS controllers (default=0) (int)
parm: mpt_channel_mapping: Mapping id’s to channels (default=0) (int)
parm: mpt_debug_level: debug level - refer to mptdebug.h - (default=0)
parm: mpt_fwfault_debug:Enable detection of Firmware fault and halt Firmware on fault - (default=0) (int)
ProblemServer:~ #[/INDENT]

Paul

whoops… so much for me successfully obscuring the name of the server… lol