Awful write latency with Fibre Channel SAN?

I’m running SLES 11 SP3 with all the available updates.

I have two different Fibre Channel JBOD arrays with Intel 520 SSDs attached to the server, and the write latency/IOPS is awful. I’m using a QLogic QLE2464 HBA.

So here’s the scenario. The server is modestly spec’d with 8 cores and 24GB of RAM. I’m testing with IOMeter. When writing to a disk in the array, the write latency is a fairly steady 2100ms and I can’t get above 30 IOPS. Even if I create a RAID 0 with 14 SSDs, the write latency is still 2100ms and IOPS barely go above 60. Testing the local RAID 5 via the onboard HP SmartArray yields about 40ms response times and 400 IOPS, which is what I’d expect from a 4-disk SAS RAID 5. Read IOPS to the Fibre Channel SSD disks look OK: I get latency in the 20ms range and IOPS in the 4k to 5k range. So the issue appears to be write-only.
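In case anyone wants to reproduce this without IOMeter, a rough shell-level equivalent of the small-write test is something like the following (the mount point and file name are just placeholders for a file system on one of the FC SSDs):

[code]
# 4k synchronous, direct writes -- each write has to hit the disk,
# so elapsed time divided by count gives a rough per-write latency
dd if=/dev/zero of=/mnt/fctest/ddtest.bin bs=4k count=1000 oflag=direct,dsync
[/code]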

Now, just to confirm I didn’t have a hardware issue, I installed Windows 2008 R2 in a dual boot and tested with the same IOMeter spec. With a single SSD I can get about 20ms latency and 3000 IOPS, which is more like what it should be. This is a plain install of 2008 with no tweaks to the drivers, etc. I didn’t change any of the hardware. Reboot back to SLES, and it’s slow.

So now that I know it isn’t a hardware issue, here is what I tried (rough commands below):

  1. Disabled atime
  2. Tried several file systems, EXT3 and XFS (made sure the partitions were aligned and did NOT use LVM)
  3. Changed the I/O scheduler from CFQ to NOOP
  4. Disabled write barriers
  5. Played with /proc/sys/vm/dirty_ratio and /proc/sys/vm/dirty_background_ratio as noted here: http://www.novell.com/support/kb/doc.php?id=7010287
  6. Updated the drivers from QLogic’s website and the HBA card firmware
  7. Updated the HP DL360 BIOS and firmware

Nothing made any noticeable difference.
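For completeness, this is roughly how I applied items 1–5 (the device name, mount point, and dirty-ratio values here are only examples, not the exact ones I used):

[code]
# noatime + barriers off (ext3 shown; for XFS the option is "nobarrier")
mount -o noatime,barrier=0 /dev/sdb1 /mnt/fctest

# switch the FC LUN from cfq to the noop elevator
echo noop > /sys/block/sdb/queue/scheduler

# dirty page tuning along the lines of the Novell TID
echo 10 > /proc/sys/vm/dirty_ratio
echo 5  > /proc/sys/vm/dirty_background_ratio
[/code]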

I’m not seeing any errors in /var/log or dmesg.

I’m stumped as to where to look next. Does anyone have any troubleshooting tips to track this down?

-M

mww wrote:
[color=blue]

I’m running SLES 11 SP3 with all the available updates.[/color]

Are you sure? I didn’t know there was an SP3.
[color=blue]

I have two different Fibre Channel JBOD arrays with Intel 520 SSDs
attached to the server, and the write latency/IOPS is awful. I’m
using a QLogic QLE2464 HBA.
[/color]

I have no idea what the problem might be. I did come across this in the
SLES System Analysis and Tuning Guide which may be of interest:

13.3 I/O Barrier Tuning

Most file systems (XFS, ext3, ext4, reiserfs) send write barriers to
disk after fsync or during transaction commits. Write barriers enforce
proper ordering of writes, making volatile disk write caches safe to
use (at some performance penalty). If your disks are battery-backed in
one way or another, disabling barriers may safely improve performance.

Sending write barriers can be disabled using the barrier=0 mount option
(for ext3, ext4, and reiserfs), or using the nobarrier mount option
(for XFS).

WARNING
Disabling barriers when disks cannot guarantee caches are properly
written in case of power failure can lead to severe file system
corruption and data loss.
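For example, the mount options would look something like this (the device and mount point are only placeholders):

[code]
# ext3/ext4/reiserfs
mount -o barrier=0 /dev/sdb1 /mnt/data

# XFS
mount -o nobarrier /dev/sdb1 /mnt/data
[/code]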


Kevin Boyle - Knowledge Partner
If you find this post helpful and are using the web interface,
show your appreciation and click on the star below…

SLES 11 SP2, sorry for the typo.

I’ve already disabled write barriers as noted in the original post. It did not help.

-M

Hi
Have you thought about a different application for benchmarking? Have a
look at this one;

http://www.cyberciti.biz/tips/linux-filesystem-benchmarking-with-iozone.html

I do have it packaged up for SLE/openSUSE on the Open Build Service (it
needs some tweaking for producing the graphs, need to add a better
helper script).
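If you want numbers closer to what IOMeter is measuring, a latency-oriented run would be something like this (the file name and sizes are only examples):

[code]
# O_DIRECT (-I), O_SYNC writes (-o), fsync included in timing (-e),
# 4k records, write/rewrite (-i 0) and random read/write (-i 2) only
iozone -I -o -e -r 4k -s 512m -i 0 -i 2 -f /mnt/fctest/iozone.tmp
[/code]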

I get these results for my OCZ Agility3 60GB device (openSUSE 12.1);
http://paste.opensuse.org/05ab2a29

and these for a WD RE3 500GB device (SLED 11 SP2);
http://paste.opensuse.org/7d52cf9c


Cheers Malcolm °¿° (Linux Counter #276890)
openSUSE 12.1 (x86_64) Kernel 3.1.10-1.9-desktop
up 2 days 11:32, 4 users, load average: 0.00, 0.01, 0.05
CPU Intel i5 CPU M520@2.40GHz | Intel Arrandale GPU

First off, flash-based SSDs are the wrong answer if the goal is low-latency writes: NAND flash write latencies are horrible.

This is exacerbated by the RAID-5 write penalty: because of RAID-5 read-modify-write behavior, a single small write turns into reads of the old data and parity plus writes of the new data and parity, so SSD write latencies get multiplied.

Also, because you are using an FC-to-SATA JBOD, you don’t have a DRAM write cache to mask these latencies.

Finally, if you are using IOmeter on Linux (which was a crappy port from Intel’s original Windows code), there is a known issue where IOmeter cannot queue multiple outstanding IO requests. (Note that all of these outrageous IOPS claims for SSDs only come true when you have a 32-deep IO queue at the individual disks, which never happens in real life.) Even without the IOmeter/Linux issue, I doubt you could ever generate more than a 1-2 deep queue at the individual SSDs.
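If you do want to push a deep queue from Linux rather than Windows, fio with the libaio engine will do it where the IOmeter port won’t; something along these lines (the device path and run time are only illustrative, and writing to the raw device destroys whatever is on it):

[code]
# 32 outstanding 4k random writes against a single SSD, page cache bypassed
fio --name=qd32 --filename=/dev/sdb --direct=1 --ioengine=libaio \
    --rw=randwrite --bs=4k --iodepth=32 --runtime=60 --time_based
[/code]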

If low-latency writes are your goal, a good caching RAID controller sitting in front of regular spinning disks will serve you better. In any case, get rid of the RAID-5, go RAID-0 or RAID-1/10, and hit the array with IOmeter running on Windows with outstanding IOs set to >8 per HDD.

Malcolm, regarding:

"I get these results for my OCZ Agility3 60GB device (openSUSE 12.1);
http://paste.opensuse.org/05ab2a29

and these for a WD RE3 500GB device (SLED 11 SP2);
http://paste.opensuse.org/7d52cf9c"


These results are showing MBytes/Sec. The problem discussed here is Latency. Two completely different things.