I/O errors reported in messages log

Hi Guys,

We are using SuSE 11 Sp2.
Here is how it is configured.

The SuSE server in question with the errors is a Sentinel server.
This SuSE server is running as a VM on top of a VMware ESX server.
The ESX server is using storage from one of our SAN boxes.

Recently we started to see error messages about being unable to use a buffer, followed by I/O errors, for the partition [/var/opt/novell], i.e. dm-12.
This is a 2 TB partition with 1.4 TB already used.

Now the real question is: is this error coming from SuSE itself, is it an issue with the partition, or could it be an issue with the SAN?
What would be my starting point for a deep dive into this issue? The messages file provides so little info about the error…
Is there any way I could find out why this issue started?
Currently our focus is on finding out why this error is occurring and what might have caused it.

Any ideas?
Any pointers?

-ddgaikwad

Have you seen this from any other similar or dissimilar VMs? Can you
duplicate this without VMware or the SAN in the mix? Can you duplicate
this outside of writes from Sentinel?

Without any error messages here it's easy to speculate, but hard to do so
while adding any real value. Since this is a SLES forum and I'm biased, I'll
point my finger at anything else. :)


Good luck.

Hi ddgaikwad,

A good starting point would be to include samples of the actual error messages with your question, ideally wrapped in [ CODE ] … [ / CODE ] tags (without the spaces) for better readability ;)
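
If the messages file is large, a few lines of Python can pull out the relevant entries for posting. This is only a rough sketch: the function name, the default device "dm-12" and the search terms are my assumptions, and the log is assumed to be the usual /var/log/messages.

```python
import re


def filter_io_errors(lines, device="dm-12"):
    """Return log lines that mention an I/O error or the given device.

    The device name is matched as a whole word, so "dm-12" does not
    also match "dm-120".
    """
    pattern = re.compile(
        r"I/O error|\b" + re.escape(device) + r"\b", re.IGNORECASE
    )
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def scan_log(path="/var/log/messages", device="dm-12"):
    """Read a log file (path is an assumption) and return matching lines."""
    with open(path, errors="replace") as log:
        return filter_io_errors(log, device)
```

Calling `scan_log()` as root and printing each returned line should give you something you can paste between the CODE tags.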

(By “SuSE 11 Sp2” you most likely mean “SUSE Linux Enterprise Server 11”, SP2 → SLES11SP2… corrected just for the sake of search engines.)

Your SLES VM sees a storage device (and doesn’t actually know it’s virtual) and reports that there are I/O errors. This results from some error message (at the “device” layer, i.e. SCSI) reported by ESX to your VM.
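
To find out which virtual disk actually sits underneath dm-12, you can look at sysfs. The sketch below assumes the usual layout, where /sys/block/dm-12/dm/name holds the device-mapper name and /sys/block/dm-12/slaves lists the underlying block devices; the function name and the returned dictionary are made up for illustration.

```python
import os


def describe_dm(device="dm-12", sysfs_root="/sys/block"):
    """Return the device-mapper name and underlying devices of a dm node.

    Missing files or directories are tolerated, so the result may
    contain name=None and an empty slaves list.
    """
    base = os.path.join(sysfs_root, device)

    name = None
    name_file = os.path.join(base, "dm", "name")
    if os.path.isfile(name_file):
        with open(name_file) as fh:
            name = fh.read().strip()

    slaves_dir = os.path.join(base, "slaves")
    slaves = sorted(os.listdir(slaves_dir)) if os.path.isdir(slaves_dir) else []

    return {"device": device, "name": name, "slaves": slaves}
```

The entries in `slaves` (an sdX device or partition, for example) tell you which virtual disk to look for on the ESX side.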

A possible cause might be a corruption of the file system (inside the VM, on dm-12), causing references to disk sectors outside the limits of the defined virtual storage device.

It is more likely that ESX had some problems accessing the SAN storage. You have not given any details, so I cannot tell whether it is actually some SAN technology, or storage accessed via NFS (albeit across a separate storage LAN). In either case, problems can occur (e.g. LAN/FC switch failures), resulting in the messages that brought you here.

Are these error messages still occurring? If not, it might have been a temporary situation, which limits your chances of spotting the root cause. If the error persists (across a reboot of the virtual server, including a file system check on dm-12), then looking at ESX should give you an idea of what's going wrong.
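
To judge whether this was a one-off burst or is still going on, it can help to count the I/O error lines per day. The following is a minimal sketch, assuming the traditional syslog timestamp at the start of each line ("Mon DD HH:MM:SS") and the literal text "I/O error" in the message; the function name is hypothetical.

```python
import re
from collections import Counter

# Traditional syslog prefix: three-letter month, then day of month.
_STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s")


def io_errors_per_day(lines):
    """Count lines containing "I/O error", keyed by "Mon D"."""
    counts = Counter()
    for line in lines:
        if "I/O error" not in line:
            continue
        match = _STAMP.match(line)
        if match:
            counts[f"{match.group(1)} {int(match.group(2))}"] += 1
    return counts
```

If all the hits fall on a single day, compare that date with the event logs on the ESX host and on the SAN.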

Regards,
Jens