SLES as a guest OS on KVM shows ext3-fs errors

Hi,
I ran into some filesystem errors in a SLES guest on KVM. My environment is:
HOST:
SUSE 10, kernel 2.6.32.43
QEMU-KVM 1.2
libvirt 1.0
Guest OS:
SUSE 10, kernel 2.6.32.43
The VMs use qcow2 disks.

Description of problem:
I have 20 VMs with qcow2 disks; these VMs have been forcibly shut down with “virsh destroy” many times during VM installation.
When these VMs reboot, dmesg shows an ext3-fs mount warning for the /usr/local partition /dev/vda3:
EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
and when I write to partition /dev/vda3, errors like these show up in dmesg:
1. error (device vda3): ext3_free_blocks: Freeing blocks not in datazone - block = 1869619311, count = 1
   error (device vda3): ext3_free_blocks_sb: bit already cleared for block 2178152
   error (device vda3): ext3_readdir: bad entry in directory #1083501:
2. [347470.661893] attempt to access beyond end of device
   [347470.661896] vda3: rw=0, want=6870892952, limit=41945715
   [347470.661897] EXT3-fs error (device vda3): ext3_free_branches: Read failure, inode=1083508, block=858861618
3. EXT3-fs error (device vda3): ext3_new_block: block(4295028581) >= blocks count(-1) - block_group = 1, es == ffff88021b6c7400

I suspect these filesystem errors are caused by the repeated forced shutdowns, but I can’t reproduce the problem now.
Does anyone have an idea or suggestion that could help me?

Thanks in Advance

Regards
Ben

Hi Ben,

[QUOTE=benjamin_zb;17665]I suspect these filesystem errors are caused by the repeated forced shutdowns, but I can’t reproduce the problem now.
Does anyone have an idea or suggestion that could help me?[/QUOTE]

I’m not sure what you’re actually after - it is usually a bad idea to force any Linux system down by powering it off (which, in the case of a VM, is what a “forced shutdown” amounts to). The system has no chance to write all FS buffers to disk, so if there are uncommitted modifications to the FS, it can become corrupt. Writing to the unrepaired FS can make things worse, of course.
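If a hard stop really is unavoidable, flushing the caches from inside the guest right before it at least limits how much unwritten data is at stake. A sketch only, not a substitute for a clean shutdown (“vm01” is a placeholder domain name):

[CODE]
# Inside the guest, just before the forced stop:
sync                   # flush dirty page cache and filesystem buffers to disk

# Then, on the host:
virsh destroy vm01     # still a hard stop - anything written after the sync is at risk
[/CODE]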

Once a file system is reported as potentially corrupted, run “fsck” - you didn’t mention if you followed that advice (“EXT3-fs warning: mounting fs with errors, running e2fsck is recommended”).

Do you really have to force down the VM? Any way to avoid that would be an improvement.

Have you included /usr/local in your /etc/fstab? I’m surprised no automatic fsck was run. (Please note that even “fsck” is not guaranteed to actually fix all errors in the FS! Not writing all data to the FS before shutting down may lead to real data loss.)

[QUOTE]Does anyone have an idea or suggestion that could help me?[/QUOTE]

What type of suggestions are you looking for?

Regards,
Jens

Hi Jens,
[QUOTE]Once a file system is reported as potentially corrupted, run “fsck” - you didn’t mention if you followed that advice (“EXT3-fs warning: mounting fs with errors, running e2fsck is recommended”).[/QUOTE]
When I hit this bug, I follow these steps (see the sketch below):
 1. Power off the VM from inside the guest OS (not “virsh destroy”).
 2. qemu-nbd -c /dev/nbd0 xxxx.img (on the host)
 3. fsck -n /dev/nbd0p3 (nbd0p3 corresponds to partition /dev/vda3 in the VM)
 4. If fsck reports errors, I run “fsck -y /dev/nbd0p3”.
 5. After the above steps, I restart the VM and the corrupted /dev/nbd0p3 is recovered.
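For reference, the whole host-side procedure in one place - a minimal sketch; the image name, nbd device and partition number are just the ones from the steps above and may differ on your setup:

[CODE]
# On the host, with the VM shut down:
modprobe nbd max_part=8            # make sure the nbd driver exposes partitions as nbd0pN
qemu-nbd -c /dev/nbd0 xxxx.img     # attach the qcow2 image
fsck -n /dev/nbd0p3                # dry run, report errors only
fsck -y /dev/nbd0p3                # repair pass, if the dry run found errors
qemu-nbd -d /dev/nbd0              # detach the image before restarting the VM
[/CODE]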

[QUOTE]Do you really have to force down the VM? Any way to avoid that would be an improvement.[/QUOTE]
“virsh shutdown xxx” doesn’t work because the guest can’t load the ACPI module, so I have to use “virsh destroy” on the host.

[QUOTE]Have you included /usr/local in your /etc/fstab? I’m surprised no automatic fsck was run. (Please note that even “fsck” is not guaranteed to actually fix all errors in the FS! Not writing all data to the FS before shutting down may lead to real data loss.)[/QUOTE]
The /usr/local partition is included in my /etc/fstab, but it is not automatically checked on reboot.
/etc/fstab:
/dev/vda3 /usr/local ext3 noatime,acl,user_xattr 0 0

[QUOTE]What type of suggestions are you looking for?[/QUOTE]
I have tested forcing down 3 VMs many times (1000+) with “virsh destroy” (roughly the loop sketched below), but I can’t reproduce the errors.
For now the problem can be recovered with fsck, but what is the real cause of these filesystem errors?
So I want to know whether anyone else has run into the same errors, and what leads to them - is it QEMU, KVM, or just my “rough” command?
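For what it’s worth, the test loop I used looked roughly like this - a simplified sketch; “vm01” and the timings are placeholders:

[CODE]
# Repeatedly start a guest and kill it hard while it is writing
for i in $(seq 1 1000); do
    virsh start vm01
    sleep 60               # let the guest boot and write to /usr/local
    virsh destroy vm01     # hard stop, no clean unmount
done
[/CODE]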

Regards,
Ben

Hi Ben,

[QUOTE=benjamin_zb;17723]For now the problem can be recovered with fsck, but what is the real cause of these filesystem errors? … is it QEMU, KVM, or just my “rough” command?[/QUOTE]

I’m beginning to understand… the behavior you’re seeing is not a bug (nor a feature :wink: ), but a severe “operator error”.

Some background information: you’re dealing with file systems that do not immediately persist all information, but keep part of the changes “in memory” (“caching”). If you then simply power down the system, the parts that have not yet been persisted to disk are lost. There’s a lot that can be said about how this works and how you can tune the behavior (how much “dirty” (non-persisted) data is allowed, how long the system may wait before it starts persisting dirty cache entries, and so on), and it depends on the file system used, but the general idea is that caching was introduced to avoid the rather slow writes to disk in scenarios where either the same information changes rapidly or writes can be grouped into successive blocks, letting the disk reach optimum throughput.

Whether your file systems are affected by this behavior depends on

  • the amount of changes you made to the file system
  • the time between the (last) change and the power-off (IIRC, by default dirty data is flushed after about 30 seconds)
  • the caching setup of the file system (see the sysctl sketch after this list)
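On the guest side, the relevant knobs are the standard Linux vm.dirty_* sysctls. A sketch only - the values below are illustrative, not a recommendation:

[CODE]
# Inspect the current writeback behavior inside the guest
sysctl vm.dirty_ratio vm.dirty_background_ratio
sysctl vm.dirty_expire_centisecs vm.dirty_writeback_centisecs

# Example: persist dirty data sooner (illustrative values only)
sysctl -w vm.dirty_expire_centisecs=1000     # treat cached data as "old" after 10 s instead of 30 s
sysctl -w vm.dirty_writeback_centisecs=500   # flusher thread wakes up every 5 s
[/CODE]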

Most of the time, if the power-off didn’t hit right in the middle of some extensive file system change, your data was already persisted but the file system’s “properly unmounted” flag wasn’t set - causing the fsck run, which then finds no problematic situation. But if a lot was changed (in cache) and then the power-off strikes, both metadata (where in the FS is what) and actual content may not have been persisted yet and will be missing (and this may even lead to file system states that fsck has trouble correcting).

[QUOTE]is it QEMU, KVM, or just my “rough” command?[/QUOTE]

Just that - operator error.

[QUOTE]“virsh shutdown xxx” doesn’t work because the guest can’t load the ACPI module, so I have to use “virsh destroy” on the host.[/QUOTE]

How about an ssh mechanism, e.g. secured by keys/certificates? Then you could invoke “shutdown” inside the guest remotely.
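A minimal sketch of that idea, assuming key-based root ssh access to the guest - the key path, address and domain name below are placeholders:

[CODE]
# From the host: ask the guest to shut itself down cleanly, then wait until
# libvirt no longer reports the domain as running
ssh -i /root/.ssh/vm_key root@192.0.2.10 'shutdown -h now'
while virsh domstate vm01 | grep -q running; do
    sleep 2
done
[/CODE]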

[QUOTE]/dev/vda3 /usr/local ext3 noatime,acl,user_xattr 0 0[/QUOTE]

From “man 5 fstab”:

[QUOTE]The sixth field (fs_passno).
This field is used by the fsck(8) program to determine the order in which filesystem checks are done at reboot time. The root filesystem should be specified with a fs_passno of 1, and other filesystems should have a fs_passno of 2. Filesystems within a drive will be checked sequentially, but filesystems on different drives will be checked at the same time to utilize parallelism available in the hardware. If the sixth field is not present or zero, a value of zero is returned and fsck will assume that the filesystem does not need to be checked.[/QUOTE]
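In other words, the last field of your entry is why no automatic check runs. A sketch of the corrected entry - same options, only fs_passno changed to 2:

[CODE]
# /etc/fstab - unchanged except for the sixth field (fs_passno = 2)
/dev/vda3  /usr/local  ext3  noatime,acl,user_xattr  0  2
[/CODE]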

Regards,
Jens