BTRFS: superblock checksum mismatch

I had my first major failure of the SLES on my RPi yesterday.

The system was running fine for a couple of months, except for some annoying kernel messages that would appear regularly (especially during compiles, updates or anything else producing lots of file I/O), generally along the lines of ‘swapper/0: page allocation failure: order:3, mode:0x2284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK)’

Anyway, I performed a ‘yum update’ which upgraded me to a new kernel (and lots of other stuff), and that seemed to go fine. I had performed several reboots since the update, and had not seen any new occurrences of the kernel message…

Then on my last reboot, which was a normal controlled ‘reboot’ command issued via the GUI, the system failed to restart, with the error indicated in the subject line. I don’t know if this was related to the new software, or perhaps just the amount of flash I/O involved in installing it, or perhaps there was no connection and it was just a coincidence.

After ‘BTRFS: superblock checksum mismatch’ the system reports:
[INDENT]
BTRFS: open_ctree failed
failed to mount /sysroot
see systemctl status sysroot.mount for details
dependency failed for Initrd Root File system
Dependency failed for reload configuration from the real root
stopped dracut pre-pivot and cleanup hook
stopped target initrd default target
stopped target initrd file systems
stopped dracut mount hook
stopped target basic system
stopped target system initialization
starting Emergency Shell…
generating “run/initramfs/rdsosreport.txt”

Entering emergency mode. exit shell to continue
type journalctl to view system logs
you might want to save “/run/initramfs/rdsosreport.txt” to a USB stick or /boot
after mounting them and attach it to a bug report

Recovery of btrfs file systems is not automated. We suggest you use
‘btrfs check --readonly’ first to see if there is any damage and
what’s the scope. Logging the output is recommended for later analysis.

The option ‘--repair’ must be used with care, be noted that it is
able to fix certain classes of errors but not all of them.
:/#
[/INDENT]
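
The emergency-shell text above recommends logging the check output. A minimal sketch of one way to do that from any working shell, assuming a POSIX sh; `log_run` is a hypothetical helper name, and the log path and device in the usage comment are placeholders rather than values from this thread:

```shell
# log_run LOGFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND, saves stdout and stderr to LOGFILE
# (overwriting it), prints the captured output once the command finishes,
# and returns the command's own exit status.
log_run() {
    log_file=$1
    shift
    status=0
    "$@" >"$log_file" 2>&1 || status=$?
    cat "$log_file"
    return "$status"
}

# Usage (placeholder paths, adjust to your system):
#   log_run /tmp/btrfs-check.log btrfs check --readonly /dev/sdX3
```

The output is shown only after the command completes, which keeps the helper simple; the saved file can then be attached to a bug report.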

Sadly, I wasn’t able to take any of the advice offered in those messages, because the shell did not seem to be accepting any input. The keyboard was totally dead. It had worked fine during the initial part of the boot process, allowing me to select between the two installed kernels and safe mode for each, but once the failure point was reached, I had no ability to enter any commands. The serial console was similarly dead. (The keyboard is a bog-standard USB Dell product.)

I am now in the process of going through a fresh install on a new uSD card to give me a system from which to attempt to diagnose and hopefully fix the problem with the non-booting system.

Has anyone seen this sort of failure before? Either the damaged btrfs or the inability to enter commands after a boot failure?

Thanks,
DigbyT

Having a new SUSE install has revealed some very odd behavior…
It initially gave the impression that I had successfully mounted the old root filesystem, but the contents turned out to be those of the new root…

New system looks like:
digbyt@SuSEPi2:~> lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
mmcblk0     179:0    0  58.3G  0 disk
├─mmcblk0p1 179:1    0   200M  0 part /boot/efi
├─mmcblk0p2 179:2    0   266M  0 part /boot
├─mmcblk0p3 179:3    0  57.3G  0 part /
└─mmcblk0p4 179:4    0 493.3M  0 part [SWAP]
digbyt@SuSEPi2:~> df /
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/mmcblk0p3  60108892 2206068  56497228   4% /
digbyt@SuSEPi2:~>

Then I insert a USB stick with the errant uSD card installed:
digbyt@SuSEPi2:~> lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    1  14.5G  0 disk
├─sda1        8:1    1   200M  0 part
├─sda2        8:2    1   266M  0 part
├─sda3        8:3    1  13.5G  0 part /
└─sda4        8:4    1 494.1M  0 part
mmcblk0     179:0    0  58.3G  0 disk
├─mmcblk0p1 179:1    0   200M  0 part /boot/efi
├─mmcblk0p2 179:2    0   266M  0 part /boot
├─mmcblk0p3 179:3    0  57.3G  0 part
└─mmcblk0p4 179:4    0 493.3M  0 part [SWAP]
digbyt@SuSEPi2:~> df /
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda3       60108892 2206068  56497228   4% /
digbyt@SuSEPi2:~>

Note how immediately after inserting the USB stick, df starts telling me that the root filesystem comes from it.
If I mount /dev/sda3 on /mnt, it seems to be mounting the new /dev/mmcblk0p3.

Running btrfs check --readonly gives:
SuSEPi2:/home/digbyt # btrfs check --readonly /dev/sda3
ERROR: superblock checksum mismatch
/dev/sda3 is currently mounted. Aborting.

Which seems to indicate that the apparent confusion as to which partition holds which filesystem is causing btrfs to think the filesystem is mounted when it isn’t…
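
One thing worth ruling out here (an assumption, not something confirmed in this thread) is that both cards carry the same filesystem UUID, for example because both were written from the same install image. Btrfs identifies filesystems by UUID, so two partitions sharing one could plausibly produce this kind of mix-up. A minimal sketch for spotting duplicates; `dup_uuids` is a hypothetical helper that reads "NAME UUID" pairs on stdin:

```shell
# dup_uuids: read lines of "NAME UUID" on stdin and print every UUID that
# appears on more than one device, followed by the device names.
# Lines with no UUID (whole disks, for instance) are skipped.
dup_uuids() {
    awk 'NF >= 2 { names[$2] = names[$2] " " $1; count[$2]++ }
         END { for (u in count) if (count[u] > 1) print u ":" names[u] }'
}

# Usage on a live system:
#   lsblk -rno NAME,UUID | dup_uuids
```

No output means every listed UUID is unique, in which case the cause lies elsewhere.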

I tried repeating the check on a different system (not RPi or SUSE) and got:
[root@uno ~]# btrfs check --readonly /dev/sdb3
ERROR: superblock checksum mismatch
ERROR: superblock checksum mismatch
No valid Btrfs found on /dev/sdb3
Couldn’t open file system

That looked disheartening, but I looked through the documentation on btrfs-check and tried ‘-s 1’, which gave a more promising:
[root@uno ~]# btrfs check --readonly -s 1 /dev/sdb3
using SB copy 1, bytenr 67108864
ERROR: superblock checksum mismatch
Checking filesystem on /dev/sdb3
UUID: 0b2c007e-1581-42e2-a05c-4a66eb4c115d
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 6819106834 bytes used err is 0
total csum bytes: 6451088
total tree bytes: 203522048
total fs tree bytes: 181485568
total extent tree bytes: 12484608
btree space waste bytes: 35735563
file data blocks allocated: 11768950784
referenced 9284182016
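
For context on ‘-s 1’: btrfs keeps its primary superblock at 64 KiB and mirror copies at 64 MiB and 256 GiB, which matches the "bytenr 67108864" reported above. A small sketch of that arithmetic; `sb_offset` and `sb_copies_on` are hypothetical helper names, and the partition size in the usage comment is only the 13.5G figure from lsblk converted to bytes:

```shell
# sb_offset N: print the byte offset of btrfs superblock copy N (0, 1 or 2).
sb_offset() {
    case $1 in
        0) echo $((64 * 1024)) ;;                  # 64 KiB, primary
        1) echo $((64 * 1024 * 1024)) ;;           # 64 MiB, first mirror
        2) echo $((256 * 1024 * 1024 * 1024)) ;;   # 256 GiB, second mirror
        *) echo "unknown superblock copy: $1" >&2; return 1 ;;
    esac
}

# sb_copies_on SIZE_BYTES: list the copies that fit on a device of that size.
# A superblock occupies 4096 bytes.
sb_copies_on() {
    for n in 0 1 2; do
        off=$(sb_offset "$n")
        if [ $((off + 4096)) -le "$1" ]; then
            echo "$n"
        fi
    done
}

# Usage: a partition of roughly 13.5 GiB
#   sb_copies_on 14495514624
```

A partition this small has room for only copies 0 and 1, so ‘-s 1’ is the only alternative superblock available on it.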

So I am now in the process of pulling what I can from the filesystem using:
btrfs restore -u 1 /dev/sdb3 out

I am inclined to put the initial corruption down to the uSD card wearing out under the update I/O activity, but the inability to do anything at the failed boot prompt, and the filesystem confusion on a second, freshly installed SLES system, seem to suggest some system issues…

Regards,
DigbyT

One last reply to my own thread - in case this helps anyone else…

The restore mentioned before seemed to complete successfully, with only three complaints along the lines of:
We seem to be looping a lot on out/@/…, do you want to keep going on ? (y/N/a):
The three files were all quite large datafiles which I knew had not been touched since the last backup, so I accepted the N option…

Confident that I had all recently modified files copied, I tried btrfs rescue as follows:

btrfs rescue super-recover -v /dev/sdb3

ERROR: superblock checksum mismatch
ERROR: superblock checksum mismatch
All Devices:
Device: id = 1, name = /dev/sdb3

Before Recovering:
[All good supers]:
device name = /dev/sdb3
superblock bytenr = 67108864

[All bad supers]:
device name = /dev/sdb3
superblock bytenr = 65536

Make sure this is a btrfs disk otherwise the tool will destroy other fs, Are you sure? [y/N]: y
Recovered bad superblocks successful

I then ran the check again:
[root@uno susepi]# btrfs check --readonly /dev/sdb3
Checking filesystem on /dev/sdb3
UUID: 0b2c007e-1581-42e2-a05c-4a66eb4c115d
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 6819106834 bytes used err is 0
total csum bytes: 6451088
total tree bytes: 203522048
total fs tree bytes: 181485568
total extent tree bytes: 12484608
btree space waste bytes: 35735563
file data blocks allocated: 11768950784
referenced 9284182016

Which looks pretty good.

I re-inserted the uSD card into the RPi, and I have the original system back!
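
For anyone following along, the order of operations in this thread can be summarised as a sketch. `recovery_plan` is a hypothetical helper that only prints the commands used above for a given device and output directory; it runs nothing, so the list can be reviewed before anything touches the disk:

```shell
# recovery_plan DEVICE OUTDIR
# Print, in order, the btrfs commands used in this thread.
# Nothing is executed; the device and directory are substituted as given.
recovery_plan() {
    dev=$1
    out=$2
    cat <<EOF
btrfs check --readonly -s 1 $dev
btrfs restore -u 1 $dev $out
btrfs rescue super-recover -v $dev
btrfs check --readonly $dev
EOF
}

# Usage:
#   recovery_plan /dev/sdb3 out
```

The restore step comes before super-recover deliberately: restore only reads from the damaged filesystem, while super-recover writes to it, so the files are copied off first.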