Kernel – 4.11-rc1 and BTRFS – a Warning..?

Shortly after updating a test system to Kernel 4.11-rc1, I encountered catastrophic corruption of its btrfs-formatted system disk partition.

To put this in context, this may be an isolated incident, and I have not, so far, seen similar problems mentioned elsewhere. But there are quite a lot of BTRFS changes in 4.11, and the usual tests confirmed that system memory was OK (running memtest86+ several times) and that the disk was OK (running smartmontools utilities etc. several times). The disk is only one year old, in any case (WD – 2GB).
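As a rough illustration of the disk-side check, the overall SMART health verdict can be tested in a script. This is only a sketch: the function name smart_health_ok is hypothetical, it looks only for the PASSED line that smartctl -H prints for ATA disks, and it is not necessarily how the tests above were run.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the `smartctl -H` output
# supplied on stdin contains the ATA overall-health PASSED verdict.
smart_health_ok() {
    grep -q 'overall-health self-assessment test result: PASSED'
}
```

It would typically be fed from smartmontools as root, for example: smartctl -H /dev/sda | smart_health_ok && echo "disk reports healthy". A passing verdict only means the drive has not flagged itself as failing; a long self-test (smartctl -t long /dev/sda) is a more thorough check.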

Unfortunately, I did not have a serial console attached to this particular system at the time, so I had to photograph the screen at various times and then re-type the details below from those photos (!).

The scenario:

System disk (/dev/sda2) is formatted btrfs..
Updated to Kernel 4.11-rc1.
Within 24 hours, the system froze, and on rebooting I got a critical btrfs error:

...................
 Mounting /sysroot....
BTRFS critical (device sda2): corrupt node, bad key order: block=368346054656, root=1, slot=192
BTRFS critical (device sda2): corrupt node, bad key order: block=368346054656, root=1, slot=192
BTRFS error (device sda2): failed to read block groups: -5
BTRFS error (device sda2): open_ctree failed
[FAILED] Failed to mount /sysroot
See 'systemctl status sysroot.mount' for details
[DEPEND] Dependency failed for Initrd Root File System.
[DEPEND] Dependency failed for Reload Configuration from the Real Root.
......................
 Starting Emergency Shell...

In the emergency shell, I ran:

:/# btrfs check /dev/sda2
Checking filesystem on /dev/sda2
UUID xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxxxx
checking extents
bad block 33488896
Errors found in extent allocation tree or chunk allocation
^C
:/# btrfs rescue chunk-recover /dev/sda2
Scanning: 66126696448 in dev0

I then got scrolling screens full of:

...........
Deleting bad dir index [537997,96,160] root 5
Deleting bad dir index [537997,96,168] root 5
Deleting bad dir index [537997,96,170] root 5
Deleting bad dir index [537997,96,180] root 5
Deleting bad dir index [537997,96,186] root 5
Deleting bad dir index [537997,96,190] root 5
Deleting bad dir index [537997,96,198] root 5
................ continuing...

Then more scrolling screens full of:

...............
Trying to rebuild inode:1658592
root 5 inode 1658592 error 2001, no inode item, link count wrong
 unresolved ref dir 198531 index 0 namelen 18 name gtk-indent-ltr.png filetype 7 errors 6, no dir index, no inode ref
Trying to rebuild inode:1658593
root 5 inode 1658593 error 2001, no inode item, link count wrong
 unresolved ref dir 198531 index 0 namelen 18 name gtk-indent-rtl.png filetype 7 errors 6, no dir index, no inode ref
............... continuing....

Then more scrolling screens full of:

..............
repairing missing dir index item for inode 30734433
repairing missing dir index item for inode 30734434
repairing missing dir index item for inode 30734435
repairing missing dir index item for inode 30734436
repairing missing dir index item for inode 30734437
repairing missing dir index item for inode 30734438
repairing missing dir index item for inode 30734439
............ continuing...

Then more scrolling screens full of:

..............
Deleting bad dir index [363308,96,63443] root 5
Deleting bad dir index [363396,96,12025] root 5
The following tree block(s) is corrupted in tree 5:
 tree block bytenr: 44061900800, level: 1, node key: (365821249776, 168, 45056)
Try to repair the btree for root 5
Btree for root 5 is fixed
Deleting bad dir index [364225,96,3382] root 5
Deleting bad dir index [363308,96,63425] root 5
Deleting bad dir index [363308,96,63427] root 5
Deleting bad dir index [363308,96,63427] root 5
........... continuing ...........

The ‘recovery’ kept running, although it appeared to stall several times. I just left it, and it ran for almost three days in total before finally ending:

..............
Btree for root 5 is fixed
root 5 root dir 256 error
root 5 inode 256 errors 200, dir isize wrong
reset isize for dir 2139040 root 5
reset isize for dir 2139040 root 5
..................
Trying to rebuild inode:33318043
moving file 'lost+found' to 'lost+found' dir since it has no valid backref
Fixed the nlink of inode 33318043
found 583909376 bytes used err is 1
total csum bytes: 0
total tree bytes: 1884160
total fs tree bytes: 0
total extent tree bytes: 1474560
btree space waste bytes: 758524
file data blocks allocated: 201064448
 referenced 201064448
:/#
:/#

After all that, I rebooted:

...................
 Mounting /sysroot....
BTRFS critical (device sda2): corrupt node, bad key order: block=44061900800, root=1, slot=192
BTRFS critical (device sda2): corrupt node, bad key order: block=44061900800, root=1, slot=192
BTRFS error (device sda2): failed to read block groups: -5
BTRFS error (device sda2): open_ctree failed
[FAILED] Failed to mount /sysroot
See 'systemctl status sysroot.mount' for details
[DEPEND] Dependency failed for Initrd Root File System.
[DEPEND] Dependency failed for Reload Configuration from the Real Root.
....................
Entering emergency mode..............
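Comparing the two boot logs, the block reported in the ‘bad key order’ message changed from 368346054656 before the repair attempt to 44061900800 after it. A small sketch of how those block numbers could be pulled out of kernel messages, rather than read off photographs, is shown here; the function name bad_key_blocks is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print each
# distinct block number that appears in a btrfs "bad key order" message.
bad_key_blocks() {
    sed -n 's/.*bad key order: block=\([0-9][0-9]*\),.*/\1/p' | sort -u
}
```

In the emergency shell this could be run as: dmesg | bad_key_blocks. Saving the output to removable media before and after a repair attempt makes it easy to see whether the failing block has moved.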

So… the btrfs recovery was a waste of three days, and did not fix the problem. I will now have to re-create the data from another source. Fortunately this was ‘just’ the system disk partition, and no irreplaceable recent user data was lost, apart from /root.
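For comparison, a commonly suggested alternative is to copy files off a damaged btrfs partition read-only before running any repair that writes to it. The sketch below only prints such a sequence so it can be reviewed first. The function name salvage_plan and the example paths are hypothetical, and there is no guarantee that these steps would have recovered anything in this case.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a read-only salvage sequence.
# $1 = damaged btrfs device, $2 = directory on a different, healthy disk.
salvage_plan() {
    dev=$1
    dest=$2
    # 1. Inspect the filesystem without modifying it.
    printf '%s\n' "btrfs check --readonly $dev"
    # 2. Dry run (-D): list what could be copied out, writing nothing.
    printf '%s\n' "btrfs restore -D -v $dev $dest"
    # 3. Copy files out, continuing past unreadable items (-i).
    printf '%s\n' "btrfs restore -v -i $dev $dest"
}
```

For example, salvage_plan /dev/sda2 /mnt/rescue prints the three commands for this system's partition, assuming /mnt/rescue is a directory on a separate working disk. btrfs restore reads from the unmounted device and writes only to the destination, so the damaged partition is left as it was.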

I should also mention that all other btrfs-formatted partitions on other systems, running 4.10.x and earlier, have been error-free and reliable. And… I have not tried 4.11-rc2 with BTRFS yet.

Robert Gadsdon.   March 14, 2017.


Comments

  1. After decades of no problems running Linux, except for the long-ago xfs filesystem-eating bug, I just had something very similar happen with ext4 on an NVMe drive on Ubuntu 4.10.0-20, which likely contains 4.11 backported ‘fixes’. I suspect it's probably not isolated to btrfs.

    BTW – Thanks for your VMware patches!
