ext4: Fix checksum validation for inodes with small i_extra_isize
Closes: #838544, regression in 4.7.4
This commit is contained in:
parent
bdec0fe8f0
commit
7a0f81fe53
|
@ -139,6 +139,8 @@ linux (4.7.5-1) UNRELEASED; urgency=medium
|
|||
* [hppa] tracing: Re-enable FTRACE
|
||||
* [powerpc,powerpcspe,ppc64] linux-image: Suppress automatic dbgsym packages
|
||||
* uaccess,uio: Fix ABI changes in 4.7.5
|
||||
* ext4: Fix checksum validation for inodes with small i_extra_isize
|
||||
(Closes: #838544, regression in 4.7.4)
|
||||
|
||||
-- Ben Hutchings <ben@decadent.org.uk> Fri, 23 Sep 2016 00:50:40 +0100
|
||||
|
||||
|
|
|
@ -0,0 +1,218 @@
|
|||
From: "Darrick J. Wong" <darrick.wong@oracle.com>
|
||||
Date: Mon, 19 Sep 2016 22:52:16 -0700
|
||||
Subject: Re: Trouble mounting metadata_csum ext4 filesystems with v4.7.x after c9274d891869880648c4ee9365df3ecc7ba2e285: not enough inode bytes checksummed?
|
||||
Origin: https://www.spinics.net/lists/linux-fsdevel/msg101888.html
|
||||
Bug-Debian: https://bugs.debian.org/838544
|
||||
|
||||
[cc Ted and the ext4 list]
|
||||
|
||||
On Mon, Sep 19, 2016 at 03:19:03PM +0100, Nix wrote:
|
||||
> So I ran into spurious metadata corruption warnings in v4.7.2 due to the
|
||||
> problem fixed by c9274d8. I applied an early version of the fix,
|
||||
> rebooted, and oh dear root filesystem mount failure with invalid
|
||||
> checksum errors.
|
||||
>
|
||||
> The problem persists in v4.7.4, as seen here in qemu emulation on a raw
|
||||
> image dd'ed directly from the thing that won't boot, with a couple of
|
||||
> printk()s:
|
||||
>
|
||||
> # mount /dev/vda /new-root/
|
||||
> [ 8.124692] EXT4-fs (vda): couldn't mount as ext3 due to feature incompatibilities
|
||||
> [ 8.126977] EXT4-fs (vda): couldn't mount as ext2 due to feature incompatibilities
|
||||
> [ 9.017980] Inode size 256 > good old size 128; fits in inode: 0
|
||||
> [ 8.134897] inode 8: provided: 5c50l; calculated: 36e1i
|
||||
> [ 8.135098] EXT4-fs error (device vda): ext4_iget:4479: inode #8: comm mount: checksum invalid
|
||||
> [ 8.138992] EXT4-fs (vda): no journal found
|
||||
> [ 8.165744] UDF-fs: warning (device vda): udf_fill_super: No partition found (2)
|
||||
> mount: mounting /dev/vda on /new-root/ failed: Invalid argument
|
||||
>
|
||||
> I added a bit of printking to show the failure of the journal inode
|
||||
> checksum to pass muster. e2fsck (from e2fsprogs 1.43.1-14) is quite
|
||||
> happy with this filesystem. Reverting c9274d8 makes everything happy
|
||||
> again (well, it does bring the original bug back, which is a rather
|
||||
> serious one, but other than that...):
|
||||
>
|
||||
> [ 9.823032] EXT4-fs (vda): couldn't mount as ext3 due to feature incompatibilities
|
||||
> [ 9.824647] EXT4-fs (vda): couldn't mount as ext2 due to feature incompatibilities
|
||||
> [ 9.832593] inode 8: provided: 5c50l; calculated: 5c50i
|
||||
> [ 9.839253] inode 2: provided: d6ea92e9l; calculated: d6ea92e9i
|
||||
> [ 9.846947] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts: (null)
|
||||
>
|
||||
> So c9274d8 is clearly messing up the calculation of the checksum.
|
||||
>
|
||||
> The problem becomes more evident if we add more printk()s to the old
|
||||
> code, so we can see what region is being checksummed:
|
||||
>
|
||||
> # mount /dev/vda /new-root
|
||||
> [ 6.827297] inode 8: unadjusted csum of 256 bytes with seed a5df92a7: 449a5c50
|
||||
> [ 6.827596] adjusted csum: 5c50
|
||||
> [ 6.835993] inode 2: unadjusted csum of 256 bytes with seed 759c6c33: d6ea92e9
|
||||
> [ 6.836173] adjusted csum: d6ea92e9
|
||||
> [ 6.844801] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts:
|
||||
>
|
||||
> and the new:
|
||||
>
|
||||
> [ 11.098013] inode 8: csum of first 124 bytes with seed a5df92a7: f375b663
|
||||
> [ 11.098205] inode 8: added csum of 2 dummy_csum bytes with seed a5df92a7: 20cfebcb
|
||||
> [ 11.098420] inode 8: added csum of 2 bytes from offset 126 -- 128 to existing: d79e7432
|
||||
> [ 11.098646] inode 8: > GOOD_OLD_INODE_SIZE; added csum of 2 bytes from 128 -- 130 to existing: d10936e1
|
||||
> [ 11.098890] 8: adjusted csum: 36e1
|
||||
> [ 11.099133] EXT4-fs error (device vda): ext4_iget:4483: inode #8: comm mount: checksum invalid
|
||||
>
|
||||
> We are not checksumming enough bytes! We used to checksum the entire
|
||||
> 256-byte inode: now, we checksum only 130 bytes of it, which isn't even
|
||||
> enough to cover the 28-byte extra_isize on this filesystem and is more
|
||||
> or less guaranteed to give the wrong answer. I'd fix the problem, but I
|
||||
> frankly can't see how the new code is meant to be equivalent to the old
|
||||
> code in any sense -- most particularly what the stuff around dummy_csum
|
||||
> is meant to do -- so I thought it better to let the people who wrote it
|
||||
> fix it :)
|
||||
|
||||
Sh*t, I missed this during the review. So your filesystem image (thank
|
||||
you!) had this to say:
|
||||
|
||||
debugfs> stats
|
||||
Inode size: 256
|
||||
debugfs> stat <8>
|
||||
Size of extra inode fields: 0
|
||||
|
||||
Basically, a 128-byte inode inside a filesystem that allocated 256 bytes
|
||||
for each inode. As you point out, the old code would checksum the entire
|
||||
allocated space, whether or not the inode core used it. Obviously, you
|
||||
want this since inline extended attributes live in that space:
|
||||
|
||||
csum = ext4_chksum(sbi, ei->i_csum_seed, (__u8 *)raw,
|
||||
EXT4_INODE_SIZE(inode->i_sb));
|
||||
|
||||
The new code, on the other hand, carefully checksums around the
|
||||
i_checksum fields and only bothers to checksum the space between the end
|
||||
of i_checksum_hi and the end of the allocated space if the inode core is
|
||||
big enough to store i_checksum_hi. Since we allocated 256 bytes for
|
||||
each inode, we checksum the first two bytes after byte 128
|
||||
(EXT4_GOOD_OLD_INODE_SIZE), but then we see that i_extra_size == 0 so we
|
||||
never bother to checksum anything after that. This is of course wrong
|
||||
since we no longer checksum the xattr space and we've deviated from the
|
||||
pre-4.7.4 (documented) on-disk format.
|
||||
|
||||
*FORTUNATELY* since the root inode fails the read verifier, the file (in
|
||||
this case the root dir) can't be modified and therefore nothing has been
|
||||
corrupted. Especially fortunate for you, the fs won't mount, reducing
|
||||
the chances that any serious damage has occurred.
|
||||
|
||||
I /think/ the fix in this case is to hoist the last ext4_chksum call
|
||||
out of the EXT4_FITS_IN_INODE blob:
|
||||
|
||||
if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
|
||||
offset = offsetof(struct ext4_inode, i_checksum_hi);
|
||||
csum = ext4_chksum(sbi, csum, (__u8 *)raw +
|
||||
EXT4_GOOD_OLD_INODE_SIZE,
|
||||
offset - EXT4_GOOD_OLD_INODE_SIZE);
|
||||
if (EXT4_FITS_IN_INODE(raw, ei, i_checksum_hi)) {
|
||||
csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum,
|
||||
csum_size);
|
||||
offset += csum_size;
|
||||
}
|
||||
csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset,
|
||||
EXT4_INODE_SIZE(inode->i_sb) - offset);
|
||||
}
|
||||
|
||||
Can you give that a try?
|
||||
|
||||
> tune2fs output for this filesystem, particularly the extra_isize and
|
||||
> inode size fields are likely relevant:
|
||||
>
|
||||
> tune2fs 1.43.1 (08-Jun-2016)
|
||||
> Filesystem volume name: root
|
||||
> Last mounted on: /
|
||||
> Filesystem UUID: 6c0f7fa7-d6c2-4054-bff3-3a878460bdc7
|
||||
> Filesystem magic number: 0xEF53
|
||||
> Filesystem revision #: 1 (dynamic)
|
||||
> Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
|
||||
> Filesystem flags: signed_directory_hash
|
||||
> Default mount options: (none)
|
||||
> Filesystem state: clean
|
||||
> Errors behavior: Continue
|
||||
> Filesystem OS type: Linux
|
||||
> Inode count: 65536
|
||||
> Block count: 262144
|
||||
> Reserved block count: 13107
|
||||
> Free blocks: 227009
|
||||
> Free inodes: 59499
|
||||
> First block: 0
|
||||
> Block size: 4096
|
||||
> Fragment size: 4096
|
||||
> Group descriptor size: 64
|
||||
> Reserved GDT blocks: 63
|
||||
> Blocks per group: 32768
|
||||
> Fragments per group: 32768
|
||||
> Inodes per group: 8192
|
||||
> Inode blocks per group: 512
|
||||
> RAID stripe width: 16
|
||||
> Flex block group size: 64
|
||||
> Filesystem created: Tue May 26 21:28:46 2009
|
||||
> Last mount time: Sun Sep 18 23:34:41 2016
|
||||
> Last write time: Mon Sep 19 13:51:59 2016
|
||||
> Mount count: 0
|
||||
> Maximum mount count: 36
|
||||
> Last checked: Mon Sep 19 13:51:59 2016
|
||||
> Check interval: 15552000 (6 months)
|
||||
> Next check after: Sat Mar 18 12:51:59 2017
|
||||
> Lifetime writes: 16 GB
|
||||
> Reserved blocks uid: 0 (user root)
|
||||
> Reserved blocks gid: 0 (group root)
|
||||
> First inode: 11
|
||||
> Inode size: 256
|
||||
> Required extra isize: 28
|
||||
> Desired extra isize: 28
|
||||
> Journal inode: 8
|
||||
> Default directory hash: half_md4
|
||||
> Directory Hash Seed: f1da2da0-057e-4ba0-a021-3d56db5b24ab
|
||||
> Journal backup: inode blocks
|
||||
> Checksum type: crc32c
|
||||
> Checksum: 0x92acf115
|
||||
>
|
||||
> This is an old, upgraded fs from before the days of checksums, but even
|
||||
> so I'm surprised that with a 256-byte inode and no xattrs in use,
|
||||
> EXT4_FITS_IN_INODE is false. Maybe the extra_isize isn't big enough?
|
||||
|
||||
It's zero, so yes. :)
|
||||
|
||||
> An lzipped e2image of the problematic filesystem is available from
|
||||
> <http://www.esperi.org.uk/~nix/temporary/csum-corruption.img.lz>:
|
||||
> I have verified that the problem recurs with this image.
|
||||
>
|
||||
> I can also replicate the problem in literally seconds if you need more
|
||||
> debugging output.
|
||||
>
|
||||
>
|
||||
> ... The mystery is why this isn't going wrong anywhere else: I have
|
||||
> metadata_csum working on every fs on a bunch of other ext4-using
|
||||
> systems, and indeed on every filesystem on this machine as well, as long
|
||||
> as c9274d8 is not applied. Many of them are similarly upgraded pre-csum
|
||||
> fses with the same inode size and extra_isize, but they work...
|
||||
|
||||
Hard to say, but this bug only affects inodes with 0 < i_extra_size <= 4
|
||||
i.e. 128 < inode-core-size <= 130. Maybe an old ext3 fs that was
|
||||
created with 256 bytes per inode but inode-core-size of 128?
|
||||
|
||||
Uck. Sorry about this mess.
|
||||
|
||||
--D
|
||||
|
||||
[bwh: Converted to a patch]
|
||||
---
|
||||
--- a/fs/ext4/inode.c
|
||||
+++ b/fs/ext4/inode.c
|
||||
@@ -71,10 +71,9 @@ static __u32 ext4_inode_csum(struct inod
|
||||
csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum,
|
||||
csum_size);
|
||||
offset += csum_size;
|
||||
- csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset,
|
||||
- EXT4_INODE_SIZE(inode->i_sb) -
|
||||
- offset);
|
||||
}
|
||||
+ csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset,
|
||||
+ EXT4_INODE_SIZE(inode->i_sb) - offset);
|
||||
}
|
||||
|
||||
return csum;
|
|
@ -74,6 +74,7 @@ bugfix/all/kbuild-use-nostdinc-in-compile-tests.patch
|
|||
bugfix/all/disable-some-marvell-phys.patch
|
||||
bugfix/all/fs-add-module_softdep-declarations-for-hard-coded-cr.patch
|
||||
bugfix/all/kbuild-do-not-use-hyphen-in-exported-variable-name.patch
|
||||
bugfix/all/ext4-fix-bug-838544.patch
|
||||
|
||||
# Miscellaneous features
|
||||
|
||||
|
|
Loading…
Reference in New Issue