From: Dave Chinner <dchinner@redhat.com>
Date: Thu, 20 Dec 2018 23:23:24 +1100
Subject: iomap: Revert "fs/iomap.c: get/put the page in
 iomap_page_create/release()"
Origin: https://git.kernel.org/linus/a837eca2412051628c0529768c9bc4f3580b040e

This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.

The reverted commit added page reference counting to iomap page
structures that are used to track block size < page size state. This
was supposed to align the code with page migration page accounting
assumptions, but what it has done instead is break XFS filesystems.
Every fstests run I've done on sub-page block size XFS filesystems
since picking up this commit 2 days ago has failed with bad page
state errors such as:

# ./run_check.sh "-m rmapbt=1,reflink=1 -i sparse=1 -b size=1k" "generic/038"
....
SECTION -- xfs
FSTYP -- xfs (debug)
PLATFORM -- Linux/x86_64 test1 4.20.0-rc6-dgc+
MKFS_OPTIONS -- -f -m rmapbt=1,reflink=1 -i sparse=1 -b size=1k /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /mnt/scratch

generic/038 454s ...
 run fstests generic/038 at 2018-12-20 18:43:05
 XFS (sdc): Unmounting Filesystem
 XFS (sdc): Mounting V5 Filesystem
 XFS (sdc): Ending clean mount
 BUG: Bad page state in process kswapd0 pfn:3a7fa
 page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1
 flags: 0xfffffc0000000()
 raw: 000fffffc0000000 dead000000000100 dead000000000200 ffff88800d9b6360
 raw: 0000000000000001 0000000000000000 00000000ffffffff
 page dumped because: non-NULL mapping
 CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
 Call Trace:
  dump_stack+0x67/0x90
  bad_page.cold.116+0x8a/0xbd
  free_pcppages_bulk+0x4bf/0x6a0
  free_unref_page_list+0x10f/0x1f0
  shrink_page_list+0x49d/0xf50
  shrink_inactive_list+0x19d/0x3b0
  shrink_node_memcg.constprop.77+0x398/0x690
  ? shrink_slab.constprop.81+0x278/0x3f0
  shrink_node+0x7a/0x2f0
  kswapd+0x34b/0x6d0
  ? node_reclaim+0x240/0x240
  kthread+0x11f/0x140
  ? __kthread_bind_mask+0x60/0x60
  ret_from_fork+0x24/0x30
 Disabling lock debugging due to kernel taint
....

The failures can come from anywhere that frees pages and empties the
per-cpu page magazines, so they are neither predictable nor easy to
debug.

generic/038 is a reliable reproducer of this problem - it has a 9 in
10 failure rate on one of my test machines. Failures on other
machines have been at random points in fstests runs, but every run
has ended up tripping this problem. Hence generic/038 was used to
bisect the failure because it was the most reliable reproducer.

It is too close to the 4.20 release (not to mention holidays) to
try to diagnose, fix and test the underlying cause of the problem,
so reverting the commit is the only option we have right now. The
revert has been tested against a current top-of-tree 4.20-rc7+ kernel
across multiple machines running sub-page block size XFS filesystems
and none of the bad page state failures have been seen.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Brian Foster <bfoster@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/iomap.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 5bc172f3dfe8..d6bc98ae8d35 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -116,12 +116,6 @@ iomap_page_create(struct inode *inode, struct page *page)
 	atomic_set(&iop->read_count, 0);
 	atomic_set(&iop->write_count, 0);
 	bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
-
-	/*
-	 * migrate_page_move_mapping() assumes that pages with private data have
-	 * their count elevated by 1.
-	 */
-	get_page(page);
 	set_page_private(page, (unsigned long)iop);
 	SetPagePrivate(page);
 	return iop;
@@ -138,7 +132,6 @@ iomap_page_release(struct page *page)
 	WARN_ON_ONCE(atomic_read(&iop->write_count));
 	ClearPagePrivate(page);
 	set_page_private(page, 0);
-	put_page(page);
 	kfree(iop);
 }
 
-- 
2.20.1