Commit 665575c

hansendc authored and akpm00 committed
filemap: move prefaulting out of hot write path
There is a generic anti-pattern that shows up in the VFS and several filesystems where the hot write paths touch userspace twice when they could get away with doing it once. Dave Chinner suggested that they should all be fixed up[1]. I agree[2]. But, the series to do that fixup spans a bunch of filesystems and a lot of people. This patch fixes common code that absolutely everyone uses. It has measurable performance benefits[3]. I think this patch can go in and not be held up by the others. I will post them separately to their separate maintainers for consideration. But, honestly, I'm not going to lose any sleep if the maintainers don't pick those up. 1. https://lore.kernel.org/all/Z5f-x278Z3wTIugL@dread.disaster.area/ 2. https://lore.kernel.org/all/20250129181749.C229F6F3@davehans-spike.ostc.intel.com/ 3. https://lore.kernel.org/all/202502121529.d62a409e-lkp@intel.com/ This patch: There is a bit of a sordid history here. I originally wrote 998ef75 ("fs: do not prefault sys_write() user buffer pages") to fix a performance issue that showed up on early SMAP hardware. But that was reverted with 00a3d66 because it exposed an underlying filesystem bug. This is a reimplementation of the original commit along with some simplification and comment improvements. The basic problem is that the generic write path has two userspace accesses: one to prefault the write source buffer and then another to perform the actual write. On x86, this means an extra STAC/CLAC pair. These are relatively expensive instructions because they function as barriers. Keep the prefaulting behavior but move it into the slow path that gets run when the write did not make any progress. This avoids livelocks that can happen when the write's source and destination target the same folio. Contrary to the existing comments, the fault-in does not prevent deadlocks. That's accomplished by using an "atomic" usercopy that disables page faults. 
The end result is that the generic write fast path now touches userspace once instead of twice. 0day has shown some improvements on a couple of microbenchmarks:

  https://lore.kernel.org/all/202502121529.d62a409e-lkp@intel.com/

Link: https://lkml.kernel.org/r/20250228203722.CAEB63AC@davehans-spike.ostc.intel.com
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/yxyuijjfd6yknryji2q64j3keq2ygw6ca6fs5jwyolklzvo45s@4u63qqqyosy2/
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
1 parent 654b33a commit 665575c

File tree

1 file changed: +16 −11 lines changed


mm/filemap.c

Lines changed: 16 additions & 11 deletions

@@ -4169,17 +4169,6 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 		bytes = min(chunk - offset, bytes);
 		balance_dirty_pages_ratelimited(mapping);
 
-		/*
-		 * Bring in the user page that we will copy from _first_.
-		 * Otherwise there's a nasty deadlock on copying from the
-		 * same page as we're writing to, without it being marked
-		 * up-to-date.
-		 */
-		if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
-			status = -EFAULT;
-			break;
-		}
-
 		if (fatal_signal_pending(current)) {
 			status = -EINTR;
 			break;
@@ -4197,6 +4186,12 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 		if (mapping_writably_mapped(mapping))
 			flush_dcache_folio(folio);
 
+		/*
+		 * Faults here on mmap()s can recurse into arbitrary
+		 * filesystem code.  Lots of locks are held that can
+		 * deadlock.  Use an atomic copy to avoid deadlocking
+		 * in page fault handling.
+		 */
 		copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
 		flush_dcache_folio(folio);
 
@@ -4222,6 +4217,16 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 			bytes = copied;
 			goto retry;
 		}
+
+		/*
+		 * 'folio' is now unlocked and faults on it can be
+		 * handled.  Ensure forward progress by trying to
+		 * fault it in now.
+		 */
+		if (fault_in_iov_iter_readable(i, bytes) == bytes) {
+			status = -EFAULT;
+			break;
+		}
 	} else {
 		pos += status;
 		written += status;
