
Commit 7b08b74

arkamar authored and akpm00 committed
mm: fix folio_pte_batch() on XEN PV
On XEN PV, folio_pte_batch() can incorrectly batch beyond the end of a
folio due to a corner case in pte_advance_pfn(). Specifically, when the
PFN following the folio maps to an invalidated MFN,

	expected_pte = pte_advance_pfn(expected_pte, nr);

produces a pte_none(). If the actual next PTE in memory is also
pte_none(), the check

	if (!pte_same(pte, expected_pte))
		break;

does not trigger, the loop is not broken, and batching continues into
unrelated memory.

For example, with a 4-page folio, the PTE layout might look like this:

[ 53.465673] [ T2552] folio_pte_batch: printing PTE values at addr=0x7f1ac9dc5000
[ 53.465674] [ T2552] PTE[453] = 000000010085c125
[ 53.465679] [ T2552] PTE[454] = 000000010085d125
[ 53.465682] [ T2552] PTE[455] = 000000010085e125
[ 53.465684] [ T2552] PTE[456] = 000000010085f125
[ 53.465686] [ T2552] PTE[457] = 0000000000000000 <-- not present
[ 53.465689] [ T2552] PTE[458] = 0000000101da7125

pte_advance_pfn(PTE[456]) returns a pte_none() due to the invalid
PFN->MFN mapping. The next actual PTE (PTE[457]) is also pte_none(), so
the loop continues and includes PTE[457] in the batch, resulting in 5
batched entries for a 4-page folio. This triggers the following warning:

[ 53.465751] [ T2552] page: refcount:85 mapcount:20 mapping:ffff88813ff4f6a8 index:0x110 pfn:0x10085c
[ 53.465754] [ T2552] head: order:2 mapcount:80 entire_mapcount:0 nr_pages_mapped:4 pincount:0
[ 53.465756] [ T2552] memcg:ffff888003573000
[ 53.465758] [ T2552] aops:0xffffffff8226fd20 ino:82467c dentry name(?):"libc.so.6"
[ 53.465761] [ T2552] flags: 0x2000000000416c(referenced|uptodate|lru|active|private|head|node=0|zone=2)
[ 53.465764] [ T2552] raw: 002000000000416c ffffea0004021f08 ffffea0004021908 ffff88813ff4f6a8
[ 53.465767] [ T2552] raw: 0000000000000110 ffff888133d8bd40 0000005500000013 ffff888003573000
[ 53.465768] [ T2552] head: 002000000000416c ffffea0004021f08 ffffea0004021908 ffff88813ff4f6a8
[ 53.465770] [ T2552] head: 0000000000000110 ffff888133d8bd40 0000005500000013 ffff888003573000
[ 53.465772] [ T2552] head: 0020000000000202 ffffea0004021701 000000040000004f 00000000ffffffff
[ 53.465774] [ T2552] head: 0000000300000003 8000000300000002 0000000000000013 0000000000000004
[ 53.465775] [ T2552] page dumped because: VM_WARN_ON_FOLIO((_Generic((page + nr_pages - 1), const struct page *: (const struct folio *)_compound_head(page + nr_pages - 1), struct page *: (struct folio *)_compound_head(page + nr_pages - 1))) != folio)

The original code works as expected everywhere except on XEN PV, where
pte_advance_pfn() can yield a pte_none() after balloon inflation due to
MFN invalidation. On XEN, pte_advance_pfn() ends up calling
__pte()->xen_make_pte()->pte_pfn_to_mfn(), which returns pte_none() when
mfn == INVALID_P2M_ENTRY.

pte_pfn_to_mfn() documents that nastiness:

	If there's no mfn for the pfn, then just create an
	empty non-present pte. Unfortunately this loses
	information about the original pfn, so
	pte_mfn_to_pfn is asymmetric.

While such hacks should certainly be removed, we can do better in
folio_pte_batch() and simply check ahead of time how many PTEs we could
possibly batch in our folio.

This way, we not only fix the issue but also clean up the code: the
pte_pfn() check inside the loop body is no longer needed, and neither is
the end_ptep comparison and arithmetic.
Link: https://lkml.kernel.org/r/20250502215019.822-2-arkamar@atlas.cz
Fixes: f8d9377 ("mm/memory: optimize fork() with PTE-mapped THP")
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Petr Vaněk <arkamar@atlas.cz>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
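
The overrun and the effect of clamping max_nr up front can be reproduced with a minimal userspace sketch (not kernel code): the PTE array mirrors the dump above, and advance_pfn() and batch() are simplified, hypothetical stand-ins for pte_advance_pfn() and the folio_pte_batch() loop.

/*
 * Minimal userspace sketch of the corner case described above; not kernel
 * code.  PTEs are modelled as bare PFN values, and advance_pfn() mimics the
 * XEN PV behaviour where a PFN whose MFN was invalidated collapses into an
 * empty (none) PTE.  All names and numbers are illustrative.
 */
#include <stdio.h>
#include <stdint.h>

#define FOLIO_START_PFN 0x10085cULL     /* first PFN of the 4-page folio */
#define FOLIO_NR_PAGES  4

/* PTE[453]..PTE[458] from the dump above; 0 means "not present". */
static const uint64_t ptes[] = {
        0x10085c, 0x10085d, 0x10085e, 0x10085f, 0x0, 0x101da7,
};

/* Stand-in for pte_advance_pfn(): past the folio, the PFN has no MFN and
 * the result degrades to a none PTE (0). */
static uint64_t advance_pfn(uint64_t pte, int nr)
{
        uint64_t next = pte + nr;

        return next >= FOLIO_START_PFN + FOLIO_NR_PAGES ? 0 : next;
}

/* Stand-in for the batching loop, with and without the up-front clamp. */
static int batch(int max_nr, int clamp_to_folio)
{
        uint64_t expected = advance_pfn(ptes[0], 1);
        int nr = 1;

        if (clamp_to_folio && max_nr > FOLIO_NR_PAGES)
                max_nr = FOLIO_NR_PAGES;        /* the fix: limit up front */

        while (nr < max_nr) {
                if (ptes[nr] != expected)       /* the pte_same() check */
                        break;
                /*
                 * The old in-loop check: stop once the PFN leaves the folio.
                 * It never fires here because the none PTE's PFN is 0.
                 */
                if (!clamp_to_folio &&
                    ptes[nr] >= FOLIO_START_PFN + FOLIO_NR_PAGES)
                        break;
                expected = advance_pfn(expected, 1);
                nr++;
        }
        return nr;
}

int main(void)
{
        /* Old loop: the two none PTEs compare equal, so 5 entries get
         * batched for a 4-page folio.  Clamped loop: stops at 4. */
        printf("without clamp: %d PTEs batched\n", batch(6, 0));
        printf("with clamp:    %d PTEs batched\n", batch(6, 1));
        return 0;
}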
1 parent fb881cd commit 7b08b74

File tree

1 file changed: +11 -16 lines changed


mm/internal.h

Lines changed: 11 additions & 16 deletions
@@ -248,11 +248,9 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
 		pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
 		bool *any_writable, bool *any_young, bool *any_dirty)
 {
-	unsigned long folio_end_pfn = folio_pfn(folio) + folio_nr_pages(folio);
-	const pte_t *end_ptep = start_ptep + max_nr;
 	pte_t expected_pte, *ptep;
 	bool writable, young, dirty;
-	int nr;
+	int nr, cur_nr;
 
 	if (any_writable)
 		*any_writable = false;
@@ -265,11 +263,15 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
 	VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio);
 	VM_WARN_ON_FOLIO(page_folio(pfn_to_page(pte_pfn(pte))) != folio, folio);
 
+	/* Limit max_nr to the actual remaining PFNs in the folio we could batch. */
+	max_nr = min_t(unsigned long, max_nr,
+		       folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte));
+
 	nr = pte_batch_hint(start_ptep, pte);
 	expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, nr), flags);
 	ptep = start_ptep + nr;
 
-	while (ptep < end_ptep) {
+	while (nr < max_nr) {
 		pte = ptep_get(ptep);
 		if (any_writable)
 			writable = !!pte_write(pte);
@@ -282,27 +284,20 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
 		if (!pte_same(pte, expected_pte))
 			break;
 
-		/*
-		 * Stop immediately once we reached the end of the folio. In
-		 * corner cases the next PFN might fall into a different
-		 * folio.
-		 */
-		if (pte_pfn(pte) >= folio_end_pfn)
-			break;
-
 		if (any_writable)
 			*any_writable |= writable;
 		if (any_young)
 			*any_young |= young;
 		if (any_dirty)
 			*any_dirty |= dirty;
 
-		nr = pte_batch_hint(ptep, pte);
-		expected_pte = pte_advance_pfn(expected_pte, nr);
-		ptep += nr;
+		cur_nr = pte_batch_hint(ptep, pte);
+		expected_pte = pte_advance_pfn(expected_pte, cur_nr);
+		ptep += cur_nr;
+		nr += cur_nr;
 	}
 
-	return min(ptep - start_ptep, max_nr);
+	return min(nr, max_nr);
 }
 
 /**
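
For concreteness, here is a worked instance of the new up-front clamp with hypothetical numbers (an 8-page folio entered through its third page); the local variables stand in for folio_pfn(), folio_nr_pages(), and pte_pfn(), and the if() mirrors what min_t() does in the patch.

#include <stdio.h>

int main(void)
{
        /* Hypothetical values, for illustration only. */
        unsigned long folio_start = 0x1000;     /* folio_pfn(folio)      */
        unsigned long folio_pages = 8;          /* folio_nr_pages(folio) */
        unsigned long first_pfn   = 0x1002;     /* pte_pfn(pte)          */
        unsigned long max_nr      = 16;         /* caller-supplied limit */

        /* 0x1000 + 8 - 0x1002 = 6: only six more PTEs can still map
         * pages of this folio, whatever the caller asked for. */
        unsigned long remaining = folio_start + folio_pages - first_pfn;

        if (remaining < max_nr)
                max_nr = remaining;

        printf("batch limited to %lu PTEs\n", max_nr);  /* prints 6 */
        return 0;
}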
