
Commit 90abee6

hnaz authored and akpm00 committed
mm: page_alloc: speed up fallbacks in rmqueue_bulk()
The test robot identified c2f6ea3 ("mm: page_alloc: don't steal single pages from biggest buddy") as the root cause of a 56.4% regression in vm-scalability::lru-file-mmap-read. Carlos reports an earlier patch, c0cd6f5 ("mm: page_alloc: fix freelist movement during block conversion"), as the root cause for a regression in worst-case zone->lock+irqoff hold times.

Both of these patches modify the page allocator's fallback path to be less greedy in an effort to stave off fragmentation. The flip side of this is that fallbacks are also less productive each time around, which means the fallback search can run much more frequently.

Carlos' traces point to rmqueue_bulk() specifically, which tries to refill the percpu cache by allocating a large batch of pages in a loop. It highlights how, once the native freelists are exhausted, the fallback code first scans orders top-down for whole blocks to claim, then falls back to a bottom-up search for the smallest buddy to steal. For the next batch page, it goes through the same thing again.

This can be made more efficient. Since rmqueue_bulk() holds the zone->lock over the entire batch, the freelists are not subject to outside changes; when the search for a block to claim has already failed, there is no point in trying again for the next page.

Modify __rmqueue() to remember the last successful fallback mode, and restart directly from there on the next rmqueue_bulk() iteration.

Oliver confirms that this improves beyond the regression that the test robot reported against c2f6ea3:

    commit:
      f3b9217 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
      c2f6ea3 ("mm: page_alloc: don't steal single pages from biggest buddy")
      acc4d5f ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
      2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()")   <--- your patch

    f3b9217             c2f6ea3                      acc4d5f                      2c847f27c37da65a93d23c237c5
    ------------------  ---------------------------  ---------------------------  ---------------------------
          %stddev           %change      %stddev         %change      %stddev         %change      %stddev
    25525364 ±  3%         -56.4%    11135467           -57.8%    10779336           +31.6%    33581409       vm-scalability.throughput

Carlos confirms that worst-case times are almost fully recovered compared to before the earlier culprit patch:

    2dd482b (before freelist hygiene):      1ms
    c0cd6f5 (after freelist hygiene):      90ms
    next-20250319 (steal smallest buddy): 280ms
    this patch:                             8ms

[jackmanb@google.com: comment updates]
  Link: https://lkml.kernel.org/r/D92AC0P9594X.3BML64MUKTF8Z@google.com
[hannes@cmpxchg.org: reset rmqueue_mode in rmqueue_buddy() error loop, per Yunsheng Lin]
  Link: https://lkml.kernel.org/r/20250409140023.GA2313@cmpxchg.org
Link: https://lkml.kernel.org/r/20250407180154.63348-1-hannes@cmpxchg.org
Fixes: c0cd6f5 ("mm: page_alloc: fix freelist movement during block conversion")
Fixes: c2f6ea3 ("mm: page_alloc: don't steal single pages from biggest buddy")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Carlos Song <carlos.song@nxp.com>
Tested-by: Carlos Song <carlos.song@nxp.com>
Tested-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [6.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
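
For orientation before reading the diff: the change turns the fallback search into a small resumable state machine. The sketch below is a simplified, self-contained illustration of that pattern, not the kernel code; the try_*() helpers, alloc_one() and the loop count are made-up stand-ins, and only the enum values mirror the patch.

    #include <stdio.h>

    /* Same modes as the patch, ordered by increasing fragmentation risk. */
    enum rmqueue_mode { RMQUEUE_NORMAL, RMQUEUE_CMA, RMQUEUE_CLAIM, RMQUEUE_STEAL };

    /* Hypothetical stand-ins for the per-mode searches; NULL means "nothing found". */
    static void *try_normal(void) { return NULL; }             /* preferred lists empty   */
    static void *try_cma(void)    { return NULL; }             /* no CMA pages available  */
    static void *try_claim(void)  { return NULL; }             /* no whole block to claim */
    static void *try_steal(void)  { static int p; return &p; } /* stealing still works    */

    /* Resume the search from the mode that produced a page last time. */
    static void *alloc_one(enum rmqueue_mode *mode)
    {
            void *page;

            switch (*mode) {
            case RMQUEUE_NORMAL:
                    if ((page = try_normal()))
                            return page;
                    /* fall through */
            case RMQUEUE_CMA:
                    if ((page = try_cma())) {
                            *mode = RMQUEUE_CMA;
                            return page;
                    }
                    /* fall through */
            case RMQUEUE_CLAIM:
                    if ((page = try_claim())) {
                            /* A claimed block refills the preferred list: back to normal. */
                            *mode = RMQUEUE_NORMAL;
                            return page;
                    }
                    /* fall through */
            case RMQUEUE_STEAL:
                    if ((page = try_steal())) {
                            *mode = RMQUEUE_STEAL;
                            return page;
                    }
            }
            return NULL;
    }

    int main(void)
    {
            /* One remembered mode per batch, as rmqueue_bulk() does with &rmqm. */
            enum rmqueue_mode mode = RMQUEUE_NORMAL;
            int i;

            for (i = 0; i < 4; i++)
                    alloc_one(&mode);

            /* After the first iteration the mode sticks at RMQUEUE_STEAL, so the
             * remaining iterations skip the searches that already failed. */
            printf("final mode: %d\n", mode); /* 3 == RMQUEUE_STEAL */
            return 0;
    }

A successful claim resets the mode to RMQUEUE_NORMAL because claiming converts a whole block to the preferred migratetype, so subsequent pages can come from the preferred freelist again, matching the comment in the diff.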
1 parent 61c4e6c commit 90abee6

File tree

1 file changed: +79 -34 lines

mm/page_alloc.c

Lines changed: 79 additions & 34 deletions
@@ -2183,23 +2183,15 @@ try_to_claim_block(struct zone *zone, struct page *page,
 }
 
 /*
- * Try finding a free buddy page on the fallback list.
- *
- * This will attempt to claim a whole pageblock for the requested type
- * to ensure grouping of such requests in the future.
- *
- * If a whole block cannot be claimed, steal an individual page, regressing to
- * __rmqueue_smallest() logic to at least break up as little contiguity as
- * possible.
+ * Try to allocate from some fallback migratetype by claiming the entire block,
+ * i.e. converting it to the allocation's start migratetype.
  *
  * The use of signed ints for order and current_order is a deliberate
  * deviation from the rest of this file, to make the for loop
  * condition simpler.
- *
- * Return the stolen page, or NULL if none can be found.
  */
 static __always_inline struct page *
-__rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
+__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
                    unsigned int alloc_flags)
 {
         struct free_area *area;
@@ -2237,14 +2229,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
                 page = try_to_claim_block(zone, page, current_order, order,
                                           start_migratetype, fallback_mt,
                                           alloc_flags);
-                if (page)
-                        goto got_one;
+                if (page) {
+                        trace_mm_page_alloc_extfrag(page, order, current_order,
+                                                    start_migratetype, fallback_mt);
+                        return page;
+                }
         }
 
-        if (alloc_flags & ALLOC_NOFRAGMENT)
-                return NULL;
+        return NULL;
+}
+
+/*
+ * Try to steal a single page from some fallback migratetype. Leave the rest of
+ * the block as its current migratetype, potentially causing fragmentation.
+ */
+static __always_inline struct page *
+__rmqueue_steal(struct zone *zone, int order, int start_migratetype)
+{
+        struct free_area *area;
+        int current_order;
+        struct page *page;
+        int fallback_mt;
+        bool claim_block;
 
-        /* No luck claiming pageblock. Find the smallest fallback page */
         for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
                 area = &(zone->free_area[current_order]);
                 fallback_mt = find_suitable_fallback(area, current_order,
@@ -2254,25 +2261,28 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 
                 page = get_page_from_free_area(area, fallback_mt);
                 page_del_and_expand(zone, page, order, current_order, fallback_mt);
-                goto got_one;
+                trace_mm_page_alloc_extfrag(page, order, current_order,
+                                            start_migratetype, fallback_mt);
+                return page;
         }
 
         return NULL;
-
-got_one:
-        trace_mm_page_alloc_extfrag(page, order, current_order,
-                                    start_migratetype, fallback_mt);
-
-        return page;
 }
 
+enum rmqueue_mode {
+        RMQUEUE_NORMAL,
+        RMQUEUE_CMA,
+        RMQUEUE_CLAIM,
+        RMQUEUE_STEAL,
+};
+
 /*
  * Do the hard work of removing an element from the buddy allocator.
  * Call me with the zone->lock already held.
  */
 static __always_inline struct page *
 __rmqueue(struct zone *zone, unsigned int order, int migratetype,
-          unsigned int alloc_flags)
+          unsigned int alloc_flags, enum rmqueue_mode *mode)
 {
         struct page *page;
 
@@ -2291,16 +2301,48 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
                 }
         }
 
-        page = __rmqueue_smallest(zone, order, migratetype);
-        if (unlikely(!page)) {
-                if (alloc_flags & ALLOC_CMA)
+        /*
+         * First try the freelists of the requested migratetype, then try
+         * fallbacks modes with increasing levels of fragmentation risk.
+         *
+         * The fallback logic is expensive and rmqueue_bulk() calls in
+         * a loop with the zone->lock held, meaning the freelists are
+         * not subject to any outside changes. Remember in *mode where
+         * we found pay dirt, to save us the search on the next call.
+         */
+        switch (*mode) {
+        case RMQUEUE_NORMAL:
+                page = __rmqueue_smallest(zone, order, migratetype);
+                if (page)
+                        return page;
+                fallthrough;
+        case RMQUEUE_CMA:
+                if (alloc_flags & ALLOC_CMA) {
                         page = __rmqueue_cma_fallback(zone, order);
-
-                if (!page)
-                        page = __rmqueue_fallback(zone, order, migratetype,
-                                                  alloc_flags);
+                        if (page) {
+                                *mode = RMQUEUE_CMA;
+                                return page;
+                        }
+                }
+                fallthrough;
+        case RMQUEUE_CLAIM:
+                page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
+                if (page) {
+                        /* Replenished preferred freelist, back to normal mode. */
+                        *mode = RMQUEUE_NORMAL;
+                        return page;
+                }
+                fallthrough;
+        case RMQUEUE_STEAL:
+                if (!(alloc_flags & ALLOC_NOFRAGMENT)) {
+                        page = __rmqueue_steal(zone, order, migratetype);
+                        if (page) {
+                                *mode = RMQUEUE_STEAL;
+                                return page;
+                        }
+                }
         }
-        return page;
+        return NULL;
 }
 
 /*
@@ -2312,6 +2354,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
                         unsigned long count, struct list_head *list,
                         int migratetype, unsigned int alloc_flags)
 {
+        enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
         unsigned long flags;
         int i;
 
@@ -2323,7 +2366,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
         }
         for (i = 0; i < count; ++i) {
                 struct page *page = __rmqueue(zone, order, migratetype,
-                                              alloc_flags);
+                                              alloc_flags, &rmqm);
                 if (unlikely(page == NULL))
                         break;
 
@@ -2948,7 +2991,9 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
                 if (alloc_flags & ALLOC_HIGHATOMIC)
                         page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
                 if (!page) {
-                        page = __rmqueue(zone, order, migratetype, alloc_flags);
+                        enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
+
+                        page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
 
                         /*
                          * If the allocation fails, allow OOM handling and
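
As a rough back-of-the-envelope illustration of why the remembered mode pays off in rmqueue_bulk(): if the batch is N pages and the preferred, CMA and claim searches all keep coming up empty, the old code repeats those failed searches for every page, while the new code pays for them once and then goes straight to stealing. The snippet below is a toy cost model with an assumed batch size, not a measurement, and it treats each mode as a single "search".

    #include <stdio.h>

    int main(void)
    {
            const int batch = 64;    /* assumed pcp refill batch size, for illustration only */
            const int dead_ends = 3; /* NORMAL, CMA and CLAIM searches that find nothing     */

            /* Old behaviour: every page re-runs the dead-end searches before stealing. */
            int searches_before = batch * (dead_ends + 1);
            /* New behaviour: the dead ends are paid once, then each page steals directly. */
            int searches_after = dead_ends + batch;

            printf("searches without remembering the mode: %d\n", searches_before); /* 256 */
            printf("searches when resuming from the last mode: %d\n", searches_after); /* 67 */
            return 0;
    }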
