
Commit 1c00f93

Dave Chinner authored and akpm00 committed
mm: lift gfp_kmemleak_mask() to gfp.h
Patch series "mm: fix nested allocation context filtering". This patchset is the followup to the comment I made earlier today: https://lore.kernel.org/linux-xfs/ZjAyIWUzDipofHFJ@dread.disaster.area/

Tl;dr: Memory allocations that are done inside the public memory allocation API need to obey the reclaim recursion constraints placed on the allocation by the original caller, including the "don't track recursion for this allocation" case defined by __GFP_NOLOCKDEP.

These nested allocations are generally in debug code that is tracking something about the allocation (kmemleak, KASAN, etc) and so are allocating private kernel objects that only that debug system will use.

Neither the page owner code nor the stack depot code get this right. They also clear GFP_ZONEMASK as a separate operation, which is completely redundant because the constraint filter applied immediately afterwards guarantees that the GFP_ZONEMASK bits are cleared.

kmemleak gets this filtering right. It preserves the allocation constraints for deadlock prevention and clears all other context flags whilst also ensuring that the nested allocation will fail quickly, silently and without depleting emergency kernel reserves if there is no memory available.

This can be made much more robust, immune to whack-a-mole games and the code greatly simplified by lifting gfp_kmemleak_mask() to include/linux/gfp.h and using that everywhere. Also document it so that there is no excuse for not knowing about it when writing new debug code that nests allocations.

Tested with lockdep, KASAN + page_owner=on and kmemleak=on over multiple fstests runs with XFS.

This patch (of 3):

Any "internal" nested allocation done from within an allocation context needs to obey the high level allocation gfp_mask constraints. This is necessary for debug code like KASAN, kmemleak, lockdep, etc that allocates memory for saving stack traces and other information during memory allocation.
If they don't obey things like __GFP_NOLOCKDEP or __GFP_NOWARN, they produce false positive failure detections.

kmemleak gets this right by using gfp_kmemleak_mask() to pass through the relevant context flags to the nested allocation to ensure that the allocation follows the constraints of the caller context.

KASAN was recently found to be missing __GFP_NOLOCKDEP due to stack depot allocations, and even more recently the page owner tracking code was also found to be missing __GFP_NOLOCKDEP support.

We also don't want KASAN or lockdep to drive the system into OOM kill territory by exhausting emergency reserves. This is something that kmemleak also gets right by adding (__GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN) to the allocation mask.

Hence it is clear that we need to define a common nested allocation filter mask for these sorts of third-party nested allocations used in debug code. So, to start this process, lift gfp_kmemleak_mask() to gfp.h, rename it to gfp_nested_mask(), and convert the kmemleak callers to use it.

Link: https://lkml.kernel.org/r/20240430054604.4169568-1-david@fromorbit.com
Link: https://lkml.kernel.org/r/20240430054604.4169568-2-david@fromorbit.com
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
1 parent eb6a933 commit 1c00f93

File tree

2 files changed: 29 additions, 8 deletions


include/linux/gfp.h

Lines changed: 25 additions & 0 deletions
@@ -156,6 +156,31 @@ static inline int gfp_zonelist(gfp_t flags)
 	return ZONELIST_FALLBACK;
 }
 
+/*
+ * gfp flag masking for nested internal allocations.
+ *
+ * For code that needs to do allocations inside the public allocation API (e.g.
+ * memory allocation tracking code) the allocations need to obey the caller
+ * allocation context constraints to prevent allocation context mismatches (e.g.
+ * GFP_KERNEL allocations in GFP_NOFS contexts) from causing potential deadlock
+ * situations.
+ *
+ * It is also assumed that these nested allocations are for internal kernel
+ * object storage purposes only and are not going to be used for DMA, etc. Hence
+ * we strip out all the zone information and leave just the context information
+ * intact.
+ *
+ * Further, internal allocations must fail before the higher level allocation
+ * can fail, so we must make them fail faster and fail silently. We also don't
+ * want them to deplete emergency reserves. Hence nested allocations must be
+ * prepared for these allocations to fail.
+ */
+static inline gfp_t gfp_nested_mask(gfp_t flags)
+{
+	return ((flags & (GFP_KERNEL | GFP_ATOMIC | __GFP_NOLOCKDEP)) |
+		(__GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN));
+}
+
 /*
  * We get the zone list from the current node and the gfp_mask.
  * This zone list contains a maximum of MAX_NUMNODES*MAX_NR_ZONES zones.
mm/kmemleak.c

Lines changed: 4 additions & 8 deletions
@@ -114,12 +114,6 @@
 
 #define BYTES_PER_POINTER sizeof(void *)
 
-/* GFP bitmask for kmemleak internal allocations */
-#define gfp_kmemleak_mask(gfp)	(((gfp) & (GFP_KERNEL | GFP_ATOMIC | \
-					   __GFP_NOLOCKDEP)) | \
-				 __GFP_NORETRY | __GFP_NOMEMALLOC | \
-				 __GFP_NOWARN)
-
 /* scanning area inside a memory block */
 struct kmemleak_scan_area {
 	struct hlist_node node;
@@ -463,7 +457,8 @@ static struct kmemleak_object *mem_pool_alloc(gfp_t gfp)
 
 	/* try the slab allocator first */
 	if (object_cache) {
-		object = kmem_cache_alloc_noprof(object_cache, gfp_kmemleak_mask(gfp));
+		object = kmem_cache_alloc_noprof(object_cache,
+						 gfp_nested_mask(gfp));
 		if (object)
 			return object;
 	}
@@ -947,7 +942,8 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp)
 	untagged_objp = (unsigned long)kasan_reset_tag((void *)object->pointer);
 
 	if (scan_area_cache)
-		area = kmem_cache_alloc_noprof(scan_area_cache, gfp_kmemleak_mask(gfp));
+		area = kmem_cache_alloc_noprof(scan_area_cache,
					       gfp_nested_mask(gfp));
 
 	raw_spin_lock_irqsave(&object->lock, flags);
 	if (!area) {
