Commit a379972
Author: Paolo Abeni

Merge branch 'net-page_pool-add-netlink-based-introspection'
Jakub Kicinski says:

====================
net: page_pool: add netlink-based introspection

We recently started to deploy newer kernels / drivers at Meta, making
significant use of page pools for the first time. We immediately ran into
page pool leaks, both real and false-positive warnings. As Eric pointed
out / predicted, there's no guarantee that applications will read / close
their sockets, so a page pool page may be stuck in a socket (but not
leaked) forever. This happens a lot in our fleet. Most of these are
obviously due to application bugs, but we should not be printing kernel
warnings due to minor application resource leaks.

Conversely, page pool memory may get leaked at runtime, and we have no way
to detect / track that, unless someone reconfigures the NIC and destroys
the page pools which leaked the pages.

The solution presented here is to expose the memory use of page pools via
netlink. This allows continuous monitoring of memory used by page pools,
regardless of whether they were destroyed or not. The sample in patch 15
can print the memory use and recycling efficiency:

  $ ./page-pool
      eth0[2]   page pools: 10 (zombies: 0)
                refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
                recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)

v4:
 - use dev_net(netdev)->loopback_dev
 - extend inflight doc
v3: https://lore.kernel.org/all/20231122034420.1158898-1-kuba@kernel.org/
 - ID is still here, can't decide if it matters
 - rename destroyed -> detach-time, good enough?
 - fix build for netsec
v2: https://lore.kernel.org/r/20231121000048.789613-1-kuba@kernel.org
 - hopefully fix build with PAGE_POOL=n
v1: https://lore.kernel.org/all/20231024160220.3973311-1-kuba@kernel.org/
 - The main change compared to the RFC is that the API now exposes
   outstanding references and byte counts even for "live" page pools.
   The warning is no longer printed if the page pool is accessible via
   netlink.
RFC: https://lore.kernel.org/all/20230816234303.3786178-1-kuba@kernel.org/
====================

Link: https://lore.kernel.org/r/20231126230740.2148636-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 parents: a214724 + 637567e

25 files changed (+1574, -33 lines)

Documentation/netlink/specs/netdev.yaml

Lines changed: 172 additions & 0 deletions
@@ -86,6 +86,112 @@ attribute-sets:
              See Documentation/networking/xdp-rx-metadata.rst for more details.
         type: u64
         enum: xdp-rx-metadata
+  -
+    name: page-pool
+    attributes:
+      -
+        name: id
+        doc: Unique ID of a Page Pool instance.
+        type: uint
+        checks:
+          min: 1
+          max: u32-max
+      -
+        name: ifindex
+        doc: |
+          ifindex of the netdev to which the pool belongs.
+          May be reported as 0 if the page pool was allocated for a netdev
+          which got destroyed already (page pools may outlast their netdevs
+          because they wait for all memory to be returned).
+        type: u32
+        checks:
+          min: 1
+          max: s32-max
+      -
+        name: napi-id
+        doc: Id of NAPI using this Page Pool instance.
+        type: uint
+        checks:
+          min: 1
+          max: u32-max
+      -
+        name: inflight
+        type: uint
+        doc: |
+          Number of outstanding references to this page pool (allocated
+          but yet to be freed pages). Allocated pages may be held in
+          socket receive queues, driver receive ring, page pool recycling
+          ring, the page pool cache, etc.
+      -
+        name: inflight-mem
+        type: uint
+        doc: |
+          Amount of memory held by inflight pages.
+      -
+        name: detach-time
+        type: uint
+        doc: |
+          Seconds in CLOCK_BOOTTIME of when Page Pool was detached by
+          the driver. Once detached Page Pool can no longer be used to
+          allocate memory.
+          Page Pools wait for all the memory allocated from them to be freed
+          before truly disappearing. "Detached" Page Pools cannot be
+          "re-attached", they are just waiting to disappear.
+          Attribute is absent if Page Pool has not been detached, and
+          can still be used to allocate new memory.
+  -
+    name: page-pool-info
+    subset-of: page-pool
+    attributes:
+      -
+        name: id
+      -
+        name: ifindex
+  -
+    name: page-pool-stats
+    doc: |
+      Page pool statistics, see docs for struct page_pool_stats
+      for information about individual statistics.
+    attributes:
+      -
+        name: info
+        doc: Page pool identifying information.
+        type: nest
+        nested-attributes: page-pool-info
+      -
+        name: alloc-fast
+        type: uint
+        value: 8 # reserve some attr ids in case we need more metadata later
+      -
+        name: alloc-slow
+        type: uint
+      -
+        name: alloc-slow-high-order
+        type: uint
+      -
+        name: alloc-empty
+        type: uint
+      -
+        name: alloc-refill
+        type: uint
+      -
+        name: alloc-waive
+        type: uint
+      -
+        name: recycle-cached
+        type: uint
+      -
+        name: recycle-cache-full
+        type: uint
+      -
+        name: recycle-ring
+        type: uint
+      -
+        name: recycle-ring-full
+        type: uint
+      -
+        name: recycle-released-refcnt
+        type: uint
 
 operations:
   list:
@@ -120,8 +226,74 @@ operations:
       doc: Notification about device configuration being changed.
       notify: dev-get
       mcgrp: mgmt
+    -
+      name: page-pool-get
+      doc: |
+        Get / dump information about Page Pools.
+        (Only Page Pools associated with a net_device can be listed.)
+      attribute-set: page-pool
+      do:
+        request:
+          attributes:
+            - id
+        reply: &pp-reply
+          attributes:
+            - id
+            - ifindex
+            - napi-id
+            - inflight
+            - inflight-mem
+            - detach-time
+      dump:
+        reply: *pp-reply
+      config-cond: page-pool
+    -
+      name: page-pool-add-ntf
+      doc: Notification about page pool appearing.
+      notify: page-pool-get
+      mcgrp: page-pool
+      config-cond: page-pool
+    -
+      name: page-pool-del-ntf
+      doc: Notification about page pool disappearing.
+      notify: page-pool-get
+      mcgrp: page-pool
+      config-cond: page-pool
+    -
+      name: page-pool-change-ntf
+      doc: Notification about page pool configuration being changed.
+      notify: page-pool-get
+      mcgrp: page-pool
+      config-cond: page-pool
+    -
+      name: page-pool-stats-get
+      doc: Get page pool statistics.
+      attribute-set: page-pool-stats
+      do:
+        request:
+          attributes:
+            - info
+        reply: &pp-stats-reply
+          attributes:
+            - info
+            - alloc-fast
+            - alloc-slow
+            - alloc-slow-high-order
+            - alloc-empty
+            - alloc-refill
+            - alloc-waive
+            - recycle-cached
+            - recycle-cache-full
+            - recycle-ring
+            - recycle-ring-full
+            - recycle-released-refcnt
+      dump:
+        reply: *pp-stats-reply
+      config-cond: page-pool-stats
 
 mcast-groups:
   list:
     -
       name: mgmt
+    -
+      name: page-pool
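
For illustration only, here is a minimal sketch of consuming the page-pool-get dump from user space with libmnl, using the uapi constants generated from the spec above (<linux/netdev.h>). It assumes the "netdev" genetlink family has already been resolved and the NETDEV_CMD_PAGE_POOL_GET dump request has already been sent; the nla_get_uint() helper and the running totals are invented for the example and are not part of this series.

#include <libmnl/libmnl.h>
#include <linux/genetlink.h>
#include <linux/netdev.h>
#include <stdint.h>

/* Running totals across the dump; purely illustrative bookkeeping. */
static uint64_t total_inflight_pages;
static uint64_t total_inflight_bytes;

/* 'uint' netlink attributes may be carried as either 4 or 8 bytes. */
static uint64_t nla_get_uint(const struct nlattr *attr)
{
	if (mnl_attr_get_payload_len(attr) == sizeof(uint32_t))
		return mnl_attr_get_u32(attr);
	return mnl_attr_get_u64(attr);
}

/* Invoked by mnl_cb_run() once per page pool in the dump reply. */
static int page_pool_msg_cb(const struct nlmsghdr *nlh, void *data)
{
	const struct nlattr *attr;

	mnl_attr_for_each(attr, nlh, sizeof(struct genlmsghdr)) {
		switch (mnl_attr_get_type(attr)) {
		case NETDEV_A_PAGE_POOL_INFLIGHT:
			total_inflight_pages += nla_get_uint(attr);
			break;
		case NETDEV_A_PAGE_POOL_INFLIGHT_MEM:
			total_inflight_bytes += nla_get_uint(attr);
			break;
		}
	}
	return MNL_CB_OK;
}

The page-pool sample mentioned in the cover letter is the complete version of this idea.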

Documentation/networking/page_pool.rst

Lines changed: 8 additions & 2 deletions
@@ -41,6 +41,11 @@ Architecture overview
     |   Fast cache    |     |  ptr-ring cache  |
     +-----------------+     +------------------+
 
+Monitoring
+==========
+Information about page pools on the system can be accessed via the netdev
+genetlink family (see Documentation/netlink/specs/netdev.yaml).
+
 API interface
 =============
 The number of pools created **must** match the number of hardware queues
@@ -107,8 +112,9 @@ page_pool_get_stats() and structures described below are available.
 It takes a pointer to a ``struct page_pool`` and a pointer to a struct
 page_pool_stats allocated by the caller.
 
-The API will fill in the provided struct page_pool_stats with
-statistics about the page_pool.
+Older drivers expose page pool statistics via ethtool or debugfs.
+The same statistics are accessible via the netlink netdev family
+in a driver-independent fashion.
 
 .. kernel-doc:: include/net/page_pool/types.h
    :identifiers: struct page_pool_recycle_stats

drivers/net/ethernet/broadcom/bnxt/bnxt.c

Lines changed: 1 addition & 0 deletions
@@ -3331,6 +3331,7 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
 	pp.pool_size += bp->rx_ring_size;
 	pp.nid = dev_to_node(&bp->pdev->dev);
 	pp.napi = &rxr->bnapi->napi;
+	pp.netdev = bp->dev;
 	pp.dev = &bp->pdev->dev;
 	pp.dma_dir = bp->rx_dir;
 	pp.max_len = PAGE_SIZE;
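
The driver patches in this merge all follow the same pattern: fill in the napi and (new) netdev members of struct page_pool_params so the resulting pool can be attributed to a device via netlink. A minimal sketch of that pattern for a made-up driver follows; struct my_rx_ring, my_driver_create_page_pool() and their fields are hypothetical, only the page_pool_params members mirror the hunks above.

#include <linux/dma-direction.h>
#include <linux/err.h>
#include <linux/netdevice.h>
#include <linux/numa.h>
#include <net/page_pool/helpers.h>

/* Hypothetical per-ring context; only the fields used below are assumed. */
struct my_rx_ring {
	struct napi_struct napi;
	struct net_device *netdev;
	struct device *dma_dev;
	struct page_pool *page_pool;
};

static int my_driver_create_page_pool(struct my_rx_ring *ring,
				      unsigned int ring_size)
{
	struct page_pool_params pp = {
		.pool_size	= ring_size,
		.nid		= NUMA_NO_NODE,
		.dev		= ring->dma_dev,
		.dma_dir	= DMA_FROM_DEVICE,
		/* Linking the pool to its NAPI and netdev is what lets the
		 * netlink introspection attribute it to this device/queue.
		 */
		.napi		= &ring->napi,
		.netdev		= ring->netdev,
	};

	ring->page_pool = page_pool_create(&pp);
	if (IS_ERR(ring->page_pool))
		return PTR_ERR(ring->page_pool);

	return 0;
}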

drivers/net/ethernet/mellanox/mlx5/core/en_main.c

Lines changed: 1 addition & 0 deletions
@@ -902,6 +902,7 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 	pp_params.nid = node;
 	pp_params.dev = rq->pdev;
 	pp_params.napi = rq->cq.napi;
+	pp_params.netdev = rq->netdev;
 	pp_params.dma_dir = rq->buff.map_dir;
 	pp_params.max_len = PAGE_SIZE;

drivers/net/ethernet/microsoft/mana/mana_en.c

Lines changed: 1 addition & 0 deletions
@@ -2137,6 +2137,7 @@ static int mana_create_page_pool(struct mana_rxq *rxq, struct gdma_context *gc)
 	pprm.pool_size = RX_BUFFERS_PER_QUEUE;
 	pprm.nid = gc->numa_node;
 	pprm.napi = &rxq->rx_cq.napi;
+	pprm.netdev = rxq->ndev;
 
 	rxq->page_pool = page_pool_create(&pprm);

drivers/net/ethernet/socionext/netsec.c

Lines changed: 2 additions & 0 deletions
@@ -1302,6 +1302,8 @@ static int netsec_setup_rx_dring(struct netsec_priv *priv)
 		.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE,
 		.offset = NETSEC_RXBUF_HEADROOM,
 		.max_len = NETSEC_RX_BUF_SIZE,
+		.napi = &priv->napi,
+		.netdev = priv->ndev,
 	};
 	int i, err;

include/linux/list.h

Lines changed: 20 additions & 0 deletions
@@ -1119,6 +1119,26 @@ static inline void hlist_move_list(struct hlist_head *old,
 	old->first = NULL;
 }
 
+/**
+ * hlist_splice_init() - move all entries from one list to another
+ * @from: hlist_head from which entries will be moved
+ * @last: last entry on the @from list
+ * @to: hlist_head to which entries will be moved
+ *
+ * @to can be empty, @from must contain at least @last.
+ */
+static inline void hlist_splice_init(struct hlist_head *from,
+				     struct hlist_node *last,
+				     struct hlist_head *to)
+{
+	if (to->first)
+		to->first->pprev = &last->next;
+	last->next = to->first;
+	to->first = from->first;
+	from->first->pprev = &to->first;
+	from->first = NULL;
+}
+
 #define hlist_entry(ptr, type, member) container_of(ptr,type,member)
 
 #define hlist_for_each(pos, head) \
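
A short usage sketch of the new helper: splice every entry of one hlist onto the front of another. The caller move_all_nodes() is hypothetical; hlist_splice_init() needs the last node of the source list, so it is located first.

#include <linux/list.h>

/* Hypothetical helper: splice all of @src onto the front of @dst. */
static void move_all_nodes(struct hlist_head *src, struct hlist_head *dst)
{
	struct hlist_node *last = src->first;

	if (!last)	/* @from must contain at least @last */
		return;

	while (last->next)
		last = last->next;

	hlist_splice_init(src, last, dst);
}

This is the kind of bulk hand-over the series needs when page pools outlive their netdev (see the dev_net(netdev)->loopback_dev note in the cover letter).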

include/linux/netdevice.h

Lines changed: 4 additions & 0 deletions
@@ -2447,6 +2447,10 @@ struct net_device {
 #if IS_ENABLED(CONFIG_DPLL)
 	struct dpll_pin		*dpll_pin;
 #endif
+#if IS_ENABLED(CONFIG_PAGE_POOL)
+	/** @page_pools: page pools created for this netdevice */
+	struct hlist_head	page_pools;
+#endif
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)

include/linux/poison.h

Lines changed: 2 additions & 0 deletions
@@ -83,6 +83,8 @@
 
 /********** net/core/skbuff.c **********/
 #define SKB_LIST_POISON_NEXT	((void *)(0x800 + POISON_POINTER_DELTA))
+/********** net/ **********/
+#define NET_PTR_POISON	((void *)(0x801 + POISON_POINTER_DELTA))
 
 /********** kernel/bpf/ **********/
 #define BPF_PTR_POISON	((void *)(0xeB9FUL + POISON_POINTER_DELTA))
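
NET_PTR_POISON follows the usual poison-pointer idiom: a pointer that must never be dereferenced again is overwritten with a distinctive non-NULL value so a stale access faults loudly instead of silently following freed memory. A sketch with an invented structure:

#include <linux/netdevice.h>
#include <linux/poison.h>

/* Invented holder of a netdev back-pointer. */
struct my_binding {
	struct net_device *netdev;
};

/* Sketch: once the netdev is gone the back-pointer must never be followed
 * again, so poison it rather than leaving it dangling.
 */
static void my_binding_netdev_gone(struct my_binding *b)
{
	b->netdev = NET_PTR_POISON;
}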

include/net/page_pool/helpers.h

Lines changed: 2 additions & 6 deletions
@@ -55,16 +55,12 @@
 #include <net/page_pool/types.h>
 
 #ifdef CONFIG_PAGE_POOL_STATS
+/* Deprecated driver-facing API, use netlink instead */
 int page_pool_ethtool_stats_get_count(void);
 u8 *page_pool_ethtool_stats_get_strings(u8 *data);
 u64 *page_pool_ethtool_stats_get(u64 *data, void *stats);
 
-/*
- * Drivers that wish to harvest page pool stats and report them to users
- * (perhaps via ethtool, debugfs, or another mechanism) can allocate a
- * struct page_pool_stats call page_pool_get_stats to get stats for the specified pool.
- */
-bool page_pool_get_stats(struct page_pool *pool,
+bool page_pool_get_stats(const struct page_pool *pool,
 			 struct page_pool_stats *stats);
 #else
 static inline int page_pool_ethtool_stats_get_count(void)
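
For reference, a sketch of the in-kernel stats path that the hunk above now flags as deprecated for drivers: a caller-allocated struct page_pool_stats is filled by page_pool_get_stats() (field names as in include/net/page_pool/types.h, CONFIG_PAGE_POOL_STATS assumed enabled). The caller my_pool_recycled_pages() is invented.

#include <net/page_pool/helpers.h>

/* Invented caller: total pages a pool has recycled via cache or ptr-ring. */
static u64 my_pool_recycled_pages(const struct page_pool *pool)
{
	struct page_pool_stats stats = {};

	if (!page_pool_get_stats(pool, &stats))
		return 0;

	return stats.recycle_stats.cached + stats.recycle_stats.ring;
}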
