Hi @frank-w ,
Thanks for all your work on/for the mediatek devices.
I am using an MT7621DAT (so, MT7621AT + 128MB embedded RAM) and sporadically, but reproducibly if wanted, I get page allocation errors:
(Custom buildroot build with mainline 6.12 kernel + openwrt out-of-tree patches)
# [ 3011.709680] warn_alloc: 1 callbacks suppressed
[ 3011.709726] systemd-network: page allocation failure: order:10, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)
[ 3011.725868] CPU: 3 UID: 101 PID: 4191 Comm: systemd-network Not tainted 6.12.39 #8
[ 3011.725931] Hardware name: MT7621DAT
[ 3011.725948] Stack : 00000001 8009888c 00000001 00000004 00000001 83c41750 83c417c4 00000000
[ 3011.726039] 01000000 80097b10 00000000 00000000 00000000 00000001 83c41770 814c8000
[ 3011.726099] 00000000 00000000 80a493d8 83c41620 ffffefff 00000000 80b0c00c 000001ef
[ 3011.726155] 00000000 000001f1 80b0c038 fffffff9 00000001 00000000 80a493d8 00000001
[ 3011.726211] 00000001 80b04574 00000000 00000400 00000003 fffc7fb3 0000000c 80cd000c
[ 3011.726269] ...
[ 3011.726284] Call Trace:
[ 3011.726291] [<80008430>] show_stack+0x28/0xf0
[ 3011.726352] [<8091ce10>] dump_stack_lvl+0x70/0xb0
[ 3011.726387] [<801e3e6c>] warn_alloc+0xb8/0x148
[ 3011.726439] [<801e406c>] __alloc_pages_noprof+0x170/0xd04
[ 3011.726466] [<801e9ba4>] ___kmalloc_large_node+0x64/0xf8
[ 3011.726496] [<801ee0a0>] __kmalloc_noprof+0x22c/0x3c0
[ 3011.726520] [<805d93d4>] mtk_open+0xb20/0xcb8
[ 3011.726542] [<806dfe48>] __dev_open+0xd8/0x198
[ 3011.726569] [<806e0338>] __dev_change_flags+0x1c0/0x208
[ 3011.726591] [<806e03a4>] dev_change_flags+0x24/0x70
[ 3011.726610] [<806f4aa4>] do_setlink+0x2d4/0x102c
[ 3011.726638] [<806f58d4>] rtnl_setlink+0xd8/0x154
[ 3011.726658] [<806f2890>] rtnetlink_rcv_msg+0x350/0x47c
[ 3011.726679] [<80746eb0>] netlink_rcv_skb+0x94/0x130
[ 3011.726711] [<80746578>] netlink_unicast+0x284/0x448
[ 3011.726733] [<807469d0>] netlink_sendmsg+0x294/0x460
[ 3011.726755] [<806a76c4>] __sys_sendto+0xbc/0x120
[ 3011.726792] [<800138cc>] syscall_common+0x34/0x58
[ 3011.726828]
[ 3011.726842] Mem-Info:
[ 3011.878151] active_anon:51 inactive_anon:2644 isolated_anon:0
[ 3011.878151] active_file:2379 inactive_file:4276 isolated_file:32
[ 3011.878151] unevictable:0 dirty:135 writeback:0
[ 3011.878151] slab_reclaimable:680 slab_unreclaimable:4961
[ 3011.878151] mapped:3287 shmem:553 pagetables:142
[ 3011.878151] sec_pagetables:0 bounce:0
[ 3011.878151] kernel_misc_reclaimable:0
[ 3011.878151] free:11393 free_pcp:103 free_cma:0
[ 3011.916762] Node 0 active_anon:204kB inactive_anon:10828kB active_file:9516kB inactive_file:17132kB unevictable:0kB isolated(anon):0kB isolated(file):128kB mapped:13344kB dirty:540kB writeback:0kB shmem:2212kB writeback_tmp:0kB kernel_stack:984kB pagetables:568kB sec_pagetables:0kB all_unreclaimable? no
[ 3011.916846] Normal free:45132kB boost:0kB min:1360kB low:1700kB high:2040kB reserved_highatomic:0KB active_anon:204kB inactive_anon:10740kB active_file:9584kB inactive_file:17168kB unevictable:0kB writepending:524kB present:131072kB managed:117500kB mlocked:0kB bounce:0kB free_pcp:520kB local_pcp:0kB free_cma:0kB
[ 3011.916904] lowmem_reserve[]: 0 0 0
[ 3011.916957] Normal: 145*4kB (UE) 155*8kB (UME) 238*16kB (UME) 231*32kB (UME) 106*64kB (UME) 51*128kB (UME) 27*256kB (M) 15*512kB (M) 2*1024kB (M) 1*2048kB (M) 0*4096kB = 45020kB
[ 3011.917229] 7274 total pagecache pages
[ 3011.917247] 0 pages in swap cache
[ 3011.917260] Free swap = 0kB
[ 3011.917272] Total swap = 0kB
[ 3011.917284] 32768 pages RAM
[ 3011.917296] 0 pages HighMem/MovableOnly
[ 3011.917309] 3393 pages reserved
The error happens in mtk_open, more specifically in the mtk_init_fq_dma call.
This kcalloc memory allocation happens there: https://github.com/torvalds/linux/blob/v6.12/drivers/net/ethernet/mediatek/mtk_eth_soc.c#L1162
This kcalloc gives the error above, because it requests a fairly large amount of contiguous memory (4MB, order:10).
This is because:
int cnt = soc->tx.fq_dma_size; with fq_dma_size = MTK_DMA_SIZE(2K) = 2048 (for MT7621)
DIV_ROUND_UP(soc->tx.fq_dma_size, MTK_FQ_DMA_LENGTH) = DIV_ROUND_UP(2048, 2048) = 1
for (j = 0; j < DIV_ROUND_UP(soc->tx.fq_dma_size, MTK_FQ_DMA_LENGTH); j++) {
len = min_t(int, cnt - j * MTK_FQ_DMA_LENGTH, MTK_FQ_DMA_LENGTH);
eth->scratch_head[j] = kcalloc(len, MTK_QDMA_PAGE_SIZE, GFP_KERNEL);
Which gives:
j = 0 ; j < 1; j++
len = min_t(int, 2048 - 0 * 2048, 2048) = 2048
eth->scratch_head[0] = kcalloc(2048, 2048, GFP_KERNEL);
I believe this code was intended to split up the fq_dma_size into chunks of MTK_FQ_DMA_LENGTH.
In that case, only 1 element of size 2048 should have been allocated. Am I correct in this assumption?
In kernel 5.10 this used to be: https://github.com/torvalds/linux/blob/v5.10/drivers/net/ethernet/mediatek/mtk_eth_soc.c#L803
MTK_DMA_SIZE (#define MTK_DMA_SIZE 256) x MTK_QDMA_PAGE_SIZE (#define MTK_QDMA_PAGE_SIZE 2048)
So, 256 * 2048 bytes = 512kB
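In my case there is much more than 4MB free in total (about 45MB), but not contiguously, as a single non-fragmented memory block: the largest free block in the "Normal:" line above is 2048kB, and there are 0 free 4096kB blocks.
So this memory allocation fails.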
Related issues I found:
openwrt/mt76#592
#62
openwrt/openwrt#12143 (comment)
To reproduce the issue:
Run:
ip link set down eth0
ip link del dev br0
nice -n -10 stress --vm 1 --vm-bytes 10000000 &
nice -n -10 stress --vm 1 --vm-bytes 50000000 &
(I think the last line alone should also suffice; you might need to increase vm-bytes depending on the available memory.)
Wait until stress is running (a few seconds).
Then run:
killall stress
systemctl restart systemd-networkd
(Or /etc/init.d/network restart if you are running init.d; I am using systemd.)
You get the error and eth0 cannot be initialised:
[ 51.912288] mt7530-mdio mdio-bus:1f lan3: failed to open conduit eth0
[ 51.920335] mt7530-mdio mdio-bus:1f lan5: failed to open conduit eth0
[ 51.928359] mt7530-mdio mdio-bus:1f lan2: failed to open conduit eth0
[ 51.936288] mt7530-mdio mdio-bus:1f lan1: failed to open conduit eth0
[ 51.943893] mt7530-mdio mdio-bus:1f lan4: failed to open conduit eth0