
Commit aabf58b

tujinjiang11 authored and akpm00 committed
mm/hugetlb: fix set_max_huge_pages() when there are surplus pages
In set_max_huge_pages(), min_count is computed taking into account surplus huge pages, which might lead in some cases to not being able to free huge pages and to accounting them as surplus instead.

One way to solve it is to subtract surplus_huge_pages directly, but we cannot do it blindly because there might be surplus pages that are also free pages, which might happen when we fail to restore the vmemmap for HVO-optimized pages. So we could be subtracting the same page twice. To work around this, first compute the number of free persistent pages, and use that along with the surplus pages to compute min_count.

Steps to reproduce:
1) create 5 hugetlb folios in Node0
2) run a program to use all the hugetlb folios
3) echo 0 > nr_hugepages for Node0 to free the hugetlb folios. Thus the 5 hugetlb folios in Node0 are accounted as surplus.
4) create 5 hugetlb folios in Node1
5) echo 0 > nr_hugepages for Node1 to free the hugetlb folios

The result:
           Node0    Node1
    Total      5        5
    Free       0        5
    Surp       5        5

The result with this patch:
           Node0    Node1
    Total      5        0
    Free       0        0
    Surp       5        0

Link: https://lkml.kernel.org/r/20250409055957.3774471-1-tujinjiang@huawei.com
Link: https://lkml.kernel.org/r/20250407124706.2688092-1-tujinjiang@huawei.com
Fixes: 9a30523 ("hugetlb: add per node hstate attributes")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
1 parent 60580e0 commit aabf58b

File tree

1 file changed: 18 additions, 1 deletion


mm/hugetlb.c

Lines changed: 18 additions & 1 deletion
@@ -3825,6 +3825,7 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
 static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 			      nodemask_t *nodes_allowed)
 {
+	unsigned long persistent_free_count;
 	unsigned long min_count;
 	unsigned long allocated;
 	struct folio *folio;
@@ -3959,8 +3960,24 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	 * though, we'll note that we're not allowed to exceed surplus
 	 * and won't grow the pool anywhere else. Not until one of the
 	 * sysctls are changed, or the surplus pages go out of use.
+	 *
+	 * min_count is the expected number of persistent pages, we
+	 * shouldn't calculate min_count by using
+	 * resv_huge_pages + persistent_huge_pages() - free_huge_pages,
+	 * because there may exist free surplus huge pages, and this will
+	 * lead to subtracting twice. Free surplus huge pages come from HVO
+	 * failing to restore vmemmap, see comments in the callers of
+	 * hugetlb_vmemmap_restore_folio(). Thus, we should calculate
+	 * persistent free count first.
 	 */
-	min_count = h->resv_huge_pages + h->nr_huge_pages - h->free_huge_pages;
+	persistent_free_count = h->free_huge_pages;
+	if (h->free_huge_pages > persistent_huge_pages(h)) {
+		if (h->free_huge_pages > h->surplus_huge_pages)
+			persistent_free_count -= h->surplus_huge_pages;
+		else
+			persistent_free_count = 0;
+	}
+	min_count = h->resv_huge_pages + persistent_huge_pages(h) - persistent_free_count;
 	min_count = max(count, min_count);
 	try_to_free_low(h, min_count, nodes_allowed);
