Commit b1701d5
Merge tag 'mm-stable-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull remaining MM updates from Andrew Morton:
 "Three patch series - two that perform cleanups and one feature:

   - hugetlb_vmemmap cleanups from Muchun Song

   - hardware poisoning support for 1GB hugepages, from Naoya Horiguchi

   - highmem documentation fixups from Fabio De Francesco"

* tag 'mm-stable-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
  Documentation/mm: add details about kmap_local_page() and preemption
  highmem: delete a sentence from kmap_local_page() kdocs
  Documentation/mm: rrefer kmap_local_page() and avoid kmap()
  Documentation/mm: avoid invalid use of addresses from kmap_local_page()
  Documentation/mm: don't kmap*() pages which can't come from HIGHMEM
  highmem: specify that kmap_local_page() is callable from interrupts
  highmem: remove unneeded spaces in kmap_local_page() kdocs
  mm, hwpoison: enable memory error handling on 1GB hugepage
  mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage
  mm, hwpoison: make __page_handle_poison returns int
  mm, hwpoison: set PG_hwpoison for busy hugetlb pages
  mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage
  mm, hwpoison, hugetlb: support saving mechanism of raw error pages
  mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
  mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages()
  mm: hugetlb_vmemmap: use PTRS_PER_PTE instead of PMD_SIZE / PAGE_SIZE
  mm: hugetlb_vmemmap: move code comments to vmemmap_dedup.rst
  mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability
  mm: hugetlb_vmemmap: replace early_param() with core_param()
  mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c
  ...
2 parents c235698 + a9e9c93 commit b1701d5

21 files changed: +823, -702 lines

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 4 additions & 3 deletions
@@ -1735,12 +1735,13 @@
 	hugetlb_free_vmemmap=
 			[KNL] Requires CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
 			enabled.
+			Control if HugeTLB Vmemmap Optimization (HVO) is enabled.
 			Allows heavy hugetlb users to free up some more
 			memory (7 * PAGE_SIZE for each 2MB hugetlb page).
-			Format: { [oO][Nn]/Y/y/1 | [oO][Ff]/N/n/0 (default) }
+			Format: { on | off (default) }
 
-			[oO][Nn]/Y/y/1: enable the feature
-			[oO][Ff]/N/n/0: disable the feature
+			on: enable HVO
+			off: disable HVO
 
 			Built with CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON=y,
 			the default is on.
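As a usage sketch, booting with HVO enabled alongside a pre-allocated 2MB pool might look like the following (hugepagesz=/hugepages= are documented elsewhere in this file; the values are illustrative only):

	... hugetlb_free_vmemmap=on hugepagesz=2M hugepages=256 ...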

Documentation/admin-guide/mm/hugetlbpage.rst

Lines changed: 2 additions & 2 deletions
@@ -164,8 +164,8 @@ default_hugepagesz
 	will all result in 256 2M huge pages being allocated.  Valid default
 	huge page size is architecture dependent.
 hugetlb_free_vmemmap
-	When CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is set, this enables optimizing
-	unused vmemmap pages associated with each HugeTLB page.
+	When CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is set, this enables HugeTLB
+	Vmemmap Optimization (HVO).
 
 When multiple huge page sizes are supported, ``/proc/sys/vm/nr_hugepages``
 indicates the current number of pre-allocated huge pages of the default size.

Documentation/admin-guide/mm/memory-hotplug.rst

Lines changed: 2 additions & 2 deletions
@@ -653,8 +653,8 @@ block might fail:
 - Concurrent activity that operates on the same physical memory area, such as
   allocating gigantic pages, can result in temporary offlining failures.
 
-- Out of memory when dissolving huge pages, especially when freeing unused
-  vmemmap pages associated with each hugetlb page is enabled.
+- Out of memory when dissolving huge pages, especially when HugeTLB Vmemmap
+  Optimization (HVO) is enabled.
 
 Offlining code may be able to migrate huge page contents, but may not be able
 to dissolve the source huge page because it fails allocating (unmovable) pages

Documentation/admin-guide/sysctl/vm.rst

Lines changed: 1 addition & 2 deletions
@@ -569,8 +569,7 @@ This knob is not available when the size of 'struct page' (a structure defined
 in include/linux/mm_types.h) is not power of two (an unusual system config could
 result in this).
 
-Enable (set to 1) or disable (set to 0) the feature of optimizing vmemmap pages
-associated with each HugeTLB page.
+Enable (set to 1) or disable (set to 0) HugeTLB Vmemmap Optimization (HVO).
 
 Once enabled, the vmemmap pages of subsequent allocation of HugeTLB pages from
 buddy allocator will be optimized (7 pages per 2MB HugeTLB page and 4095 pages
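For a runtime sketch, the knob can be flipped through sysctl(8) or procfs (assuming the standard vm.* sysctl path for this file):

	# enable HVO for HugeTLB pages allocated from now on
	sysctl -w vm.hugetlb_optimize_vmemmap=1

	# equivalent via procfs
	echo 1 > /proc/sys/vm/hugetlb_optimize_vmemmap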

Documentation/mm/highmem.rst

Lines changed: 27 additions & 4 deletions
@@ -60,17 +60,40 @@ list shows them in order of preference of use.
   This function should be preferred, where feasible, over all the others.
 
   These mappings are thread-local and CPU-local, meaning that the mapping
-  can only be accessed from within this thread and the thread is bound the
-  CPU while the mapping is active. Even if the thread is preempted (since
-  preemption is never disabled by the function) the CPU can not be
-  unplugged from the system via CPU-hotplug until the mapping is disposed.
+  can only be accessed from within this thread and the thread is bound to the
+  CPU while the mapping is active. Although preemption is never disabled by
+  this function, the CPU can not be unplugged from the system via
+  CPU-hotplug until the mapping is disposed.
 
   It's valid to take pagefaults in a local kmap region, unless the context
   in which the local mapping is acquired does not allow it for other reasons.
 
+  As said, pagefaults and preemption are never disabled. There is no need to
+  disable preemption because, when context switches to a different task, the
+  maps of the outgoing task are saved and those of the incoming one are
+  restored.
+
   kmap_local_page() always returns a valid virtual address and it is assumed
   that kunmap_local() will never fail.
 
+  On CONFIG_HIGHMEM=n kernels and for low memory pages this returns the
+  virtual address of the direct mapping. Only real highmem pages are
+  temporarily mapped. Therefore, users may call a plain page_address()
+  for pages which are known to not come from ZONE_HIGHMEM. However, it is
+  always safe to use kmap_local_page() / kunmap_local().
+
+  While it is significantly faster than kmap(), for the highmem case it
+  comes with restrictions about the pointers validity. Contrary to kmap()
+  mappings, the local mappings are only valid in the context of the caller
+  and cannot be handed to other contexts. This implies that users must
+  be absolutely sure to keep the use of the return address local to the
+  thread which mapped it.
+
+  Most code can be designed to use thread local mappings. Users should
+  therefore try to design their code to avoid the use of kmap() by mapping
+  pages in the same thread the address will be used and prefer
+  kmap_local_page().
+
 Nesting kmap_local_page() and kmap_atomic() mappings is allowed to a certain
 extent (up to KMAP_TYPE_NR) but their invocations have to be strictly ordered
 because the map implementation is stack based. See kmap_local_page() kdocs
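To make the recommended pattern concrete, here is a minimal kernel-side sketch of mapping, using, and unmapping a page entirely within one thread (the function and its parameters are hypothetical, not part of the patch):

	#include <linux/highmem.h>
	#include <linux/string.h>

	/* Copy data out of a possibly-highmem page without resorting to kmap(). */
	static void copy_from_page_local(struct page *page, void *dst,
					 size_t offset, size_t len)
	{
		/* Thread-local mapping: the address must not escape this thread. */
		void *vaddr = kmap_local_page(page);

		memcpy(dst, vaddr + offset, len);

		/* Unmap before returning; nested unmaps must be LIFO-ordered. */
		kunmap_local(vaddr);
	}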

Documentation/mm/vmemmap_dedup.rst

Lines changed: 49 additions & 23 deletions
@@ -7,23 +7,25 @@ A vmemmap diet for HugeTLB and Device DAX
 HugeTLB
 =======
 
-The struct page structures (page structs) are used to describe a physical
-page frame. By default, there is a one-to-one mapping from a page frame to
-its corresponding page struct.
+This section explains how HugeTLB Vmemmap Optimization (HVO) works.
+
+The ``struct page`` structures are used to describe a physical page frame. By
+default, there is a one-to-one mapping from a page frame to its corresponding
+``struct page``.
 
 HugeTLB pages consist of multiple base page size pages and are supported by many
 architectures. See Documentation/admin-guide/mm/hugetlbpage.rst for more
 details. On the x86-64 architecture, HugeTLB pages of size 2MB and 1GB are
 currently supported. Since the base page size on x86 is 4KB, a 2MB HugeTLB page
 consists of 512 base pages and a 1GB HugeTLB page consists of 4096 base pages.
-For each base page, there is a corresponding page struct.
+For each base page, there is a corresponding ``struct page``.
 
-Within the HugeTLB subsystem, only the first 4 page structs are used to
-contain unique information about a HugeTLB page. __NR_USED_SUBPAGE provides
-this upper limit. The only 'useful' information in the remaining page structs
+Within the HugeTLB subsystem, only the first 4 ``struct page`` are used to
+contain unique information about a HugeTLB page. ``__NR_USED_SUBPAGE`` provides
+this upper limit. The only 'useful' information in the remaining ``struct page``
 is the compound_head field, and this field is the same for all tail pages.
 
-By removing redundant page structs for HugeTLB pages, memory can be returned
+By removing redundant ``struct page`` for HugeTLB pages, memory can be returned
 to the buddy allocator for other uses.
 
 Different architectures support different HugeTLB pages. For example, the
@@ -44,7 +46,7 @@ page.
 |              |   64KB    |    2MB    |   512MB   |   16GB    |           |
 +--------------+-----------+-----------+-----------+-----------+-----------+
 
-When the system boots up, every HugeTLB page has more than one struct page
+When the system boots up, every HugeTLB page has more than one ``struct page``
 structs, whose size is (unit: pages)::
 
    struct_size = HugeTLB_Size / PAGE_SIZE * sizeof(struct page) / PAGE_SIZE
@@ -74,10 +76,10 @@ Where n is how many pte entries one page can contain. So the value of
 n is (PAGE_SIZE / sizeof(pte_t)).
 
 This optimization only supports 64-bit systems, so the value of sizeof(pte_t)
-is 8. And this optimization also applicable only when the size of struct page
-is a power of two. In most cases, the size of struct page is 64 bytes (e.g.
+is 8. And this optimization is also applicable only when the size of ``struct page``
+is a power of two. In most cases, the size of ``struct page`` is 64 bytes (e.g.
 x86-64 and arm64). So if we use pmd level mapping for a HugeTLB page, the
-size of struct page structs of it is 8 page frames which size depends on the
+size of ``struct page`` structs of it is 8 page frames, whose size depends on the
 size of the base page.
 
 For the HugeTLB page of the pud level mapping, then::
@@ -86,15 +88,15 @@ For the HugeTLB page of the pud level mapping, then::
               = PAGE_SIZE / 8 * 8 (pages)
               = PAGE_SIZE (pages)
 
-Where the struct_size(pmd) is the size of the struct page structs of a
+Where the struct_size(pmd) is the size of the ``struct page`` structs of a
 HugeTLB page of the pmd level mapping.
 
 E.g.: A 2MB HugeTLB page on x86_64 consists in 8 page frames while 1GB
 HugeTLB page consists in 4096.
 
 Next, we take the pmd level mapping of the HugeTLB page as an example to
 show the internal implementation of this optimization. There are 8 pages
-struct page structs associated with a HugeTLB page which is pmd mapped.
+``struct page`` structs associated with a HugeTLB page which is pmd mapped.
 
 Here is how things look before optimization::
@@ -122,10 +124,10 @@ Here is how things look before optimization::
 +-----------+
 
 The value of page->compound_head is the same for all tail pages. The first
-page of page structs (page 0) associated with the HugeTLB page contains the 4
-page structs necessary to describe the HugeTLB. The only use of the remaining
-pages of page structs (page 1 to page 7) is to point to page->compound_head.
-Therefore, we can remap pages 1 to 7 to page 0. Only 1 page of page structs
+page of ``struct page`` (page 0) associated with the HugeTLB page contains the 4
+``struct page`` necessary to describe the HugeTLB. The only use of the remaining
+pages of ``struct page`` (page 1 to page 7) is to point to page->compound_head.
+Therefore, we can remap pages 1 to 7 to page 0. Only 1 page of ``struct page``
 will be used for each HugeTLB page. This will allow us to free the remaining
 7 pages to the buddy allocator.
@@ -167,13 +169,37 @@ entries that can be cached in a single TLB entry.
 
 The contiguous bit is used to increase the mapping size at the pmd and pte
 (last) level. So this type of HugeTLB page can be optimized only when its
-size of the struct page structs is greater than 1 page.
+size of the ``struct page`` structs is greater than **1** page.
 
 Notice: The head vmemmap page is not freed to the buddy allocator and all
 tail vmemmap pages are mapped to the head vmemmap page frame. So we can see
-more than one struct page struct with PG_head (e.g. 8 per 2 MB HugeTLB page)
-associated with each HugeTLB page. The compound_head() can handle this
-correctly (more details refer to the comment above compound_head()).
+more than one ``struct page`` struct with ``PG_head`` (e.g. 8 per 2 MB HugeTLB
+page) associated with each HugeTLB page. The ``compound_head()`` can handle
+this correctly. There is only **one** head ``struct page``; the tail
+``struct page`` with ``PG_head`` are fake head ``struct page``. We need an
+approach to distinguish between those two different types of ``struct page`` so
+that ``compound_head()`` can return the real head ``struct page`` when the
+parameter is the tail ``struct page`` but with ``PG_head``. The following code
+snippet describes how to distinguish between real and fake head ``struct page``.
+
+.. code-block:: c
+
+	if (test_bit(PG_head, &page->flags)) {
+		unsigned long head = READ_ONCE(page[1].compound_head);
+
+		if (head & 1) {
+			if (head == (unsigned long)page + 1)
+				/* head struct page */
+			else
+				/* tail struct page */
+		} else {
+			/* head struct page */
+		}
+	}
+
+We can safely access the field of **page[1]** with ``PG_head`` because the
+page is a compound page composed of at least two contiguous pages.
+The implementation refers to ``page_fixed_fake_head()``.
 
 Device DAX
 ==========
@@ -187,7 +213,7 @@ PMD_SIZE (2M on x86_64) and PUD_SIZE (1G on x86_64).
 
 The differences with HugeTLB are relatively minor.
 
-It only use 3 page structs for storing all information as opposed
+It only uses 3 ``struct page`` for storing all information as opposed
 to 4 on HugeTLB pages.
 
 There's no remapping of vmemmap given that device-dax memory is not part of
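Plugging numbers into the struct_size formula above makes the savings concrete (assuming a 4KB base page and a 64-byte ``struct page``, as in the text):

	2MB HugeTLB:  struct_size = 2MB / 4KB * 64 / 4KB = 8 pages
	              after HVO: 1 vmemmap page kept, 7 freed per HugeTLB page
	1GB HugeTLB:  struct_size = 1GB / 4KB * 64 / 4KB = 4096 pages
	              after HVO: 1 vmemmap page kept, 4095 freed per HugeTLB page

These match the 7-page and 4095-page figures quoted in the sysctl documentation above.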

arch/arm64/mm/flush.c

Lines changed: 3 additions & 10 deletions
@@ -76,17 +76,10 @@ EXPORT_SYMBOL_GPL(__sync_icache_dcache);
 void flush_dcache_page(struct page *page)
 {
 	/*
-	 * Only the head page's flags of HugeTLB can be cleared since the tail
-	 * vmemmap pages associated with each HugeTLB page are mapped with
-	 * read-only when CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is enabled (more
-	 * details can refer to vmemmap_remap_pte()). Although
-	 * __sync_icache_dcache() only set PG_dcache_clean flag on the head
-	 * page struct, there is more than one page struct with PG_dcache_clean
-	 * associated with the HugeTLB page since the head vmemmap page frame
-	 * is reused (more details can refer to the comments above
-	 * page_fixed_fake_head()).
+	 * HugeTLB pages are always fully mapped and only head page will be
+	 * set PG_dcache_clean (see comments in __sync_icache_dcache()).
 	 */
-	if (hugetlb_optimize_vmemmap_enabled() && PageHuge(page))
+	if (PageHuge(page))
 		page = compound_head(page);
 
 	if (test_bit(PG_dcache_clean, &page->flags))

arch/x86/mm/hugetlbpage.c

Lines changed: 7 additions & 1 deletion
@@ -30,9 +30,15 @@ int pmd_huge(pmd_t pmd)
 		(pmd_val(pmd) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
 }
 
+/*
+ * pud_huge() returns 1 if @pud is a hugetlb related entry, that is a normal
+ * hugetlb entry or a non-present (migration or hwpoisoned) hugetlb entry.
+ * Otherwise, returns 0.
+ */
 int pud_huge(pud_t pud)
 {
-	return !!(pud_val(pud) & _PAGE_PSE);
+	return !pud_none(pud) &&
+		(pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
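A case-by-case reading of the new predicate may help; the classifications below follow directly from the bit tests in the diff (annotated here, not taken from the commit):

	/*
	 * pud_huge() after this change:
	 *
	 *   present leaf entry (_PAGE_PRESENT|_PAGE_PSE set):
	 *       masked value != _PAGE_PRESENT                  -> 1
	 *   non-present hugetlb entry (migration/hwpoison,
	 *   _PAGE_PRESENT clear, entry non-zero):
	 *       !pud_none() && masked value != _PAGE_PRESENT   -> 1
	 *   present non-huge pud (only _PAGE_PRESENT set):
	 *       masked value == _PAGE_PRESENT                  -> 0
	 *   empty entry:
	 *       pud_none()                                     -> 0
	 */

Note the parallel with pmd_huge() directly above, which already used the same present-or-non-present test.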

fs/Kconfig

Lines changed: 5 additions & 7 deletions
@@ -247,8 +247,7 @@ config HUGETLB_PAGE
 
 #
 # Select this config option from the architecture Kconfig, if it is preferred
-# to enable the feature of minimizing overhead of struct page associated with
-# each HugeTLB page.
+# to enable the feature of HugeTLB Vmemmap Optimization (HVO).
 #
 config ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
 	bool
@@ -259,14 +258,13 @@ config HUGETLB_PAGE_OPTIMIZE_VMEMMAP
 	depends on SPARSEMEM_VMEMMAP
 
 config HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON
-	bool "Default optimizing vmemmap pages of HugeTLB to on"
+	bool "HugeTLB Vmemmap Optimization (HVO) defaults to on"
 	default n
 	depends on HUGETLB_PAGE_OPTIMIZE_VMEMMAP
 	help
-	  When using HUGETLB_PAGE_OPTIMIZE_VMEMMAP, the optimizing unused vmemmap
-	  pages associated with each HugeTLB page is default off. Say Y here
-	  to enable optimizing vmemmap pages of HugeTLB by default. It can then
-	  be disabled on the command line via hugetlb_free_vmemmap=off.
+	  The HugeTLB Vmemmap Optimization (HVO) defaults to off. Say Y here to
+	  enable HVO by default. It can be disabled via hugetlb_free_vmemmap=off
+	  (boot command line) or hugetlb_optimize_vmemmap (sysctl).
 
 config MEMFD_CREATE
 	def_bool TMPFS || HUGETLBFS
include/linux/highmem.h

Lines changed: 3 additions & 4 deletions
@@ -60,11 +60,11 @@ static inline void kmap_flush_unused(void);
 
 /**
  * kmap_local_page - Map a page for temporary usage
- * @page:	Pointer to the page to be mapped
+ * @page: Pointer to the page to be mapped
  *
  * Returns: The virtual address of the mapping
  *
- * Can be invoked from any context.
+ * Can be invoked from any context, including interrupts.
  *
  * Requires careful handling when nesting multiple mappings because the map
  * management is stack based. The unmap has to be in the reverse order of
@@ -86,8 +86,7 @@ static inline void kmap_flush_unused(void);
  * temporarily mapped.
  *
  * While it is significantly faster than kmap() for the highmem case it
- * comes with restrictions about the pointer validity. Only use when really
- * necessary.
+ * comes with restrictions about the pointer validity.
  *
  * On HIGHMEM enabled systems mapping a highmem page has the side effect of
  * disabling migration in order to keep the virtual address stable across
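The stack-based nesting rule called out in these kdocs, as a short hedged sketch (page_a/page_b are hypothetical):

	/* Nested local mappings must be unmapped in reverse (LIFO) order. */
	void *a = kmap_local_page(page_a);
	void *b = kmap_local_page(page_b);

	/* ... use a and b ... */

	kunmap_local(b);	/* last mapped, first unmapped */
	kunmap_local(a);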
