Commit b96a3e9

Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

- Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list")

- Peter Xu has a series ("mm/gup: Unify hugetlb, speed up thp") which reduces the special-case code for handling hugetlb pages in GUP. It also speeds up GUP handling of transparent hugepages.

- Peng Zhang provides some maple tree speedups ("Optimize the fast path of mas_store()").

- Sergey Senozhatsky has improved the performance of zsmalloc during compaction ("zsmalloc: small compaction improvements").

- Domenico Cerasuolo has developed additional selftest code for zswap ("selftests: cgroup: add zswap test program").

- xu xin has done some work on KSM's handling of zero pages. These changes are mainly to enable the user to better understand the effectiveness of KSM's treatment of zero pages ("ksm: support tracking KSM-placed zero-pages").

- Jeff Xu has fixed the behaviour of memfd's MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").

- David Howells has fixed an fscache optimization ("mm, netfs, fscache: Stop read optimisation when folio removed from pagecache").

- Axel Rasmussen has given userfaultfd the ability to simulate memory poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD").

- Miaohe Lin has contributed some routine maintenance work on the memory-failure code ("mm: memory-failure: remove unneeded PageHuge() check").

- Peng Zhang has contributed some maintenance work on the maple tree code ("Improve the validation for maple tree and some cleanup").

- Hugh Dickins has optimized the collapsing of shmem or file pages into THPs ("mm: free retracted page table by RCU").

- Jiaqi Yan has a patch series which permits us to use the healthy subpages within a hardware poisoned huge page for general purposes ("Improve hugetlbfs read on HWPOISON hugepages").

- Kemeng Shi has done some maintenance work on the pagetable-check code ("Remove unused parameters in page_table_check").

- More folioification work from Matthew Wilcox ("More filesystem folio conversions for 6.6"), ("Followup folio conversions for zswap"). And from ZhangPeng ("Convert several functions in page_io.c to use a folio").

- page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").

- Baoquan He has converted some architectures to use the GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take GENERIC_IOREMAP way").

- Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support batched/deferred tlb shootdown during page reclamation/migration").

- Better maple tree lockdep checking from Liam Howlett ("More strict maple tree lockdep"). Liam also developed some efficiency improvements ("Reduce preallocations for maple tree").

- Cleanup and optimization to the secondary IOMMU TLB invalidation, from Alistair Popple ("Invalidate secondary IOMMU TLB on permission upgrade").

- Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes for arm64").

- Kemeng Shi provides some maintenance work on the compaction code ("Two minor cleanups for compaction").

- Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most file-backed faults under the VMA lock").

- Aneesh Kumar contributes code to use the vmemmap optimization for DAX on ppc64, under some circumstances ("Add support for DAX vmemmap optimization for ppc64").

- page-ext cleanups from Kemeng Shi ("add page_ext_data to get client data in page_ext"), ("minor cleanups to page_ext header").

- Some zswap cleanups from Johannes Weiner ("mm: zswap: three cleanups").

- kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").

- VMA handling cleanups from Kefeng Wang ("mm: convert to vma_is_initial_heap/stack()").

- DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes: implement DAMOS tried total bytes file"), ("Extend DAMOS filters for address ranges and DAMON monitoring targets").

- Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").

- Liam Howlett has improved the maple tree node replacement code ("maple_tree: Change replacement strategy").

- ZhangPeng has a general code cleanup - use the K() macro more widely ("cleanup with helper macro K()").

- Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap on memory feature on ppc64").

- pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list in page_alloc"), ("Two minor cleanups for get pageblock migratetype").

- Vishal Moola introduces a memory descriptor for page table tracking, "struct ptdesc" ("Split ptdesc from struct page").

- memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups for vm.memfd_noexec").

- MM include file rationalization from Hugh Dickins ("arch: include asm/cacheflush.h in asm/hugetlb.h").

- THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text output").

- kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use object_cache instead of kmemleak_initialized").

- More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor and _folio_order").

- A VMA locking scalability improvement from Suren Baghdasaryan ("Per-VMA lock support for swap and userfaults").

- pagetable handling cleanups from Matthew Wilcox ("New page table range API").

- A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop using page->private on tail pages for THP_SWAP + cleanups").

- Cleanups and speedups to the hugetlb fault handling from Matthew Wilcox ("Change calling convention for ->huge_fault").

- Matthew Wilcox has also done some maintenance work on the MM subsystem documentation ("Improve mm documentation").

* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
  maple_tree: shrink struct maple_tree
  maple_tree: clean up mas_wr_append()
  secretmem: convert page_is_secretmem() to folio_is_secretmem()
  nios2: fix flush_dcache_page() for usage from irq context
  hugetlb: add documentation for vma_kernel_pagesize()
  mm: add orphaned kernel-doc to the rst files.
  mm: fix clean_record_shared_mapping_range kernel-doc
  mm: fix get_mctgt_type() kernel-doc
  mm: fix kernel-doc warning from tlb_flush_rmaps()
  mm: remove enum page_entry_size
  mm: allow ->huge_fault() to be called without the mmap_lock held
  mm: move PMD_ORDER to pgtable.h
  mm: remove checks for pte_index
  memcg: remove duplication detection for mem_cgroup_uncharge_swap
  mm/huge_memory: work on folio->swap instead of page->private when splitting folio
  mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
  mm/swap: use dedicated entry for swap in folio
  mm/swap: stop using page->private on tail pages for THP_SWAP
  selftests/mm: fix WARNING comparing pointer to 0
  selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
  ...
2 parents 651a00b + 52ae298 commit b96a3e9

471 files changed: +9558 -7074 lines changed

Documentation/ABI/testing/sysfs-kernel-mm-damon

Lines changed: 36 additions & 4 deletions
@@ -29,8 +29,10 @@ Description: Writing 'on' or 'off' to this file makes the kdamond starts or
 file updates contents of schemes stats files of the kdamond.
 Writing 'update_schemes_tried_regions' to the file updates
 contents of 'tried_regions' directory of every scheme directory
-of this kdamond. Writing 'clear_schemes_tried_regions' to the
-file removes contents of the 'tried_regions' directory.
+of this kdamond. Writing 'update_schemes_tried_bytes' to the
+file updates only '.../tried_regions/total_bytes' files of this
+kdamond. Writing 'clear_schemes_tried_regions' to the file
+removes contents of the 'tried_regions' directory.

 What: /sys/kernel/mm/damon/admin/kdamonds/<K>/pid
 Date: Mar 2022
@@ -269,8 +271,10 @@ What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/
 Date: Dec 2022
 Contact: SeongJae Park <sj@kernel.org>
 Description: Writing to and reading from this file sets and gets the type of
-the memory of the interest. 'anon' for anonymous pages, or
-'memcg' for specific memory cgroup can be written and read.
+the memory of the interest. 'anon' for anonymous pages,
+'memcg' for specific memory cgroup, 'addr' for address range
+(an open-ended interval), or 'target' for DAMON monitoring
+target can be written and read.

 What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/memcg_path
 Date: Dec 2022
@@ -279,6 +283,27 @@ Description: If 'memcg' is written to the 'type' file, writing to and
 reading from this file sets and gets the path to the memory
 cgroup of the interest.

+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/addr_start
+Date: Jul 2023
+Contact: SeongJae Park <sj@kernel.org>
+Description: If 'addr' is written to the 'type' file, writing to or reading
+from this file sets or gets the start address of the address
+range for the filter.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/addr_end
+Date: Jul 2023
+Contact: SeongJae Park <sj@kernel.org>
+Description: If 'addr' is written to the 'type' file, writing to or reading
+from this file sets or gets the end address of the address
+range for the filter.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/target_idx
+Date: Dec 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: If 'target' is written to the 'type' file, writing to or
+reading from this file sets or gets the index of the DAMON
+monitoring target of the interest.
+
 What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/matching
 Date: Dec 2022
 Contact: SeongJae Park <sj@kernel.org>
@@ -317,6 +342,13 @@ Contact: SeongJae Park <sj@kernel.org>
 Description: Reading this file returns the number of the exceed events of
 the scheme's quotas.

+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/tried_regions/total_bytes
+Date: Jul 2023
+Contact: SeongJae Park <sj@kernel.org>
+Description: Reading this file returns the total amount of memory that
+corresponding DAMON-based Operation Scheme's action has tried
+to be applied.
+
 What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/tried_regions/<R>/start
 Date: Oct 2022
 Contact: SeongJae Park <sj@kernel.org>

Documentation/ABI/testing/sysfs-memory-page-offline

Lines changed: 2 additions & 2 deletions
@@ -10,7 +10,7 @@ Description:
 dropping it if possible. The kernel will then be placed
 on the bad page list and never be reused.

-The offlining is done in kernel specific granuality.
+The offlining is done in kernel specific granularity.
 Normally it's the base page size of the kernel, but
 this might change.

@@ -35,7 +35,7 @@ Description:
 to access this page assuming it's poisoned by the
 hardware.

-The offlining is done in kernel specific granuality.
+The offlining is done in kernel specific granularity.
 Normally it's the base page size of the kernel, but
 this might change.

Documentation/admin-guide/cgroup-v1/memory.rst

Lines changed: 0 additions & 2 deletions
@@ -92,8 +92,6 @@ Brief summary of control files.
 memory.oom_control set/show oom controls.
 memory.numa_stat show the number of memory usage per numa
 node
-memory.kmem.limit_in_bytes This knob is deprecated and writing to
-it will return -ENOTSUPP.
 memory.kmem.usage_in_bytes show current kernel memory allocation
 memory.kmem.failcnt show the number of kernel memory usage
 hits limits

Documentation/admin-guide/kdump/vmcoreinfo.rst

Lines changed: 4 additions & 10 deletions
@@ -141,8 +141,8 @@ nodemask_t
 The size of a nodemask_t type. Used to compute the number of online
 nodes.

-(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|compound_order|compound_head)
--------------------------------------------------------------------------------------------------
+(page, flags|_refcount|mapping|lru|_mapcount|private|compound_order|compound_head)
+----------------------------------------------------------------------------------

 User-space tools compute their values based on the offset of these
 variables. The variables are used when excluding unnecessary pages.
@@ -325,8 +325,8 @@ NR_FREE_PAGES
 On linux-2.6.21 or later, the number of free pages is in
 vm_stat[NR_FREE_PAGES]. Used to get the number of free pages.

-PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask
-------------------------------------------------------------------------------
+PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask|PG_hugetlb
+-----------------------------------------------------------------------------------------

 Page attributes. These flags are used to filter various unnecessary for
 dumping pages.
@@ -338,12 +338,6 @@ More page attributes. These flags are used to filter various unnecessary for
 dumping pages.


-HUGETLB_PAGE_DTOR
------------------
-
-The HUGETLB_PAGE_DTOR flag denotes hugetlbfs pages. Makedumpfile
-excludes these pages.
-
 x86_64
 ======

Documentation/admin-guide/mm/damon/usage.rst

Lines changed: 50 additions & 26 deletions
@@ -87,7 +87,7 @@ comma (","). ::
 │ │ │ │ │ │ │ filters/nr_filters
 │ │ │ │ │ │ │ │ 0/type,matching,memcg_id
 │ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
-│ │ │ │ │ │ │ tried_regions/
+│ │ │ │ │ │ │ tried_regions/total_bytes
 │ │ │ │ │ │ │ │ 0/start,end,nr_accesses,age
 │ │ │ │ │ │ │ │ ...
 │ │ │ │ │ │ ...
@@ -127,14 +127,18 @@ in the state. Writing ``commit`` to the ``state`` file makes kdamond reads the
 user inputs in the sysfs files except ``state`` file again. Writing
 ``update_schemes_stats`` to ``state`` file updates the contents of stats files
 for each DAMON-based operation scheme of the kdamond. For details of the
-stats, please refer to :ref:`stats section <sysfs_schemes_stats>`. Writing
-``update_schemes_tried_regions`` to ``state`` file updates the DAMON-based
-operation scheme action tried regions directory for each DAMON-based operation
-scheme of the kdamond. Writing ``clear_schemes_tried_regions`` to ``state``
-file clears the DAMON-based operating scheme action tried regions directory for
-each DAMON-based operation scheme of the kdamond. For details of the
-DAMON-based operation scheme action tried regions directory, please refer to
-:ref:`tried_regions section <sysfs_schemes_tried_regions>`.
+stats, please refer to :ref:`stats section <sysfs_schemes_stats>`.
+
+Writing ``update_schemes_tried_regions`` to ``state`` file updates the
+DAMON-based operation scheme action tried regions directory for each
+DAMON-based operation scheme of the kdamond. Writing
+``update_schemes_tried_bytes`` to ``state`` file updates only
+``.../tried_regions/total_bytes`` files. Writing
+``clear_schemes_tried_regions`` to ``state`` file clears the DAMON-based
+operating scheme action tried regions directory for each DAMON-based operation
+scheme of the kdamond. For details of the DAMON-based operation scheme action
+tried regions directory, please refer to :ref:`tried_regions section
+<sysfs_schemes_tried_regions>`.

 If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread.

@@ -359,15 +363,21 @@ number (``N``) to the file creates the number of child directories named ``0``
 to ``N-1``. Each directory represents each filter. The filters are evaluated
 in the numeric order.

-Each filter directory contains three files, namely ``type``, ``matcing``, and
-``memcg_path``. You can write one of two special keywords, ``anon`` for
-anonymous pages, or ``memcg`` for specific memory cgroup filtering. In case of
-the memory cgroup filtering, you can specify the memory cgroup of the interest
-by writing the path of the memory cgroup from the cgroups mount point to
-``memcg_path`` file. You can write ``Y`` or ``N`` to ``matching`` file to
-filter out pages that does or does not match to the type, respectively. Then,
-the scheme's action will not be applied to the pages that specified to be
-filtered out.
+Each filter directory contains six files, namely ``type``, ``matcing``,
+``memcg_path``, ``addr_start``, ``addr_end``, and ``target_idx``. To ``type``
+file, you can write one of four special keywords: ``anon`` for anonymous pages,
+``memcg`` for specific memory cgroup, ``addr`` for specific address range (an
+open-ended interval), or ``target`` for specific DAMON monitoring target
+filtering. In case of the memory cgroup filtering, you can specify the memory
+cgroup of the interest by writing the path of the memory cgroup from the
+cgroups mount point to ``memcg_path`` file. In case of the address range
+filtering, you can specify the start and end address of the range to
+``addr_start`` and ``addr_end`` files, respectively. For the DAMON monitoring
+target filtering, you can specify the index of the target between the list of
+the DAMON context's monitoring targets list to ``target_idx`` file. You can
+write ``Y`` or ``N`` to ``matching`` file to filter out pages that does or does
+not match to the type, respectively. Then, the scheme's action will not be
+applied to the pages that specified to be filtered out.

 For example, below restricts a DAMOS action to be applied to only non-anonymous
 pages of all memory cgroups except ``/having_care_already``.::
@@ -381,8 +391,14 @@ pages of all memory cgroups except ``/having_care_already``.::
 echo /having_care_already > 1/memcg_path
 echo N > 1/matching

-Note that filters are currently supported only when ``paddr``
-`implementation <sysfs_contexts>` is being used.
+Note that ``anon`` and ``memcg`` filters are currently supported only when
+``paddr`` `implementation <sysfs_contexts>` is being used.
+
+Also, memory regions that are filtered out by ``addr`` or ``target`` filters
+are not counted as the scheme has tried to those, while regions that filtered
+out by other type filters are counted as the scheme has tried to. The
+difference is applied to :ref:`stats <damos_stats>` and
+:ref:`tried regions <sysfs_schemes_tried_regions>`.

 .. _sysfs_schemes_stats:

@@ -406,13 +422,21 @@ stats by writing a special keyword, ``update_schemes_stats`` to the relevant
 schemes/<N>/tried_regions/
 --------------------------

+This directory initially has one file, ``total_bytes``.
+
 When a special keyword, ``update_schemes_tried_regions``, is written to the
-relevant ``kdamonds/<N>/state`` file, DAMON creates directories named integer
-starting from ``0`` under this directory. Each directory contains files
-exposing detailed information about each of the memory region that the
-corresponding scheme's ``action`` has tried to be applied under this directory,
-during next :ref:`aggregation interval <sysfs_monitoring_attrs>`. The
-information includes address range, ``nr_accesses``, and ``age`` of the region.
+relevant ``kdamonds/<N>/state`` file, DAMON updates the ``total_bytes`` file so
+that reading it returns the total size of the scheme tried regions, and creates
+directories named integer starting from ``0`` under this directory. Each
+directory contains files exposing detailed information about each of the memory
+region that the corresponding scheme's ``action`` has tried to be applied under
+this directory, during next :ref:`aggregation interval
+<sysfs_monitoring_attrs>`. The information includes address range,
+``nr_accesses``, and ``age`` of the region.
+
+Writing ``update_schemes_tried_bytes`` to the relevant ``kdamonds/<N>/state``
+file will only update the ``total_bytes`` file, and will not create the
+subdirectories.

 The directories will be removed when another special keyword,
 ``clear_schemes_tried_regions``, is written to the relevant
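
Reviewer note: the new ``addr`` filter files and the ``total_bytes`` file documented in this hunk could be driven from a script roughly as follows. This is a minimal sketch, not part of the commit. The helper names (`write_sysfs`, `configure_addr_filter`, `read_tried_bytes`) are hypothetical, the directory arguments are assumed to point at an already-created filter, kdamond, and scheme directory, and writing the addresses as decimal strings is an assumption the documentation above does not confirm.

```python
import os


def write_sysfs(path, value):
    """Write one value to a sysfs-style file (hypothetical helper)."""
    with open(path, "w") as f:
        f.write(str(value))


def configure_addr_filter(filter_dir, start, end, matching=True):
    """Set up an 'addr' filter using the file names from the docs above.

    filter_dir is assumed to be an existing .../filters/<F> directory.
    The numeric format of the addresses is an assumption.
    """
    write_sysfs(os.path.join(filter_dir, "type"), "addr")
    write_sysfs(os.path.join(filter_dir, "addr_start"), start)
    write_sysfs(os.path.join(filter_dir, "addr_end"), end)
    write_sysfs(os.path.join(filter_dir, "matching"), "Y" if matching else "N")


def read_tried_bytes(kdamond_dir, scheme_dir):
    """Ask for a total_bytes-only update, then read the result."""
    write_sysfs(os.path.join(kdamond_dir, "state"), "update_schemes_tried_bytes")
    path = os.path.join(scheme_dir, "tried_regions", "total_bytes")
    with open(path) as f:
        return int(f.read().strip())
```

Using ``update_schemes_tried_bytes`` rather than ``update_schemes_tried_regions`` matches the documented intent: only ``total_bytes`` is refreshed and no per-region subdirectories are created.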

Documentation/admin-guide/mm/ksm.rst

Lines changed: 20 additions & 7 deletions
@@ -159,6 +159,8 @@ The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``:

 general_profit
 how effective is KSM. The calculation is explained below.
+pages_scanned
+how many pages are being scanned for ksm
 pages_shared
 how many shared pages are being used
 pages_sharing
@@ -173,6 +175,13 @@ stable_node_chains
 the number of KSM pages that hit the ``max_page_sharing`` limit
 stable_node_dups
 number of duplicated KSM pages
+ksm_zero_pages
+how many zero pages that are still mapped into processes were mapped by
+KSM when deduplicating.
+
+When ``use_zero_pages`` is/was enabled, the sum of ``pages_sharing`` +
+``ksm_zero_pages`` represents the actual number of pages saved by KSM.
+if ``use_zero_pages`` has never been enabled, ``ksm_zero_pages`` is 0.

 A high ratio of ``pages_sharing`` to ``pages_shared`` indicates good
 sharing, but a high ratio of ``pages_unshared`` to ``pages_sharing``
@@ -196,21 +205,25 @@ several times, which are unprofitable memory consumed.
 1) How to determine whether KSM save memory or consume memory in system-wide
 range? Here is a simple approximate calculation for reference::

-general_profit =~ pages_sharing * sizeof(page) - (all_rmap_items) *
+general_profit =~ ksm_saved_pages * sizeof(page) - (all_rmap_items) *
 sizeof(rmap_item);

-where all_rmap_items can be easily obtained by summing ``pages_sharing``,
-``pages_shared``, ``pages_unshared`` and ``pages_volatile``.
+where ksm_saved_pages equals to the sum of ``pages_sharing`` +
+``ksm_zero_pages`` of the system, and all_rmap_items can be easily
+obtained by summing ``pages_sharing``, ``pages_shared``, ``pages_unshared``
+and ``pages_volatile``.

 2) The KSM profit inner a single process can be similarly obtained by the
 following approximate calculation::

-process_profit =~ ksm_merging_pages * sizeof(page) -
+process_profit =~ ksm_saved_pages * sizeof(page) -
 ksm_rmap_items * sizeof(rmap_item).

-where ksm_merging_pages is shown under the directory ``/proc/<pid>/``,
-and ksm_rmap_items is shown in ``/proc/<pid>/ksm_stat``. The process profit
-is also shown in ``/proc/<pid>/ksm_stat`` as ksm_process_profit.
+where ksm_saved_pages equals to the sum of ``ksm_merging_pages`` and
+``ksm_zero_pages``, both of which are shown under the directory
+``/proc/<pid>/ksm_stat``, and ksm_rmap_items is also shown in
+``/proc/<pid>/ksm_stat``. The process profit is also shown in
+``/proc/<pid>/ksm_stat`` as ksm_process_profit.

 From the perspective of application, a high ratio of ``ksm_rmap_items`` to
 ``ksm_merging_pages`` means a bad madvise-applied policy, so developers or
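
Reviewer note: the two revised profit formulas in this hunk can be checked with a small calculation. The sketch below transcribes them directly. The function names are hypothetical, and the default `page_size` of 4096 and `rmap_item_size` of 64 bytes are assumptions for illustration only, since both values depend on the kernel build and architecture.

```python
def ksm_general_profit(pages_sharing, ksm_zero_pages, pages_shared,
                       pages_unshared, pages_volatile,
                       page_size=4096, rmap_item_size=64):
    """System-wide approximation from the updated ksm.rst text.

    page_size and rmap_item_size defaults are assumed values.
    """
    ksm_saved_pages = pages_sharing + ksm_zero_pages
    all_rmap_items = (pages_sharing + pages_shared
                      + pages_unshared + pages_volatile)
    return ksm_saved_pages * page_size - all_rmap_items * rmap_item_size


def ksm_process_profit(ksm_merging_pages, ksm_zero_pages, ksm_rmap_items,
                       page_size=4096, rmap_item_size=64):
    """Per-process approximation, using values from /proc/<pid>/ksm_stat."""
    ksm_saved_pages = ksm_merging_pages + ksm_zero_pages
    return ksm_saved_pages * page_size - ksm_rmap_items * rmap_item_size
```

With ``ksm_zero_pages`` set to 0, both functions reduce to the formulas on the removed lines, which is consistent with the note that the counter stays 0 when ``use_zero_pages`` has never been enabled.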

Documentation/admin-guide/mm/memory-hotplug.rst

Lines changed: 13 additions & 1 deletion
@@ -433,6 +433,18 @@ The following module parameters are currently defined:
 memory in a way that huge pages in bigger
 granularity cannot be formed on hotplugged
 memory.
+
+With value "force" it could result in memory
+wastage due to memmap size limitations. For
+example, if the memmap for a memory block
+requires 1 MiB, but the pageblock size is 2
+MiB, 1 MiB of hotplugged memory will be wasted.
+Note that there are still cases where the
+feature cannot be enforced: for example, if the
+memmap is smaller than a single page, or if the
+architecture does not support the forced mode
+in all configurations.
+
 ``online_policy`` read-write: Set the basic policy used for
 automatic zone selection when onlining memory
 blocks without specifying a target zone.
@@ -669,7 +681,7 @@ when still encountering permanently unmovable pages within ZONE_MOVABLE
 (-> BUG), memory offlining will keep retrying until it eventually succeeds.

 When offlining is triggered from user space, the offlining context can be
-terminated by sending a fatal signal. A timeout based offlining can easily be
+terminated by sending a signal. A timeout based offlining can easily be
 implemented via::

 % timeout $TIMEOUT offline_block | failure_handling
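
Reviewer note: the "force" wastage example added in the first hunk of this file (1 MiB memmap, 2 MiB pageblock, 1 MiB wasted) is consistent with the memmap being rounded up to a whole number of pageblocks. The sketch below works through that arithmetic. The rounding rule is an assumption inferred from the example, not something the documentation states, and `memmap_waste` is a hypothetical name.

```python
MIB = 1024 * 1024


def memmap_waste(memmap_size, pageblock_size):
    """Bytes lost if the memmap is padded up to a pageblock multiple.

    Assumes simple round-up-to-pageblock behaviour, inferred from the
    documented example only.
    """
    padded = -(-memmap_size // pageblock_size) * pageblock_size  # ceiling
    return padded - memmap_size
```

Under this assumption a memmap that is already a multiple of the pageblock size wastes nothing, which would explain why the wastage is described as arising only in "force" mode.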

Documentation/admin-guide/mm/userfaultfd.rst

Lines changed: 15 additions & 0 deletions
@@ -244,6 +244,21 @@ write-protected (so future writes will also result in a WP fault). These ioctls
 support a mode flag (``UFFDIO_COPY_MODE_WP`` or ``UFFDIO_CONTINUE_MODE_WP``
 respectively) to configure the mapping this way.

+Memory Poisioning Emulation
+---------------------------
+
+In response to a fault (either missing or minor), an action userspace can
+take to "resolve" it is to issue a ``UFFDIO_POISON``. This will cause any
+future faulters to either get a SIGBUS, or in KVM's case the guest will
+receive an MCE as if there were hardware memory poisoning.
+
+This is used to emulate hardware memory poisoning. Imagine a VM running on a
+machine which experiences a real hardware memory error. Later, we live migrate
+the VM to another physical machine. Since we want the migration to be
+transparent to the guest, we want that same address range to act as if it was
+still poisoned, even though it's on a new physical host which ostensibly
+doesn't have a memory error in the exact same spot.
+
 QEMU/KVM
 ========

Documentation/admin-guide/mm/zswap.rst

Lines changed: 7 additions & 7 deletions
@@ -49,7 +49,7 @@ compressed pool.
 Design
 ======

-Zswap receives pages for compression through the Frontswap API and is able to
+Zswap receives pages for compression from the swap subsystem and is able to
 evict pages from its own compressed pool on an LRU basis and write them back to
 the backing swap device in the case that the compressed pool is full.

@@ -70,19 +70,19 @@ means the compression ratio will always be 2:1 or worse (because of half-full
 zbud pages). The zsmalloc type zpool has a more complex compressed page
 storage method, and it can achieve greater storage densities.

-When a swap page is passed from frontswap to zswap, zswap maintains a mapping
+When a swap page is passed from swapout to zswap, zswap maintains a mapping
 of the swap entry, a combination of the swap type and swap offset, to the zpool
 handle that references that compressed swap page. This mapping is achieved
 with a red-black tree per swap type. The swap offset is the search key for the
 tree nodes.

-During a page fault on a PTE that is a swap entry, frontswap calls the zswap
-load function to decompress the page into the page allocated by the page fault
-handler.
+During a page fault on a PTE that is a swap entry, the swapin code calls the
+zswap load function to decompress the page into the page allocated by the page
+fault handler.

 Once there are no PTEs referencing a swap page stored in zswap (i.e. the count
-in the swap_map goes to 0) the swap code calls the zswap invalidate function,
-via frontswap, to free the compressed entry.
+in the swap_map goes to 0) the swap code calls the zswap invalidate function
+to free the compressed entry.

 Zswap seeks to be simple in its policies. Sysfs attributes allow for one user
 controlled policy:

Documentation/block/biovecs.rst

Lines changed: 1 addition & 0 deletions
@@ -134,6 +134,7 @@ Usage of helpers:
 bio_for_each_bvec_all()
 bio_first_bvec_all()
 bio_first_page_all()
+bio_first_folio_all()
 bio_last_bvec_all()

 * The following helpers iterate over single-page segment. The passed 'struct
