This repository was archived by the owner on Nov 8, 2023. It is now read-only.

Commit 0b32d43

Merge tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more mm updates from Andrew Morton:
 "Jeff Xu's implementation of the mseal() syscall"

* tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  selftest mm/mseal read-only elf memory segment
  mseal: add documentation
  selftest mm/mseal memory sealing
  mseal: add mseal syscall
  mseal: wire up mseal syscall
2 parents f1f9984 + a52b4f1 commit 0b32d43


33 files changed, +2732 -3 lines changed


Documentation/userspace-api/index.rst

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ System calls
 futex2
 ebpf/index
 ioctl/index
+mseal
 
 Security-related interfaces
 ===========================

Documentation/userspace-api/mseal.rst

Lines changed: 199 additions & 0 deletions
@@ -0,0 +1,199 @@
.. SPDX-License-Identifier: GPL-2.0

=====================
Introduction of mseal
=====================

:Author: Jeff Xu <jeffxu@chromium.org>

Modern CPUs support memory permissions such as RW and NX bits. The memory
permission feature improves the security posture against memory corruption
bugs, i.e. the attacker can’t just write to arbitrary memory and point the
code to it; the memory has to be marked with the X bit, or else an exception
will happen.

Memory sealing additionally protects the mapping itself against
modifications. This is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For example,
such an attacker primitive can break control-flow integrity guarantees
since read-only memory that is supposed to be trusted can become writable
or .text pages can get remapped. Memory sealing can automatically be
applied by the runtime loader to seal .text and .rodata pages, and
applications can additionally seal security-critical data at runtime.

A similar feature already exists in the XNU kernel with the
VM_FLAGS_PERMANENT flag [1] and on OpenBSD with the mimmutable syscall [2].

User API
========
mseal()
-----------
The mseal() syscall has the following signature:

``int mseal(void *addr, size_t len, unsigned long flags)``

**addr/len**: virtual memory address range.

The address range set by ``addr``/``len`` must meet:

- The start address must be in an allocated VMA.
- The start address must be page aligned.
- The end address (``addr`` + ``len``) must be in an allocated VMA.
- No gap (unallocated memory) between start and end address.

The ``len`` will be page aligned implicitly by the kernel.
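
The argument rules above can be modeled in a few lines of C. The helper below, ``mseal_range_check()``, is hypothetical and is not the kernel's implementation. It follows the documented rules (flags must be zero, ``addr`` must be page aligned, ``addr`` + ``len`` must not overflow) and rounds ``len`` up to a page boundary the way the kernel does implicitly. The ``-ENOMEM`` checks need the process's VMA layout and are left out, and corner cases of the real syscall may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the documented mseal() argument rules.
 * Returns 0 and stores the page-aligned end address in *end_out,
 * or -EINVAL for the argument errors checked below. */
static int mseal_range_check(uintptr_t addr, size_t len, unsigned long flags,
                             size_t page_size, uintptr_t *end_out)
{
    /* page_size must be a power of two for the mask arithmetic below. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    if (flags != 0)                        /* flags are reserved */
        return -EINVAL;
    if (addr & (page_size - 1))            /* start must be page aligned */
        return -EINVAL;
    if (len > SIZE_MAX - (page_size - 1))  /* rounding len up would wrap */
        return -EINVAL;

    /* len is rounded up to a whole number of pages. */
    size_t aligned_len = (len + page_size - 1) & ~(page_size - 1);
    if (addr + aligned_len < addr)         /* addr + len overflows */
        return -EINVAL;

    *end_out = addr + aligned_len;
    return 0;
}
```

For example, with 4 KiB pages, ``mseal_range_check(0x10000, 1, 0, 4096, &end)`` succeeds with ``end == 0x11000``: a one-byte request covers the whole page.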

**flags**: reserved for future use; must be 0.

**return values**:

- ``0``: Success.

- ``-EINVAL``:

  - Invalid input ``flags``.
  - The start address (``addr``) is not page aligned.
  - Address range (``addr`` + ``len``) overflow.

- ``-ENOMEM``:

  - The start address (``addr``) is not allocated.
  - The end address (``addr`` + ``len``) is not allocated.
  - A gap (unallocated memory) between start and end address.

- ``-EPERM``:

  - Sealing is supported only on 64-bit CPUs; 32-bit is not supported.

- For the above error cases, users can expect the given memory range to be
  unmodified, i.e. no partial update.

- There might be other internal errors/cases not listed here, e.g.
  an error during merging/splitting VMAs, or the process reaching the maximum
  number of supported VMAs. In those cases, partial updates to the given
  memory range could happen. However, those cases should be rare.
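
If the C library in use has no ``mseal()`` wrapper, the syscall can be issued through syscall(2). The sketch below is a hypothetical wrapper, ``sys_mseal()``, that returns ``0`` or a negative errno so its results line up with the list above. When the headers do not define ``__NR_mseal``, it assumes 462, the number used in the arm, m68k, and microblaze tables in this commit. Other ABIs differ; alpha, for example, uses 572. A kernel without mseal() reports ``-ENOSYS``.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 462 is the generic syscall number; other ABIs (e.g. alpha) differ. */
#ifndef __NR_mseal
#define __NR_mseal 462
#endif

/* Hypothetical thin wrapper: returns 0 on success or a negative errno,
 * matching the return values documented above. */
static int sys_mseal(void *addr, size_t len, unsigned long flags)
{
    return syscall(__NR_mseal, addr, len, flags) == 0 ? 0 : -errno;
}

/* Maps one read-only anonymous page and tries to seal it.
 * Returns the mapping (or NULL) and stores the mseal() result in *err. */
static void *map_and_seal_page(int *err)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    void *p = mmap(NULL, (size_t)page, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        *err = -errno;
        return NULL;
    }
    *err = sys_mseal(p, (size_t)page, 0);  /* -ENOSYS on kernels without mseal() */
    return p;
}
```

Once ``map_and_seal_page()`` has sealed a page, calling ``sys_mseal()`` on the same page again should return ``0``, since re-sealing is not an error.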

**Blocked operations after sealing**:

- Unmapping, moving to another location, and shrinking the size via
  munmap() and mremap(). These can leave an empty space, which could then
  be filled by a VMA with a new set of attributes.

- Moving or expanding a different VMA into the current location via
  mremap().

- Modifying a VMA via mmap(MAP_FIXED).

- Size expansion via mremap(). This does not appear to pose any specific
  risk to sealed VMAs; it is blocked anyway because the use case is
  unclear. In any case, users can rely on merging to expand a sealed VMA.

- mprotect() and pkey_mprotect().

- Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous
  memory, when users don't have write permission to the memory. Those
  behaviors can alter region contents by discarding pages, effectively a
  memset(0) for anonymous memory.

The kernel returns -EPERM for blocked operations.
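
The blocked operations can be probed from userspace. The sketch below uses a hypothetical ``probe_blocked_ops()`` helper. It seals a two-page read-only anonymous mapping, then attempts munmap(), mprotect(), a shrinking mremap(), mmap(MAP_FIXED), and madvise(MADV_DONTNEED), expecting each one to fail with ``EPERM``. It assumes syscall number 462 for mseal() when ``__NR_mseal`` is not defined. It also deliberately leaks the sealed mapping, which cannot be unmapped anyway.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462  /* assumed generic number; alpha and others differ */
#endif

/* Hypothetical probe. Returns 1 if mseal() is unavailable or refused,
 * 0 if every probed operation failed with EPERM, -1 otherwise. */
static int probe_blocked_ops(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = 2 * (size_t)page;

    /* Private, anonymous and read-only, so MADV_DONTNEED counts as destructive. */
    char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (syscall(__NR_mseal, p, len, 0UL) != 0)
        return 1;  /* e.g. ENOSYS on older kernels, EPERM on 32-bit */

    int ok = 1;
    /* Unmapping part of the sealed range. */
    ok &= munmap(p + page, (size_t)page) == -1 && errno == EPERM;
    /* Changing its protection. */
    ok &= mprotect(p, (size_t)page, PROT_READ | PROT_WRITE) == -1 && errno == EPERM;
    /* Shrinking it in place. */
    ok &= mremap(p, len, (size_t)page, 0) == MAP_FAILED && errno == EPERM;
    /* Replacing it with a fresh mapping. */
    ok &= mmap(p, (size_t)page, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED
          && errno == EPERM;
    /* Discarding its contents. */
    ok &= madvise(p, (size_t)page, MADV_DONTNEED) == -1 && errno == EPERM;

    /* The sealed mapping is intentionally leaked: it cannot be unmapped. */
    return ok ? 0 : -1;
}
```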

For blocked operations, one can expect the given address range to be
unmodified, i.e. no partial update. Note that this differs from existing mm
system call behavior, where partial updates are made until an error is
found and returned to userspace. For example, assume the following code
sequence:

- ptr = mmap(NULL, 8192, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
- munmap(ptr + 4096, 4096);
- ret1 = mprotect(ptr, 8192, PROT_READ);
- mseal(ptr, 4096, 0);
- ret2 = mprotect(ptr, 8192, PROT_NONE);

ret1 will be -ENOMEM, and the page at ptr is updated to PROT_READ.

ret2 will be -EPERM, and the page remains PROT_READ.
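
The code sequence above can be turned into a runnable check. The hypothetical ``run_partial_update_demo()`` below uses the system page size instead of a hard-coded 4096 and records the errno of each mprotect() call. It also reads the first page after the failed mprotect(). That read only succeeds because the page was updated to ``PROT_READ`` before the gap was hit. As in the other sketches, 462 is an assumed fallback for ``__NR_mseal``.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462  /* assumed generic number; alpha and others differ */
#endif

struct partial_update_result {
    int mprotect1_errno;  /* errno from the first mprotect(); expected ENOMEM */
    int sealed;           /* 1 if mseal() succeeded */
    int mprotect2_errno;  /* errno from the second mprotect(); EPERM when sealed */
    char first_byte;      /* readable only because page 1 became PROT_READ */
};

static struct partial_update_result run_partial_update_demo(void)
{
    struct partial_update_result r = {0};
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    char *ptr = mmap(NULL, 2 * (size_t)page, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(ptr != MAP_FAILED);
    munmap(ptr + page, (size_t)page);  /* leave a gap after the first page */

    /* Existing behavior: page 1 is updated, then the gap yields ENOMEM. */
    if (mprotect(ptr, 2 * (size_t)page, PROT_READ) != 0)
        r.mprotect1_errno = errno;
    r.first_byte = ptr[0];

    r.sealed = syscall(__NR_mseal, ptr, (size_t)page, 0UL) == 0;
    /* Sealed: rejected up front with EPERM, page 1 stays PROT_READ. */
    if (mprotect(ptr, 2 * (size_t)page, PROT_NONE) != 0)
        r.mprotect2_errno = errno;
    return r;
}
```

On a kernel with mseal(), ``mprotect1_errno`` is ``ENOMEM`` and ``mprotect2_errno`` is ``EPERM``. On a kernel without it, ``sealed`` is 0 and the second mprotect() fails with ``ENOMEM`` after a partial update instead.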

**Note**:

- mseal() only works on 64-bit CPUs, not on 32-bit CPUs.

- Users can call mseal() multiple times; mseal() on already-sealed memory
  is a no-op, not an error.

- munseal() is not supported.

Use cases:
==========

- glibc:
  The dynamic linker, while loading ELF executables, can apply sealing to
  non-writable memory segments.

- Chrome browser: protect some security-sensitive data structures.
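
A loader-style use of sealing can be sketched with dl_iterate_phdr(3). The sketch walks the loaded objects, finds ``PT_LOAD`` segments without ``PF_W``, rounds them out to page boundaries, and seals them. This is a hypothetical illustration, not glibc's implementation. It ignores ``PT_GNU_RELRO``, makes no attempt to single out the vDSO or other special objects, and only makes sense if nothing later needs to unmap those segments (for example, via dlclose()). A ``dry_run`` flag lets the ranges be counted without sealing anything. The ``__NR_mseal`` fallback of 462 is again an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462  /* assumed generic number; alpha and others differ */
#endif

struct seal_ctx {
    uintptr_t page_mask;  /* page size - 1 */
    int dry_run;          /* 1: only count candidate segments, do not seal */
    int candidates;       /* non-writable PT_LOAD segments found */
    int sealed;           /* segments successfully sealed */
};

static int seal_ro_segments_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct seal_ctx *ctx = data;
    (void)size;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD || (ph->p_flags & PF_W) || ph->p_memsz == 0)
            continue;
        /* Segment bounds in memory, widened to whole pages. */
        uintptr_t lo = info->dlpi_addr + ph->p_vaddr;
        uintptr_t start = lo & ~ctx->page_mask;
        uintptr_t end = (lo + ph->p_memsz + ctx->page_mask) & ~ctx->page_mask;
        ctx->candidates++;
        if (!ctx->dry_run &&
            syscall(__NR_mseal, (void *)start, (size_t)(end - start), 0UL) == 0)
            ctx->sealed++;
    }
    return 0;  /* keep iterating over loaded objects */
}

static struct seal_ctx seal_readonly_segments(int dry_run)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    struct seal_ctx ctx = { (uintptr_t)page - 1, dry_run, 0, 0 };
    dl_iterate_phdr(seal_ro_segments_cb, &ctx);
    return ctx;
}
```

``seal_readonly_segments(1)`` only counts candidate segments and leaves every mapping unchanged. Passing 0 seals them for the rest of the process's lifetime.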

Notes on which memory to seal:
==============================

It might be important to note that sealing changes the lifetime of a mapping,
i.e. the sealed mapping won’t be unmapped until the process terminates or the
exec system call is invoked. Applications can apply sealing to any virtual
memory region from userspace, but it is crucial to thoroughly analyze the
mapping's lifetime before applying the sealing.

For example:

- aio/shm

aio/shm can call mmap()/munmap() on behalf of userspace, e.g. ksys_shmdt() in
shm.c. The lifetime of those mappings is not tied to the lifetime of the
process. If those mappings are sealed from userspace, then munmap() will fail,
causing leaks in VMA address space during the lifetime of the process.

- Brk (heap)

Currently, userspace applications can seal parts of the heap by calling
malloc() and mseal(). Assume the following calls from userspace:

- ptr = malloc(size);
- mprotect(ptr, size, RO);
- mseal(ptr, size, 0);
- free(ptr);

Technically, before mseal() was added, the user could change the protection
of the heap by calling mprotect(RO). As long as the user changed the
protection back to RW before free(), the memory range could be reused.

Once mseal() enters the picture, however, the heap is sealed partially. The
user can still free it, but the memory remains RO. If the address is reused
by the heap manager for another malloc, the process might crash soon after.
Therefore, it is important not to apply sealing to any memory that might get
recycled.

Furthermore, even if the application never calls free() on ptr, the heap
manager may invoke the brk system call to shrink the size of the heap. In
the kernel, the brk-shrink will call munmap(). Consequently, depending on
the location of ptr, the outcome of brk-shrink is nondeterministic.
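
Given the heap pitfalls above, one pattern consistent with that advice is to keep sealed data out of the allocator entirely. The sketch below uses a hypothetical helper, ``make_sealed_copy()``. It copies the data into its own anonymous mapping, makes the mapping read-only, and only then seals it, so neither free() nor a brk-shrink can ever target that range. This is an illustration, not a pattern this document prescribes, and it assumes 462 as the fallback for ``__NR_mseal``.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462  /* assumed generic number; alpha and others differ */
#endif

/* Copies n bytes into a private anonymous mapping, makes it read-only and
 * tries to seal it. The mapping never goes back to an allocator, so sealing
 * cannot interfere with malloc()/free() or brk-based heap shrinking.
 * *sealed reports whether mseal() succeeded. Returns NULL on failure. */
static const void *make_sealed_copy(const void *src, size_t n, int *sealed)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && n > 0);
    size_t len = (n + (size_t)page - 1) & ~((size_t)page - 1);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memcpy(p, src, n);
    if (mprotect(p, len, PROT_READ) != 0) {
        munmap(p, len);
        return NULL;
    }
    /* Seal last: from here on the mapping lives until exit or execve(). */
    *sealed = syscall(__NR_mseal, p, len, 0UL) == 0;
    return p;
}
```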

Additional notes:
=================
As Jann Horn pointed out in [3], there are still a few ways to write
to RO memory, which is, in a way, by design. Those cases are not covered
by mseal(). If applications want to block such cases, sandbox tools (such as
seccomp, LSM, etc.) might be considered.

Those cases are:

- Write to read-only memory through the /proc/self/mem interface.
- Write to read-only memory through ptrace (such as PTRACE_POKETEXT).
- userfaultfd.
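
The first of these cases can be observed directly. The hypothetical helpers below seal a read-only page (when mseal() is available) and then write one byte to it through ``/proc/self/mem``. Whether that write succeeds depends on the kernel, its configuration, and whether /proc is mounted, not on the seal. The check is written to hold either way: if the write is accepted, the page shows the new byte. 462 is the assumed fallback for ``__NR_mseal``.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462  /* assumed generic number; alpha and others differ */
#endif

/* Writes one byte to addr through /proc/self/mem. Returns 1 on success,
 * 0 if the write was refused, -1 if /proc/self/mem could not be opened. */
static int poke_via_proc_self_mem(void *addr, char value)
{
    int fd = open("/proc/self/mem", O_RDWR);
    if (fd < 0)
        return -1;
    ssize_t n = pwrite(fd, &value, 1, (off_t)(uintptr_t)addr);
    close(fd);
    return n == 1;
}

/* Maps a page, stores 'A' in it, makes it read-only and tries to seal it. */
static char *sealed_ro_page(int *sealed)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    p[0] = 'A';
    if (mprotect(p, (size_t)page, PROT_READ) != 0)
        return NULL;
    *sealed = syscall(__NR_mseal, p, (size_t)page, 0UL) == 0;
    return p;
}
```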

The idea that inspired this patch comes from Stephen Röttger’s work in V8
CFI [4]. Chrome browser in ChromeOS will be the first user of this API.

Reference:
==========
[1] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b9f69dbd3c8c3fd30a/osfmk/mach/vm_statistics.h#L274

[2] https://man.openbsd.org/mimmutable.2

[3] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426FkcgnfUGLvA@mail.gmail.com

[4] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXgeaRHo/edit#heading=h.bvaojj9fu6hc

arch/alpha/kernel/syscalls/syscall.tbl

Lines changed: 1 addition & 0 deletions
@@ -501,3 +501,4 @@
 569 common lsm_get_self_attr sys_lsm_get_self_attr
 570 common lsm_set_self_attr sys_lsm_set_self_attr
 571 common lsm_list_modules sys_lsm_list_modules
+572 common mseal sys_mseal

arch/arm/tools/syscall.tbl

Lines changed: 1 addition & 0 deletions
@@ -475,3 +475,4 @@
 459 common lsm_get_self_attr sys_lsm_get_self_attr
 460 common lsm_set_self_attr sys_lsm_set_self_attr
 461 common lsm_list_modules sys_lsm_list_modules
+462 common mseal sys_mseal

arch/arm64/include/asm/unistd.h

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@
 #define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls 462
+#define __NR_compat_syscalls 463
 #endif
 
 #define __ARCH_WANT_SYS_CLONE

arch/arm64/include/asm/unistd32.h

Lines changed: 2 additions & 0 deletions
@@ -929,6 +929,8 @@ __SYSCALL(__NR_lsm_get_self_attr, sys_lsm_get_self_attr)
 __SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr)
 #define __NR_lsm_list_modules 461
 __SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)
+#define __NR_mseal 462
+__SYSCALL(__NR_mseal, sys_mseal)
 
 /*
  * Please add new compat syscalls above this comment and update

arch/m68k/kernel/syscalls/syscall.tbl

Lines changed: 1 addition & 0 deletions
@@ -461,3 +461,4 @@
 459 common lsm_get_self_attr sys_lsm_get_self_attr
 460 common lsm_set_self_attr sys_lsm_set_self_attr
 461 common lsm_list_modules sys_lsm_list_modules
+462 common mseal sys_mseal

arch/microblaze/kernel/syscalls/syscall.tbl

Lines changed: 1 addition & 0 deletions
@@ -467,3 +467,4 @@
 459 common lsm_get_self_attr sys_lsm_get_self_attr
 460 common lsm_set_self_attr sys_lsm_set_self_attr
 461 common lsm_list_modules sys_lsm_list_modules
+462 common mseal sys_mseal

arch/mips/kernel/syscalls/syscall_n32.tbl

Lines changed: 1 addition & 0 deletions
@@ -400,3 +400,4 @@
 459 n32 lsm_get_self_attr sys_lsm_get_self_attr
 460 n32 lsm_set_self_attr sys_lsm_set_self_attr
 461 n32 lsm_list_modules sys_lsm_list_modules
+462 n32 mseal sys_mseal

arch/mips/kernel/syscalls/syscall_n64.tbl

Lines changed: 1 addition & 0 deletions
@@ -376,3 +376,4 @@
 459 n64 lsm_get_self_attr sys_lsm_get_self_attr
 460 n64 lsm_set_self_attr sys_lsm_set_self_attr
 461 n64 lsm_list_modules sys_lsm_list_modules
+462 n64 mseal sys_mseal
