Commit df57721

Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 shadow stack support from Dave Hansen:
 "This is the long awaited x86 shadow stack support, part of Intel's
  Control-flow Enforcement Technology (CET).

  CET consists of two related security features: shadow stacks and
  indirect branch tracking. This series implements just the shadow stack
  part of this feature, and just for userspace.

  The main use case for shadow stack is providing protection against
  return oriented programming attacks. It works by maintaining a
  secondary (shadow) stack using a special memory type that has
  protections against modification. When executing a CALL instruction,
  the processor pushes the return address to both the normal stack and
  to the special permission shadow stack. Upon RET, the processor pops
  the shadow stack copy and compares it to the normal stack copy.

  For more information, refer to the links below for the earlier
  versions of this patch set"

Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/
Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/

* tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
  x86/shstk: Change order of __user in type
  x86/ibt: Convert IBT selftest to asm
  x86/shstk: Don't retry vm_munmap() on -EINTR
  x86/kbuild: Fix Documentation/ reference
  x86/shstk: Move arch detail comment out of core mm
  x86/shstk: Add ARCH_SHSTK_STATUS
  x86/shstk: Add ARCH_SHSTK_UNLOCK
  x86: Add PTRACE interface for shadow stack
  selftests/x86: Add shadow stack test
  x86/cpufeatures: Enable CET CR4 bit for shadow stack
  x86/shstk: Wire in shadow stack interface
  x86: Expose thread features in /proc/$PID/status
  x86/shstk: Support WRSS for userspace
  x86/shstk: Introduce map_shadow_stack syscall
  x86/shstk: Check that signal frame is shadow stack mem
  x86/shstk: Check that SSP is aligned on sigreturn
  x86/shstk: Handle signals for shadow stack
  x86/shstk: Introduce routines modifying shstk
  x86/shstk: Handle thread shadow stack
  x86/shstk: Add user-mode shadow stack support
  ...
2 parents b97d64c + 1fe428d commit df57721

118 files changed, +2790 -308 lines changed

Documentation/arch/x86/index.rst

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ x86-specific Documentation
    mtrr
    pat
    intel-hfi
+   shstk
    iommu
    intel_txt
    amd-memory-encryption

Documentation/arch/x86/shstk.rst

Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================================
+Control-flow Enforcement Technology (CET) Shadow Stack
+======================================================
+
+CET Background
+==============
+
+Control-flow Enforcement Technology (CET) covers several related x86 processor
+features that provide protection against control flow hijacking attacks. CET
+can protect both applications and the kernel.
+
+CET introduces shadow stack and indirect branch tracking (IBT). A shadow stack
+is a secondary stack allocated from memory which cannot be directly modified by
+applications. When executing a CALL instruction, the processor pushes the
+return address to both the normal stack and the shadow stack. Upon
+function return, the processor pops the shadow stack copy and compares it
+to the normal stack copy. If the two differ, the processor raises a
+control-protection fault. IBT verifies indirect CALL/JMP targets are intended
+as marked by the compiler with 'ENDBR' opcodes. Not all CPUs have both Shadow
+Stack and Indirect Branch Tracking. Today in the 64-bit kernel, only userspace
+shadow stack and kernel IBT are supported.
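
The CALL/RET check described in the CET Background text can be modelled in a few lines of C. This is a toy sketch for illustration only, not kernel or hardware code: `struct toy_cpu`, `toy_call()` and `toy_ret()` are hypothetical names, and a `false` return from `toy_ret()` merely stands in for the control-protection fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_DEPTH 16

/* Toy model only: real hardware keeps the shadow stack in protected memory. */
struct toy_cpu {
	uint64_t stack[TOY_DEPTH];  /* return-address slots on the normal stack */
	uint64_t shadow[TOY_DEPTH]; /* matching shadow stack copies */
	size_t depth;
};

/* CALL: push the return address to both stacks. */
static bool toy_call(struct toy_cpu *cpu, uint64_t ret_addr)
{
	assert(cpu != NULL);
	if (cpu->depth == TOY_DEPTH)
		return false;
	cpu->stack[cpu->depth] = ret_addr;
	cpu->shadow[cpu->depth] = ret_addr;
	cpu->depth++;
	return true;
}

/* RET: pop both copies and compare; false models a control-protection fault. */
static bool toy_ret(struct toy_cpu *cpu, uint64_t *target)
{
	assert(cpu != NULL && target != NULL);
	if (cpu->depth == 0)
		return false;
	cpu->depth--;
	if (cpu->stack[cpu->depth] != cpu->shadow[cpu->depth])
		return false;
	*target = cpu->stack[cpu->depth];
	return true;
}
```

Overwriting a slot in `stack[]` between `toy_call()` and `toy_ret()` makes the comparison fail, which is the situation a return-oriented programming attack creates on the normal stack.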
+
+Requirements to use Shadow Stack
+================================
+
+To use userspace shadow stack you need HW that supports it, a kernel
+configured with it and userspace libraries compiled with it.
+
+The kernel Kconfig option is X86_USER_SHADOW_STACK. When compiled in, shadow
+stacks can be disabled at runtime with the kernel parameter: nousershstk.
+
+To build a user shadow stack enabled kernel, Binutils v2.29 or LLVM v6 or later
+are required.
+
+At run time, /proc/cpuinfo shows CET features if the processor supports
+CET. "user_shstk" means that userspace shadow stack is supported on the current
+kernel and HW.
+
+Application Enabling
+====================
+
+An application's CET capability is marked in its ELF note and can be verified
+from readelf/llvm-readelf output::
+
+    readelf -n <application> | grep -a SHSTK
+        properties: x86 feature: SHSTK
+
+The kernel does not process these application markers directly. Applications
+or loaders must enable CET features using the interface described in the
+"Enabling arch_prctl()'s" section below.
+Typically this would be done in dynamic loader or static runtime objects, as is
+the case in GLIBC.
+
+Enabling arch_prctl()'s
+=======================
+
+ELF features should be enabled by the loader using the below arch_prctl's. They
+are only supported in 64-bit user applications. These operate on the features
+on a per-thread basis. The enablement status is inherited on clone, so if the
+feature is enabled on the first thread, it will propagate to all the threads
+in an app.
+
+arch_prctl(ARCH_SHSTK_ENABLE, unsigned long feature)
+    Enable a single feature specified in 'feature'. Can only operate on
+    one feature at a time.
+
+arch_prctl(ARCH_SHSTK_DISABLE, unsigned long feature)
+    Disable a single feature specified in 'feature'. Can only operate on
+    one feature at a time.
+
+arch_prctl(ARCH_SHSTK_LOCK, unsigned long features)
+    Lock in features at their current enabled or disabled status. 'features'
+    is a mask of all features to lock. All bits set are processed, unset bits
+    are ignored. The mask is ORed with the existing value. So any feature bits
+    set here cannot be enabled or disabled afterwards.
+
+arch_prctl(ARCH_SHSTK_UNLOCK, unsigned long features)
+    Unlock features. 'features' is a mask of all features to unlock. All
+    bits set are processed, unset bits are ignored. Only works via ptrace.
+
+arch_prctl(ARCH_SHSTK_STATUS, unsigned long addr)
+    Copy the currently enabled features to the address passed in addr. The
+    features are described using the bits passed into the others in
+    'features'.
+
+The return values are as follows. On success, return 0. On error, errno can
+be::
+
+    -EPERM if any of the passed features are locked.
+    -ENOTSUPP if the feature is not supported by the hardware or
+     kernel.
+    -EINVAL if the arguments are invalid (non-existent feature, etc.)
+    -EFAULT if could not copy information back to userspace
+
+The feature bits supported are::
+
+    ARCH_SHSTK_SHSTK - Shadow stack
+    ARCH_SHSTK_WRSS  - WRSS
+
+Currently shadow stack and WRSS are supported via this interface. WRSS
+can only be enabled with shadow stack, and is automatically disabled
+if shadow stack is disabled.
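
As a minimal sketch of the status query described in the arch_prctl() list, the following C fragment calls ARCH_SHSTK_STATUS through the raw syscall interface. The helper names `shstk_query()`, `shstk_enabled()` and `wrss_enabled()` are hypothetical. The numeric fallbacks are the values from the x86 uapi header `asm/prctl.h`, defined locally on the assumption that installed headers may predate shadow stack support. The sketch deliberately only queries: enabling shadow stack inside a function that later returns is expected to fault, because the new shadow stack holds no return address for that frame.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback definitions; values as in the x86 uapi header asm/prctl.h. */
#ifndef ARCH_SHSTK_STATUS
#define ARCH_SHSTK_STATUS 0x5005
#endif
#ifndef ARCH_SHSTK_SHSTK
#define ARCH_SHSTK_SHSTK (1ULL << 0)
#endif
#ifndef ARCH_SHSTK_WRSS
#define ARCH_SHSTK_WRSS (1ULL << 1)
#endif

/* Hypothetical helper: returns 0 and fills *features, or a negative errno. */
static int shstk_query(unsigned long *features)
{
	assert(features != NULL);
#ifdef SYS_arch_prctl
	if (syscall(SYS_arch_prctl, ARCH_SHSTK_STATUS, features) == 0)
		return 0;
	return -errno;
#else
	return -ENOSYS; /* not an x86 build */
#endif
}

/* Hypothetical helpers: test individual bits of the reported mask. */
static int shstk_enabled(unsigned long features)
{
	return (features & ARCH_SHSTK_SHSTK) != 0;
}

static int wrss_enabled(unsigned long features)
{
	return (features & ARCH_SHSTK_WRSS) != 0;
}
```

On a kernel or CPU without shadow stack support the query is expected to fail with one of the errno values listed above, so callers should treat a negative return as "feature unavailable" rather than as a fatal error.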
+
+Proc Status
+===========
+To check if an application is actually running with shadow stack, the
+user can read /proc/$PID/status. It will report "wrss" or "shstk"
+depending on what is enabled. The lines look like this::
+
+    x86_Thread_features: shstk wrss
+    x86_Thread_features_locked: shstk wrss
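
A program that wants to check these lines could parse them roughly as follows. This is a sketch under assumptions: `status_line_has_feature()` is a hypothetical helper, it expects one line of /proc/$PID/status at a time, and it assumes feature words are separated by spaces or tabs as in the sample above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if 'line' is the "x86_Thread_features:"
 * line and lists 'feature' as a whole word. The "_locked" line does not
 * match because the key includes the trailing colon.
 */
static bool status_line_has_feature(const char *line, const char *feature)
{
	static const char key[] = "x86_Thread_features:";
	size_t flen;
	const char *p;

	assert(line != NULL && feature != NULL);
	flen = strlen(feature);
	if (flen == 0 || strncmp(line, key, sizeof(key) - 1) != 0)
		return false;

	p = line + sizeof(key) - 1;
	while (*p) {
		size_t wlen;

		p += strspn(p, " \t\n");     /* skip separators */
		wlen = strcspn(p, " \t\n");  /* length of the next word */
		if (wlen == flen && strncmp(p, feature, flen) == 0)
			return true;
		p += wlen;
	}
	return false;
}
```

Matching whole words avoids treating a longer, future feature name that merely starts with "shstk" as a hit.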
+
+Implementation of the Shadow Stack
+==================================
+
+Shadow Stack Size
+-----------------
+
+A task's shadow stack is allocated from memory to a fixed size of
+MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is allocated to
+the maximum size of the normal stack, but capped to 4 GB. In the case
+of the clone3 syscall, there is a stack size passed in and shadow stack
+uses this instead of the rlimit.
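
The sizing rule can be written out as a small C function. This is only an illustration of the MIN(RLIMIT_STACK, 4 GB) formula from the text: `shstk_default_size()` and `SHSTK_CAP_BYTES` are hypothetical names, and the sketch ignores any rounding the kernel may apply.

```c
#include <assert.h>
#include <stdint.h>

#define SHSTK_CAP_BYTES (4ULL << 30) /* the 4 GB cap from the text */

static_assert(SHSTK_CAP_BYTES == 4294967296ULL, "cap must be 4 GB");

/* Hypothetical helper: MIN(stack_rlimit, 4 GB), both in bytes. */
static uint64_t shstk_default_size(uint64_t stack_rlimit)
{
	return stack_rlimit < SHSTK_CAP_BYTES ? stack_rlimit : SHSTK_CAP_BYTES;
}
```

With a typical 8 MB stack rlimit the result is 8 MB. An unlimited rlimit is represented by a very large value, so the result is the 4 GB cap.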
+
+Signal
+------
+
+The main program and its signal handlers use the same shadow stack. Because
+the shadow stack stores only return addresses, a large shadow stack covers
+the condition that both the program stack and the signal alternate stack run
+out.
+
+When a signal happens, the old pre-signal state is pushed on the stack. When
+shadow stack is enabled, the shadow stack specific state is pushed onto the
+shadow stack. Today this is only the old SSP (shadow stack pointer), pushed
+in a special format with bit 63 set. On sigreturn this old SSP token is
+verified and restored by the kernel. The kernel will also push the normal
+restorer address to the shadow stack to help userspace avoid a shadow stack
+violation on the sigreturn path that goes through the restorer.
+
+So the shadow stack signal frame format is as follows::
+
+    |1...old SSP| - Pointer to old pre-signal ssp in sigframe token format
+                    (bit 63 set to 1)
+    |        ...| - Other state may be added in the future
+
+
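
The sigframe token format (old SSP with bit 63 set) can be sketched as a pair of C helpers. The names `sigframe_token_encode()` and `sigframe_token_decode()` are hypothetical, and the sketch checks only the marker bit; it does not reproduce the kernel's additional checks on sigreturn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHSTK_SIGFRAME_BIT (1ULL << 63) /* marker bit from the format above */

/* Hypothetical helper: build the token that is pushed for the old SSP. */
static uint64_t sigframe_token_encode(uint64_t old_ssp)
{
	return old_ssp | SHSTK_SIGFRAME_BIT;
}

/* Hypothetical helper: accept only values with bit 63 set, recover the SSP. */
static bool sigframe_token_decode(uint64_t token, uint64_t *old_ssp)
{
	assert(old_ssp != NULL);
	if (!(token & SHSTK_SIGFRAME_BIT))
		return false;
	*old_ssp = token & ~SHSTK_SIGFRAME_BIT;
	return true;
}
```

Because userspace addresses never have bit 63 set, a plain return address on the shadow stack cannot be mistaken for a sigframe token.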
+32 bit ABI signals are not supported in shadow stack processes. Linux prevents
+32 bit execution while shadow stack is enabled by allocating shadow stacks
+outside of the 32 bit address space. When execution enters 32 bit mode, either
+via far call or returning to userspace, a #GP is generated by the hardware
+which will be delivered to the process as a segfault. When transitioning to
+userspace the register state will be as if the userspace ip being returned to
+caused the segfault.
+
+Fork
+----
+
+The shadow stack's vma has VM_SHADOW_STACK flag set; its PTEs are required
+to be read-only and dirty. When a shadow stack PTE is not RO and dirty, a
+shadow access triggers a page fault with the shadow stack access bit set
+in the page fault error code.
+
+When a task forks a child, its shadow stack PTEs are copied and both the
+parent's and the child's shadow stack PTEs are cleared of the dirty bit.
+Upon the next shadow stack access, the resulting shadow stack page fault
+is handled by page copy/re-use.
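
The "read-only and dirty" rule can be illustrated with a predicate over raw x86 PTE bits. This is a simplified sketch, not the kernel's implementation: the helper names are hypothetical, only the architectural Present (bit 0), Write (bit 1) and Dirty (bit 6) bits are modelled, and any software bits the kernel uses for bookkeeping are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_PTE_PRESENT (1ULL << 0)
#define X86_PTE_WRITE   (1ULL << 1)
#define X86_PTE_DIRTY   (1ULL << 6)

static_assert((X86_PTE_WRITE & X86_PTE_DIRTY) == 0, "bits must be distinct");

/* Hypothetical helper: present, read-only and dirty reads as shadow stack. */
static bool pte_looks_like_shstk(uint64_t pte)
{
	return (pte & X86_PTE_PRESENT) &&
	       !(pte & X86_PTE_WRITE) &&
	       (pte & X86_PTE_DIRTY);
}

/* Hypothetical helper: the fork step, clearing Dirty so the next access faults. */
static uint64_t pte_fork_clear_dirty(uint64_t pte)
{
	return pte & ~X86_PTE_DIRTY;
}
```

After `pte_fork_clear_dirty()` the entry no longer satisfies the predicate, which corresponds to the page fault on the next shadow stack access described above.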
+
+When a pthread child is created, the kernel allocates a new shadow stack
+for the new thread. New shadow stack creation behaves like mmap() with respect
+to ASLR behavior. Similarly, on thread exit the thread's shadow stack is
+disabled.
+
+Exec
+----
+
+On exec, shadow stack features are disabled by the kernel, at which point
+userspace can choose to re-enable or lock them.

Documentation/filesystems/proc.rst

Lines changed: 1 addition & 0 deletions
@@ -566,6 +566,7 @@ encoded manner. The codes are the following:
     mt    arm64 MTE allocation tags are enabled
     um    userfaultfd missing tracking
     uw    userfaultfd wr-protect tracking
+    ss    shadow stack page
     ==    =======================================
 
 Note that there is no guarantee that every flag and associated mnemonic will

Documentation/mm/arch_pgtable_helpers.rst

Lines changed: 10 additions & 2 deletions
@@ -46,7 +46,11 @@ PTE Page Table Helpers
 +---------------------------+--------------------------------------------------+
 | pte_mkclean               | Creates a clean PTE                              |
 +---------------------------+--------------------------------------------------+
-| pte_mkwrite               | Creates a writable PTE                           |
+| pte_mkwrite               | Creates a writable PTE of the type specified by  |
+|                           | the VMA.                                         |
++---------------------------+--------------------------------------------------+
+| pte_mkwrite_novma         | Creates a writable PTE, of the conventional type |
+|                           | of writable.                                     |
 +---------------------------+--------------------------------------------------+
 | pte_wrprotect             | Creates a write protected PTE                    |
 +---------------------------+--------------------------------------------------+
@@ -118,7 +122,11 @@ PMD Page Table Helpers
 +---------------------------+--------------------------------------------------+
 | pmd_mkclean               | Creates a clean PMD                              |
 +---------------------------+--------------------------------------------------+
-| pmd_mkwrite               | Creates a writable PMD                           |
+| pmd_mkwrite               | Creates a writable PMD of the type specified by  |
+|                           | the VMA.                                         |
++---------------------------+--------------------------------------------------+
+| pmd_mkwrite_novma         | Creates a writable PMD, of the conventional type |
+|                           | of writable.                                     |
 +---------------------------+--------------------------------------------------+
 | pmd_wrprotect             | Creates a write protected PMD                    |
 +---------------------------+--------------------------------------------------+

arch/Kconfig

Lines changed: 8 additions & 0 deletions
@@ -931,6 +931,14 @@ config HAVE_ARCH_HUGE_VMALLOC
 config ARCH_WANT_HUGE_PMD_SHARE
 	bool
 
+# Archs that want to use pmd_mkwrite on kernel memory need it defined even
+# if there are no userspace memory management features that use it
+config ARCH_WANT_KERNEL_PMD_MKWRITE
+	bool
+
+config ARCH_WANT_PMD_MKWRITE
+	def_bool TRANSPARENT_HUGEPAGE || ARCH_WANT_KERNEL_PMD_MKWRITE
+
 config HAVE_ARCH_SOFT_DIRTY
 	bool
 

arch/alpha/include/asm/pgtable.h

Lines changed: 1 addition & 1 deletion
@@ -256,7 +256,7 @@ extern inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED;
 extern inline pte_t pte_wrprotect(pte_t pte) { pte_val(pte) |= _PAGE_FOW; return pte; }
 extern inline pte_t pte_mkclean(pte_t pte) { pte_val(pte) &= ~(__DIRTY_BITS); return pte; }
 extern inline pte_t pte_mkold(pte_t pte) { pte_val(pte) &= ~(__ACCESS_BITS); return pte; }
-extern inline pte_t pte_mkwrite(pte_t pte) { pte_val(pte) &= ~_PAGE_FOW; return pte; }
+extern inline pte_t pte_mkwrite_novma(pte_t pte){ pte_val(pte) &= ~_PAGE_FOW; return pte; }
 extern inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= __DIRTY_BITS; return pte; }
 extern inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= __ACCESS_BITS; return pte; }
 

arch/arc/include/asm/hugepage.h

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ static inline pmd_t pte_pmd(pte_t pte)
 }
 
 #define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd)))
-#define pmd_mkwrite(pmd) pte_pmd(pte_mkwrite(pmd_pte(pmd)))
+#define pmd_mkwrite_novma(pmd) pte_pmd(pte_mkwrite_novma(pmd_pte(pmd)))
 #define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd)))
 #define pmd_mkold(pmd) pte_pmd(pte_mkold(pmd_pte(pmd)))
 #define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd)))

arch/arc/include/asm/pgtable-bits-arcv2.h

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@
 
 PTE_BIT_FUNC(mknotpresent, &= ~(_PAGE_PRESENT));
 PTE_BIT_FUNC(wrprotect, &= ~(_PAGE_WRITE));
-PTE_BIT_FUNC(mkwrite, |= (_PAGE_WRITE));
+PTE_BIT_FUNC(mkwrite_novma, |= (_PAGE_WRITE));
 PTE_BIT_FUNC(mkclean, &= ~(_PAGE_DIRTY));
 PTE_BIT_FUNC(mkdirty, |= (_PAGE_DIRTY));
 PTE_BIT_FUNC(mkold, &= ~(_PAGE_ACCESSED));

arch/arm/include/asm/pgtable-3level.h

Lines changed: 1 addition & 1 deletion
@@ -202,7 +202,7 @@ static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; }
 
 PMD_BIT_FUNC(wrprotect, |= L_PMD_SECT_RDONLY);
 PMD_BIT_FUNC(mkold, &= ~PMD_SECT_AF);
-PMD_BIT_FUNC(mkwrite, &= ~L_PMD_SECT_RDONLY);
+PMD_BIT_FUNC(mkwrite_novma, &= ~L_PMD_SECT_RDONLY);
 PMD_BIT_FUNC(mkdirty, |= L_PMD_SECT_DIRTY);
 PMD_BIT_FUNC(mkclean, &= ~L_PMD_SECT_DIRTY);
 PMD_BIT_FUNC(mkyoung, |= PMD_SECT_AF);

arch/arm/include/asm/pgtable.h

Lines changed: 1 addition & 1 deletion
@@ -228,7 +228,7 @@ static inline pte_t pte_wrprotect(pte_t pte)
 	return set_pte_bit(pte, __pgprot(L_PTE_RDONLY));
 }
 
-static inline pte_t pte_mkwrite(pte_t pte)
+static inline pte_t pte_mkwrite_novma(pte_t pte)
 {
 	return clear_pte_bit(pte, __pgprot(L_PTE_RDONLY));
 }
