Commit 4da9f33

Merge tag 'x86-fsgsbase-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fsgsbase from Thomas Gleixner:

 "Support for FSGSBASE. Almost 5 years after the first RFC to support it,
  this has been brought into a shape which is maintainable and actually
  works.

  This final version was done by Sasha Levin who took it up after Intel
  dropped the ball. Sasha discovered that the SGX (sic!) offerings out
  there ship rogue kernel modules enabling FSGSBASE behind the kernel's
  back, which opens an instantaneous unprivileged root hole.

  The FSGSBASE instructions provide a considerable speedup of the context
  switch path and enable user space to write GSBASE without kernel
  interaction. This enablement requires careful handling of the exception
  entries which go through the paranoid entry path as they can no longer
  rely on the assumption that user GSBASE is positive (as enforced via
  prctl() on non FSGSBASE enabled systems).

  All other entries (syscalls, interrupts and exceptions) can still just
  utilize SWAPGS unconditionally when the entry comes from user space.
  Converting these entries to use FSGSBASE has no benefit as SWAPGS is
  only marginally slower than WRGSBASE and locating and retrieving the
  kernel GSBASE value is not a free operation either. The real benefit of
  RD/WRGSBASE is the avoidance of the MSR reads and writes.

  The changes come with appropriate selftests and have held up in field
  testing against the (sanitized) Graphene-SGX driver"

* tag 'x86-fsgsbase-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/fsgsbase: Fix Xen PV support
  x86/ptrace: Fix 32-bit PTRACE_SETREGS vs fsbase and gsbase
  selftests/x86/fsgsbase: Add a missing memory constraint
  selftests/x86/fsgsbase: Fix a comment in the ptrace_write_gsbase test
  selftests/x86: Add a syscall_arg_fault_64 test for negative GSBASE
  selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE
  selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base write
  Documentation/x86/64: Add documentation for GS/FS addressing mode
  x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2
  x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit
  x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit
  x86/entry/64: Introduce the FIND_PERCPU_BASE macro
  x86/entry/64: Switch CR3 before SWAPGS in paranoid entry
  x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation
  x86/process/64: Use FSGSBASE instructions on thread copy and ptrace
  x86/process/64: Use FSBSBASE in switch_to() if available
  x86/process/64: Make save_fsgs_for_kvm() ready for FSGSBASE
  x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions
  x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions
  x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
  ...
2 parents 125cfa0 + d029bff commit 4da9f33

19 files changed: +888 -104 lines

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 2 additions & 0 deletions
@@ -3084,6 +3084,8 @@
 	no5lvl		[X86-64] Disable 5-level paging mode. Forces
 			kernel to use 4-level paging instead.
 
+	nofsgsbase	[X86] Disables FSGSBASE instructions.
+
 	no_console_suspend
 			[HW] Never suspend the console
 			Disable suspending of consoles during suspend and

Documentation/x86/x86_64/fsgs.rst

Lines changed: 199 additions & 0 deletions
@@ -0,0 +1,199 @@
.. SPDX-License-Identifier: GPL-2.0

Using FS and GS segments in user space applications
===================================================

The x86 architecture supports segmentation. Instructions which access
memory can use segment register based addressing mode. The following
notation is used to address a byte within a segment::

  Segment-register:Byte-address

The segment base address is added to the Byte-address to compute the
resulting virtual address which is accessed. This allows accessing multiple
instances of data with the identical Byte-address, i.e. the same code. The
selection of a particular instance is purely based on the base-address in
the segment register.

In 32-bit mode the CPU provides 6 segments, which also support segment
limits. The limits can be used to enforce address space protections.

In 64-bit mode the CS/SS/DS/ES segments are ignored and the base address is
always 0 to provide a full 64-bit address space. The FS and GS segments are
still functional in 64-bit mode.

Common FS and GS usage
----------------------

The FS segment is commonly used to address Thread Local Storage (TLS). FS
is usually managed by runtime code or a threading library. Variables
declared with the '__thread' storage class specifier are instantiated per
thread and the compiler emits the FS: address prefix for accesses to these
variables. Each thread has its own FS base address, so common code can be
used without complex address offset calculations to access the per-thread
instances. Applications should not use FS for other purposes when they use
runtimes or threading libraries which manage the per-thread FS.

The GS segment has no common use and can be used freely by
applications. GCC and Clang support GS based addressing via address space
identifiers.

Reading and writing the FS/GS base address
------------------------------------------

There exist two mechanisms to read and write the FS/GS base address:

- the arch_prctl() system call

- the FSGSBASE instruction family

Accessing FS/GS base with arch_prctl()
--------------------------------------

The arch_prctl(2) based mechanism is available on all 64-bit CPUs and all
kernel versions.

Reading the base::

  arch_prctl(ARCH_GET_FS, &fsbase);
  arch_prctl(ARCH_GET_GS, &gsbase);

Writing the base::

  arch_prctl(ARCH_SET_FS, fsbase);
  arch_prctl(ARCH_SET_GS, gsbase);

The ARCH_SET_GS prctl may be disabled depending on kernel configuration
and security settings.

Accessing FS/GS base with the FSGSBASE instructions
---------------------------------------------------

With the Ivy Bridge CPU generation, Intel introduced a new set of
instructions to access the FS and GS base registers directly from user
space. These instructions are also supported on AMD Family 17H CPUs. The
following instructions are available:

=============== ===========================
RDFSBASE %reg   Read the FS base register
RDGSBASE %reg   Read the GS base register
WRFSBASE %reg   Write the FS base register
WRGSBASE %reg   Write the GS base register
=============== ===========================

The instructions avoid the overhead of the arch_prctl() syscall and allow
more flexible usage of the FS/GS addressing modes in user space
applications. They do not prevent conflicts between threading libraries
and runtimes which use FS and applications which want to use it for
their own purposes.

FSGSBASE instructions enablement
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The instructions are enumerated in CPUID leaf 7, bit 0 of EBX. If
available, /proc/cpuinfo shows 'fsgsbase' in the flag entry of the CPUs.

The availability of the instructions does not enable them
automatically. The kernel has to enable them explicitly in CR4. The
reason for this is that older kernels make assumptions about the values in
the GS register and enforce them when GS base is set via
arch_prctl(). Allowing user space to write arbitrary values to GS base
would violate these assumptions and cause malfunction.

On kernels which do not enable FSGSBASE, executing the FSGSBASE
instructions faults with a #UD exception.

The kernel provides reliable information about the enabled state in the
ELF AUX vector. If the HWCAP2_FSGSBASE bit is set in the AUX vector, the
kernel has FSGSBASE instructions enabled and applications can use them.
The following code example shows how this detection works::

  #include <stdio.h>
  #include <sys/auxv.h>
  #include <elf.h>

  /* Will eventually be in asm/hwcap.h */
  #ifndef HWCAP2_FSGSBASE
  #define HWCAP2_FSGSBASE        (1 << 1)
  #endif

  int main(void)
  {
          /* getauxval() returns unsigned long */
          unsigned long val = getauxval(AT_HWCAP2);

          if (val & HWCAP2_FSGSBASE)
                  printf("FSGSBASE enabled\n");
          return 0;
  }

FSGSBASE instructions compiler support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

GCC version 4.6.4 and newer provide intrinsics for the FSGSBASE
instructions. Clang 5 supports them as well.

=================== ===========================
_readfsbase_u64()   Read the FS base register
_readgsbase_u64()   Read the GS base register
_writefsbase_u64()  Write the FS base register
_writegsbase_u64()  Write the GS base register
=================== ===========================

To utilize these intrinsics <immintrin.h> must be included in the source
code and the compiler option -mfsgsbase has to be added.

Compiler support for FS/GS based addressing
-------------------------------------------

GCC version 6 and newer provide support for FS/GS based addressing via
Named Address Spaces. GCC implements the following address space
identifiers for x86:

========= ====================================
__seg_fs  Variable is addressed relative to FS
__seg_gs  Variable is addressed relative to GS
========= ====================================

The preprocessor symbols __SEG_FS and __SEG_GS are defined when these
address spaces are supported. Code which implements fallback modes should
check whether these symbols are defined. Usage example::

  #ifdef __SEG_GS

  long data0 = 0;
  long data1 = 1;

  long __seg_gs *ptr;

  /* Check whether FSGSBASE is enabled by the kernel (HWCAP2_FSGSBASE) */
  ....

  /* Set GS base to point to data0 */
  _writegsbase_u64((unsigned long)&data0);

  /* Access offset 0 of GS */
  ptr = 0;
  printf("data0 = %ld\n", *ptr);

  /* Set GS base to point to data1 */
  _writegsbase_u64((unsigned long)&data1);
  /* ptr still addresses offset 0! */
  printf("data1 = %ld\n", *ptr);

  #endif /* __SEG_GS */

Clang does not provide the GCC address space identifiers, but it provides
address spaces via an attribute based mechanism in Clang 2.6 and newer
versions:

==================================== =====================================
__attribute__((address_space(256)))  Variable is addressed relative to GS
__attribute__((address_space(257)))  Variable is addressed relative to FS
==================================== =====================================

FS/GS based addressing with inline assembly
-------------------------------------------

If the compiler does not support address spaces, inline assembly can
be used for FS/GS based addressing mode::

  mov %fs:offset, %reg
  mov %gs:offset, %reg

  mov %reg, %fs:offset
  mov %reg, %gs:offset

Documentation/x86/x86_64/index.rst

Lines changed: 1 addition & 0 deletions
@@ -14,3 +14,4 @@ x86_64 Support
    fake-numa-for-cpusets
    cpu-hotplug-spec
    machinecheck
+   fsgs

arch/x86/entry/calling.h

Lines changed: 40 additions & 0 deletions
@@ -6,6 +6,7 @@
 #include <asm/percpu.h>
 #include <asm/asm-offsets.h>
 #include <asm/processor-flags.h>
+#include <asm/inst.h>
 
 /*
 
@@ -341,6 +342,12 @@ For 32-bit we have the following conventions - kernel is built with
 #endif
 .endm
 
+.macro SAVE_AND_SET_GSBASE scratch_reg:req save_reg:req
+	rdgsbase \save_reg
+	GET_PERCPU_BASE \scratch_reg
+	wrgsbase \scratch_reg
+.endm
+
 #else /* CONFIG_X86_64 */
 # undef UNWIND_HINT_IRET_REGS
 # define UNWIND_HINT_IRET_REGS
@@ -351,3 +358,36 @@ For 32-bit we have the following conventions - kernel is built with
 	call stackleak_erase
 #endif
 .endm
+
+#ifdef CONFIG_SMP
+
+/*
+ * CPU/node NR is loaded from the limit (size) field of a special segment
+ * descriptor entry in GDT.
+ */
+.macro LOAD_CPU_AND_NODE_SEG_LIMIT reg:req
+	movq $__CPUNODE_SEG, \reg
+	lsl \reg, \reg
+.endm
+
+/*
+ * Fetch the per-CPU GSBASE value for this processor and put it in @reg.
+ * We normally use %gs for accessing per-CPU data, but we are setting up
+ * %gs here and obviously can not use %gs itself to access per-CPU data.
+ */
+.macro GET_PERCPU_BASE reg:req
+	ALTERNATIVE \
+		"LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
+		"RDPID \reg", \
+		X86_FEATURE_RDPID
+	andq $VDSO_CPUNODE_MASK, \reg
+	movq __per_cpu_offset(, \reg, 8), \reg
+.endm
+
+#else
+
+.macro GET_PERCPU_BASE reg:req
+	movq pcpu_unit_offsets(%rip), \reg
+.endm
+
+#endif /* CONFIG_SMP */
