This repository was archived by the owner on Nov 8, 2023. It is now read-only.

Commit 872bb37

randomize_kstack: Improve stack alignment codegen
The codegen for adding architecture-specific stack alignment to the
effective alloca() usage is somewhat inefficient and allows a bit to get
carried beyond the desired entropy range. This isn't really a problem,
but it's unexpected and the codegen is kind of bad.

Quoting Mark[1], the disassembly for arm64's invoke_syscall() looks like:

    // offset = raw_cpu_read(kstack_offset)
    mov     x4, sp
    adrp    x0, kstack_offset
    mrs     x5, tpidr_el1
    add     x0, x0, #:lo12:kstack_offset
    ldr     w0, [x0, x5]

    // offset = KSTACK_OFFSET_MAX(offset)
    and     x0, x0, #0x3ff

    // alloca(offset)
    add     x0, x0, #0xf
    and     x0, x0, #0x7f0
    sub     sp, x4, x0

... which in C would be:

    offset = raw_cpu_read(kstack_offset)
    offset &= 0x3ff;                // [0x0, 0x3ff]
    offset += 0xf;                  // [0xf, 0x40e]
    offset &= 0x7f0;                // [0x0, ...

so when *all* bits [3:0] are 0, they'll have no impact, and when *any*
of bits [3:0] are 1 they'll trigger a carry into bit 4, which could
ripple all the way up and spill into bit 10.

Switch the masking in KSTACK_OFFSET_MAX() to explicitly clear the bottom
bits to avoid the rounding by using 0b1111110000 instead of 0b1111111111:

    // offset = raw_cpu_read(kstack_offset)
    mov     x4, sp
    adrp    x0, 0 <kstack_offset>
    mrs     x5, tpidr_el1
    add     x0, x0, #:lo12:kstack_offset
    ldr     w0, [x0, x5]

    // offset = KSTACK_OFFSET_MAX(offset)
    and     x0, x0, #0x3f0

    // alloca(offset)
    sub     sp, x4, x0

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/lkml/ZnVfOnIuFl2kNWkT@J2N7QTR9R3/ [1]
Link: https://lore.kernel.org/r/20240702211612.work.576-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
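
The carry described in the commit message can be checked exhaustively in user space. The following is only an illustrative sketch, not kernel code: the helper names (`old_cap`, `new_cap`, `align16`, `max_adjust`, `distinct_adjusts`, `check_kstack_ranges`) are hypothetical, and `align16` simply models the quoted arm64 `add #0xf` / `and #0x7f0` pair.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-patch cap: keep the low 10 bits (x & 0x3FF). */
uint32_t old_cap(uint32_t x) { return x & 0x3ff; }

/* Post-patch 64-bit cap: 0b1111110000 is 0x3f0. */
uint32_t new_cap(uint32_t x) { return x & 0x3f0; }

/* Model of the quoted alloca() sequence: add #0xf, then and #0x7f0. */
uint32_t align16(uint32_t offset) { return (offset + 0xf) & 0x7f0; }

/* Largest stack adjustment over every possible low-10-bit input. */
uint32_t max_adjust(uint32_t (*cap)(uint32_t))
{
	uint32_t max = 0;

	for (uint32_t x = 0; x <= 0x3ff; x++) {
		uint32_t v = align16(cap(x));

		if (v > max)
			max = v;
	}
	return max;
}

/* Number of distinct stack adjustments the sequence can produce. */
unsigned int distinct_adjusts(uint32_t (*cap)(uint32_t))
{
	unsigned char seen[0x80] = { 0 };	/* index = adjustment >> 4 */
	unsigned int n = 0;

	for (uint32_t x = 0; x <= 0x3ff; x++) {
		uint32_t idx = align16(cap(x)) >> 4;

		if (!seen[idx]) {
			seen[idx] = 1;
			n++;
		}
	}
	return n;
}

/* Self-check of the ranges discussed in the commit message. */
void check_kstack_ranges(void)
{
	assert(max_adjust(old_cap) == 0x400);	/* carry spills into bit 10 */
	assert(max_adjust(new_cap) == 0x3f0);	/* stays within bits [9:4] */
	assert(distinct_adjusts(old_cap) == 65);
	assert(distinct_adjusts(new_cap) == 64);
}
```

Under this model the old mask yields 65 possible adjustments (one more than a clean 6 bits), while the new mask yields exactly 64 and makes the rounding step a no-op, which is consistent with the compiler dropping the `add`/`and` pair.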
1 parent 3ccea47 commit 872bb37

1 file changed, +12 -6 lines changed


include/linux/randomize_kstack.h

Lines changed: 12 additions & 6 deletions
@@ -32,13 +32,19 @@ DECLARE_PER_CPU(u32, kstack_offset);
 #endif
 
 /*
- * Use, at most, 10 bits of entropy. We explicitly cap this to keep the
- * "VLA" from being unbounded (see above). 10 bits leaves enough room for
- * per-arch offset masks to reduce entropy (by removing higher bits, since
- * high entropy may overly constrain usable stack space), and for
- * compiler/arch-specific stack alignment to remove the lower bits.
+ * Use, at most, 6 bits of entropy (on 64-bit; 8 on 32-bit). This cap is
+ * to keep the "VLA" from being unbounded (see above). Additionally clear
+ * the bottom 4 bits (on 64-bit systems, 2 for 32-bit), since stack
+ * alignment will always be at least word size. This makes the compiler
+ * code gen better when it is applying the actual per-arch alignment to
+ * the final offset. The resulting randomness is reasonable without overly
+ * constraining usable stack space.
  */
-#define KSTACK_OFFSET_MAX(x) ((x) & 0x3FF)
+#ifdef CONFIG_64BIT
+#define KSTACK_OFFSET_MAX(x) ((x) & 0b1111110000)
+#else
+#define KSTACK_OFFSET_MAX(x) ((x) & 0b1111111100)
+#endif
 
 /**
  * add_random_kstack_offset - Increase stack utilization by previously
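
The entropy figures in the new comment (6 bits on 64-bit, 8 on 32-bit) follow from counting the set bits in each mask. A minimal user-space sketch, assuming hypothetical names (`KSTACK_MASK_64BIT`, `KSTACK_MASK_32BIT`, `mask_entropy_bits`, `mask_granularity`) and writing the masks in hex rather than with the `0b` binary-literal syntax the header uses:

```c
#include <assert.h>
#include <stdint.h>

/* The masks from the patch in hex: 0b1111110000 and 0b1111111100. */
#define KSTACK_MASK_64BIT 0x3f0u
#define KSTACK_MASK_32BIT 0x3fcu

/* Count set bits: each set mask bit contributes one bit of entropy. */
unsigned int mask_entropy_bits(uint32_t mask)
{
	unsigned int bits = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		bits++;
	}
	return bits;
}

/* Offset granularity in bytes: the value of the lowest set mask bit. */
uint32_t mask_granularity(uint32_t mask)
{
	return mask & (~mask + 1);
}

/* Self-check against the figures stated in the header comment. */
void check_mask_entropy(void)
{
	assert(mask_entropy_bits(KSTACK_MASK_64BIT) == 6);
	assert(mask_entropy_bits(KSTACK_MASK_32BIT) == 8);
	assert(mask_granularity(KSTACK_MASK_64BIT) == 16);
	assert(mask_granularity(KSTACK_MASK_32BIT) == 4);
}
```

In this sketch both masks keep the offset below 1024 bytes (at most 0x3f0 = 1008 on 64-bit and 0x3fc = 1020 on 32-bit), so the bound on the "VLA" is unchanged; only the low bits are pre-cleared.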
