Okapi: Efficiently Safeguarding Speculative Data Accesses in Sandboxed Environments

Okapi¹ presents a new hardware/software cross-layer architecture designed to mitigate Transient Execution Side Channel (TES) attacks. Okapi provides a hardware basis for secure speculation in sandboxed environments and can replace expensive speculation barriers in software. At its core, Okapi allows for speculative data accesses to a memory page only after the page has been accessed non-speculatively at least once by the current trust domain. The granularity of the trust domains can be controlled in software to achieve different security and performance trade-offs. In our case study, Okapi prevented all Spectre attacks within a WebAssembly runtime, resulting in only 2.34% performance overhead.

OkapiBOOM

This repository contains a microarchitectural implementation of Okapi in the open-spurce out-of-order RISC-V core BOOM.

(1) In the fetch stage every {PC, next PC} pair is compared to detect instructions that cross page boundaries. If the next PC is located on a different page the instruction is marked as suspicious for a possible Spectre-BTB/RSB attack.

(2) The decode stage is extended to handle the OkapiReset and OkapiLoad instructions.

(3) The ROB is modified to keep track of the visibility point, i. e., the youngest instruction in terms of program order that can open a transient window (cf. Sec. 3). Load instructions inside the ROB are marked as unsafe upon their insertion if they are younger than the visibility point and thus not certain to commit. An unsafe load instruction becomes safe, when it is passed by the visibility point, i. e., it is bound to commit. Similarly, loads are marked as suspicious_load if they are younger than an unsafe suspicious instruction. Loads between the visibility point and the first suspicious instruction clear their suspicious_load bit.

(4) The Load Store Queue (LSQ) selectively blocks loads that are marked with the suspicious_load bit from going to the TLB and thus from executing speculatively.

(5) Inside the TLB, safe loads behave as usual and set the safe access bit for the corresponding page upon a successful trans- lation, while speculative and therefore unsafe loads can only be translated if the safe access bit for the corresponding TLB entry is already set. OkapiLoads behave analogously except for never setting the safe access bit. Privilege switches are de- tected by monitoring the CSRs storing the current privilege level. The safe access bits are cleared each time the privilege level changes. In addition, the TLB is modified to reset the safe access bits when the OkapiReset instruction is executed.

(6) The Instruction Queue (IQ) of the processor is extended to reschedule all blocked load instructions. This includes the blocked suspicious_load instructions from the LSQ as well as unsafe loads that did not succeed in the TLB. All loads are rescheduled as soon as they are marked safe within the ROB while suspicious_load instructions can be rescheduled speculatively as soon as the suspicious_load bit is cleared.

(7) The OkapiReset instruction is delayed until it reaches the ROB head, ensuring that all previous loads have been ex- ecuted. The loads following the OkapiReset instruction are de- layed until they become non-speculative or until the OkapiRe- set instruction is executed. This prevents the re-use of stale permissions.

The majority of the described extensions require only minor changes to the pipeline, such as adding single bits to micro-operations or TLB entries. Okapi, to a large extent, re-uses the existing logic in the LSQ and requires no modification to the cache and memory subsystem. Implementing the Okapi instructions and the logic to determine the visibility point are the only significant hardware extensions.

Layout

Synthesis of the BOOMv3 core shows that the front-end dominates the critical path. With both instruction and data cache, as well as parts of the TLB module implemented as SRAM cells, BOOMv3 can easily reach a clock frequency of 500 MHz after synthesis and place & route on a commercial 12 nm FinFET Globalfoundries process for the medium configuration of the core. We used the worst-case corner as the target corner for the timing optimizations with a supply voltage of 0.72 V slow-slow transistor corner and 125 °C temperature. The employed ARM standard cell library uses only very low threshold transistors for minimum dynamic power and maximum performance. The power results are generated in Synopsys PrimePower with typical corner settings (TT, 0.8 V, 25 °C).

The runtime for the synthesis was roughly 6 hours, with additional 2.5 days for place & route and static timing analysis. Our results for BOOMv3 align well with the reported results for BOOMv2. Here, the developers report a tapeout frequency of 594 MHz with optimized place & route setup in a 28 nm process. Running the same process on our modified OkapiBOOM core shows that the additional hardware does not deteriorate the maximum clock frequency as the implemented design features are not on the critical path. Compared to the baseline design, Okapi adds 0.6 % of cell area. It is interesting to mention that the baseline design takes 1.5 % more area than OkapiBOOM after place & route, due to a more complicated placement and wiring. In addition, incorporating the Okapi features reduces the power consumption by 1.5 %. A detailed comparison is listed in the table below. The small differences are all within the tolerance of the commercial tools (noise), and are not a systematic cause of the Okapi extension. Our results clearly show that the additional hardware does not jeopardize the synthesizability of the BOOM design. The resulting layout of OkapiBOOM after place & route is shown in the following figure.

Comparison of Synthesis Results in 12 nm FinFET

	Baseline	OkapiBOOM	Δ (%)
Area*
Combinational Area (GE)	859,219	891,426	+3.8%
Register Area (GE)	1,188,868	1,172,435	-1.4%
Macro Area (GE)	554,294	554,294	0.0%
Total Cell Area (GE)	2,602,465	2,618,088	+0.6%
Total Area (GE)	4,730,557	4,659,064	-1.5%
Power **
Dynamic Power*** (mW)	246.7	242.6	-0.6%
Leakage Power (mW)	0.008	0.008	0.0%

* Gate Equivalents (GE), normalized to single NAND2_X1 size
** at 500 MHz, TT corner (25 °C, VDD = 0.8 V)
*** total power = switching power + internal power

The Berkeley Out-of-Order RISC-V Processor

The Berkeley Out-of-Order Machine (BOOM) is an open-source superscalar out-of-order core, designed and maintained by UC Berkeley Architecture Research. The core's third major release, implements the RV64GC variant of the RISC-V ISA. The current version of the BOOM microarchitecture (SonicBOOM, or BOOMv3) is performance competitive with commercial high-performance out-of-order cores, achieving 6.2 CoreMarks/MHz.

Spectre-style attacks are possible in BOOMv3 and no mitigation has been implemented yet. With respect to Meltdown, measures have been taken in BOOMv2 to prevent this kind of attack, but have been reintroduced to the microarchitecture as we discovered in a previous publication².

Unique Program Execution Checking (UPEC)

UPEC (Unique Program Execution Checking) is a formal approach for detecting vulnerabilities to transient execution attacks in out-of-order processors or verifying their absence. This technique has been developed by the Electronic Design Automation Group at RPTU in Kaiserslautern, Germany.

The verification IP that has been used to verify OkapiBOOM is located in the UPEC_verification directory. A general verification of BOOM against transient execution attacks and further information on the functionality can be found here.

A more detailed description of the employed formal verification technique can be found in:

"An Exhaustive Approach to Detecting Transient Execution Side Channels in RTL Designs of Processors" ³
"A formal approach for detecting vulnerabilities to transient execution attacks in out-of-order processors" ⁴
"Processor hardware security vulnerabilities and their detection by unique program execution checking" ⁵

IMPORTANT: Using BOOM

This repository is NOT A SELF-RUNNING repository. To instantiate a BOOM core, please use the Chipyard SoC generator.

The hash of Chipyard used while carrying out this work is located in the CHIPYARD.hash file in the top level directory of this repository. This file is mainly used for CI purposes, since

While BOOM is primarily ASIC-optimized, it is also usable on FPGAs. Chipyard provides infrastructure and documentation for deploying BOOM on AWS F1 FPGAs through FireSim.

P. Schmitz, T. Jauch, A. Wezel, M. R. Fadiheh, T. Tiemann, J. Heller, T. Eisenbarth, D. Stoffel, W. Kunz: Okapi: Efficiently Safeguarding Speculative Data Accesses in Sandboxed Environments In 2025 ASIA CCS '24: 19th ACM Asia Conference on Computer and Communications Security (to appear) ↩
T. Jauch, A. Wezel, M. R. Fadiheh, P. Schmitz, S. Ray, J. M. Fung, C. W. Fletcher, D. Stoffel, W. Kunz: Secure-by-Construction Design Methodology for CPUs: Implementing Secure Speculation on the RTL. In 2023 IEEE/ACM International Conference On Computer Aided Design (ICCAD). IEEE. ↩
M. R. Fadiheh, A. Wezel, J. Mueller, J. Bormann, S. Ray, J. Fung, S. Mitra, D. Stoffel, W. Kunz: An Exhaustive Approach to Detecting Transient Execution Side Channels in RTL Designs of Processors. In IEEE Transactions on Computers (Volume: 72, Issue: 1), January 2023 ↩
M. R. Fadiheh, J. Müller, R. Brinkmann, S. Mitra, D. Stoffel, W. Kunz: A formal approach for detecting vulnerabilities to transient execution attacks in out-of-order processors. In 2020 57th ACM/IEEE Design Automation Conference (DAC) (pp. 1-6). IEEE. ↩
M. R. Fadiheh, D. Stoffel, C. Barrett, S. Mitra, W. Kunz: Processor hardware security vulnerabilities and their detection by unique program execution checking. In 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE) (pp. 994-999). IEEE. ↩

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
ISA_Tests_Okapi		ISA_Tests_Okapi
RTL		RTL
UPEC_Verification		UPEC_Verification
docs		docs
figures		figures
project		project
src		src
util		util
.gitignore		.gitignore
CHIPYARD.hash		CHIPYARD.hash
CONTRIB_AND_STYLE.md		CONTRIB_AND_STYLE.md
LICENSE		LICENSE
LICENSE.SiFive		LICENSE.SiFive
Makefile		Makefile
README.md		README.md
TODO.md		TODO.md
build.sbt		build.sbt
scalastyle-config.xml		scalastyle-config.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Uh oh!

Repository files navigation

Okapi: Efficiently Safeguarding Speculative Data Accesses in Sandboxed Environments

OkapiBOOM

Layout

Comparison of Synthesis Results in 12 nm FinFET

The Berkeley Out-of-Order RISC-V Processor

Unique Program Execution Checking (UPEC)

IMPORTANT: Using BOOM

About

Licenses found

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

License

Licenses found

RPTU-EIS/OkapiBOOM

Folders and files

Latest commit

History

Repository files navigation

Okapi: Efficiently Safeguarding Speculative Data Accesses in Sandboxed Environments

OkapiBOOM

Layout

Comparison of Synthesis Results in 12 nm FinFET

The Berkeley Out-of-Order RISC-V Processor

Unique Program Execution Checking (UPEC)

IMPORTANT: Using BOOM

Footnotes

About

Topics

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages