Project showcasing how to build dynamic binary translation systems by lifting RISC-V to LLVM IR and using the OrcJIT for runtime execution.

clflushopt/llvm-riscv-lifter

Dynamic RISC-V Binary Translator (RV64I)

This project demonstrates how a dynamic binary translator works: it lifts RISC-V RV64I user-space programs to LLVM IR at runtime, then compiles the IR and executes it natively on the host system via LLVM's OrcJIT.

Overview

Execution starts with a statically linked RV64I ELF binary. The ElfLoader parses the ELF file and loads its executable segments into the allocated guest memory space.

The guest memory space is a 4 GiB virtual memory region allocated using mmap, serving as the address space for the guest RISC-V program. Memory protections (mprotect) are applied dynamically based on the ELF segment flags.

With the segments in memory, an instruction decoding pass takes each RISC-V instruction word and builds it into a structured C++ type (e.g., RType, IType, UType), extracting the opcode, register, and immediate fields. It uses std::variant to represent the different instruction formats.

The core translation engine is implemented in the Lifter class. It takes decoded RISC-V instructions and builds equivalent LLVM IR at basic block granularity. The generated module behaves roughly like an emulator, with the guest RISC-V registers modelled as alloca instructions in LLVM IR.

Once all instructions are lifted, the in-memory representation of the translated guest code is built into an LLVM module, which is then passed to LLVM's ExecutionEngine. The engine compiles the module into native machine code for the host architecture at runtime, and the resulting code is then executed.

RISC-V Decoder

The decoder is implemented in src/Decoder.hpp and src/Decoder.cpp. It uses bitwise operations to extract fields from 32-bit instruction words based on the RV64I specification. Instruction types are represented by distinct structs (e.g., RType, IType) within src/Instruction.hpp, and std::variant is used in the decode method to return the appropriate instruction type.
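The field extraction described above can be sketched as follows. This is a minimal illustration, not the project's code: the struct layouts, the `bits` helper, and the `Unknown` alternative are assumptions made for the example, and only the three opcodes the README mentions (ADD, ADDI, LUI) are handled.

```cpp
#include <cstdint>
#include <variant>

// Hypothetical instruction structs; src/Instruction.hpp may define them differently.
struct RType { uint8_t opcode, rd, funct3, rs1, rs2, funct7; };
struct IType { uint8_t opcode, rd, funct3, rs1; int32_t imm; };
struct UType { uint8_t opcode, rd; int32_t imm; };
struct Unknown { uint32_t raw; };

using Instruction = std::variant<RType, IType, UType, Unknown>;

// Extract `len` bits of `word`, starting at bit `lo`.
constexpr uint32_t bits(uint32_t word, unsigned lo, unsigned len) {
    return (word >> lo) & ((1u << len) - 1u);
}

Instruction decode(uint32_t word) {
    const uint8_t opcode = bits(word, 0, 7);
    const uint8_t rd     = bits(word, 7, 5);
    const uint8_t funct3 = bits(word, 12, 3);
    const uint8_t rs1    = bits(word, 15, 5);
    const uint8_t rs2    = bits(word, 20, 5);
    const uint8_t funct7 = bits(word, 25, 7);

    switch (opcode) {
    case 0x33: // OP (register-register, e.g. ADD)
        return RType{opcode, rd, funct3, rs1, rs2, funct7};
    case 0x13: // OP-IMM (e.g. ADDI): imm[11:0] sits in bits 31..20.
        // The arithmetic right shift sign-extends the immediate
        // (guaranteed since C++20, universal in practice before that).
        return IType{opcode, rd, funct3, rs1, static_cast<int32_t>(word) >> 20};
    case 0x37: // LUI: imm[31:12] sits in the upper 20 bits.
        return UType{opcode, rd, static_cast<int32_t>(word & 0xFFFFF000u)};
    default:
        return Unknown{word};
    }
}
```

For instance, `decode(0x00500093)` (the encoding of `addi x1, x0, 5`) yields an `IType` with `rd == 1`, `rs1 == 0`, and `imm == 5`; a caller can dispatch on the result with `std::visit` or `std::holds_alternative`.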

Memory Emulation

The Memory class (src/Memory.hpp, src/Memory.cpp) utilizes mmap to reserve a large virtual memory block. This block serves as the guest's address space. mprotect is used to set read, write, and execute permissions on memory regions as required by the loaded ELF segments.
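A reserve-then-protect scheme like the one described could look roughly like this. The class name `GuestMemory` and its methods are hypothetical stand-ins for the project's Memory class, and the use of `PROT_NONE` with `MAP_NORESERVE` for the initial reservation is an assumption (it targets Linux-like hosts).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>

// Hypothetical wrapper around a reserved guest address space.
class GuestMemory {
public:
    explicit GuestMemory(std::size_t size) : size_(size) {
        // Reserve address space only: PROT_NONE pages fault on access
        // until protect() enables a range.
        void* p = mmap(nullptr, size_, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        base_ = (p == MAP_FAILED) ? nullptr : static_cast<uint8_t*>(p);
    }
    ~GuestMemory() { if (base_) munmap(base_, size_); }
    GuestMemory(const GuestMemory&) = delete;
    GuestMemory& operator=(const GuestMemory&) = delete;

    bool ok() const { return base_ != nullptr; }

    // Apply `prot` to [guest_addr, guest_addr + len).
    // mprotect requires guest_addr to be page-aligned.
    bool protect(uint64_t guest_addr, std::size_t len, int prot) {
        if (!base_ || guest_addr > size_ || len > size_ - guest_addr) return false;
        return mprotect(base_ + guest_addr, len, prot) == 0;
    }

    // Translate a guest address into a host pointer.
    uint8_t* host(uint64_t guest_addr) {
        assert(base_ && guest_addr < size_);
        return base_ + guest_addr;
    }

private:
    uint8_t* base_ = nullptr;
    std::size_t size_;
};
```

A loader would construct `GuestMemory mem(4ULL << 30)`, call `mem.protect(addr, len, PROT_READ | PROT_WRITE)` for a segment's range, and then copy bytes through `mem.host(addr)`. Because guest addresses are plain offsets from the base, address translation is a single addition.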

ELF Loader

The ElfLoader class (src/ElfLoader.hpp, src/ElfLoader.cpp) is a minimal implementation for parsing ELF64 files. It reads the ELF header (Elf64_Ehdr) and program headers (Elf64_Phdr), validates the ELF magic number, class (64-bit), data encoding (little-endian), type (executable), and machine (RISC-V). Loadable segments are then copied from the ELF file into the guest memory, and their permissions are set.
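The header checks listed above map directly onto fields of `Elf64_Ehdr`. The sketch below assumes a Linux host that provides `<elf.h>`; the `ElfCheck` enum and the function names `validate_header` and `prot_from_flags` are hypothetical and are not taken from the project.

```cpp
#include <cstdint>
#include <cstring>
#include <elf.h>
#include <sys/mman.h>

#ifndef EM_RISCV
#define EM_RISCV 243 // ELF machine number for RISC-V, for older headers
#endif

enum class ElfCheck { Ok, BadMagic, Not64Bit, NotLittleEndian, NotExecutable, NotRiscV };

// Validate the fields the README lists: magic, class, data encoding, type, machine.
ElfCheck validate_header(const Elf64_Ehdr& eh) {
    if (std::memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) return ElfCheck::BadMagic;
    if (eh.e_ident[EI_CLASS] != ELFCLASS64)            return ElfCheck::Not64Bit;
    if (eh.e_ident[EI_DATA] != ELFDATA2LSB)            return ElfCheck::NotLittleEndian;
    if (eh.e_type != ET_EXEC)                          return ElfCheck::NotExecutable;
    if (eh.e_machine != EM_RISCV)                      return ElfCheck::NotRiscV;
    return ElfCheck::Ok;
}

// Translate program-header flags (PF_*) into mprotect protection bits (PROT_*).
int prot_from_flags(uint32_t p_flags) {
    int prot = PROT_NONE;
    if (p_flags & PF_R) prot |= PROT_READ;
    if (p_flags & PF_W) prot |= PROT_WRITE;
    if (p_flags & PF_X) prot |= PROT_EXEC;
    return prot;
}
```

A loader built this way would reject the file on any result other than `ElfCheck::Ok`, then walk the program headers, copy each `PT_LOAD` segment into guest memory, and apply `prot_from_flags(phdr.p_flags)` to the segment's range.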

LLVM Lifter

The Lifter class (src/Lifter.hpp, src/Lifter.cpp) is responsible for the core translation. It takes an LLVMContext, Module, and IRBuilder as input. RISC-V general-purpose registers are mapped to alloca instructions in the LLVM IR, allowing them to be manipulated using load and store instructions. The special behavior of x0 (always zero, writes ignored) is handled when generating the load and store IR. Currently, it supports basic arithmetic (ADD, ADDI) and LUI instructions.
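To make the register model concrete, the sketch below emits textual LLVM IR of the shape such a lifter might produce. It deliberately does not use the LLVM C++ API (the real Lifter builds IR through IRBuilder), so it compiles without LLVM installed. The `IrSketch` class and its method names are hypothetical, and the output is a function-body fragment rather than a complete module.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical emitter that writes LLVM IR as text instead of calling IRBuilder.
class IrSketch {
public:
    IrSketch() {
        // One stack slot per guest register x1..x31; x0 needs no storage.
        // A real lifter would also initialise each slot before first use.
        for (int r = 1; r < 32; ++r)
            out_ << "  %x" << r << " = alloca i64\n";
    }

    // Reading x0 yields the constant 0; other registers are loaded from their slot.
    std::string read_reg(unsigned r) {
        if (r == 0) return "0";
        std::string tmp = fresh();
        out_ << "  " << tmp << " = load i64, ptr %x" << r << "\n";
        return tmp;
    }

    // Writes to x0 are silently dropped.
    void write_reg(unsigned r, const std::string& value) {
        if (r == 0) return;
        out_ << "  store i64 " << value << ", ptr %x" << r << "\n";
    }

    // ADDI rd, rs1, imm  ->  rd = rs1 + imm
    void addi(unsigned rd, unsigned rs1, int64_t imm) {
        std::string lhs = read_reg(rs1);
        std::string sum = fresh();
        out_ << "  " << sum << " = add i64 " << lhs << ", " << imm << "\n";
        write_reg(rd, sum);
    }

    // ADD rd, rs1, rs2  ->  rd = rs1 + rs2
    void add(unsigned rd, unsigned rs1, unsigned rs2) {
        std::string lhs = read_reg(rs1);
        std::string rhs = read_reg(rs2);
        std::string sum = fresh();
        out_ << "  " << sum << " = add i64 " << lhs << ", " << rhs << "\n";
        write_reg(rd, sum);
    }

    std::string str() const { return out_.str(); }

private:
    std::string fresh() { return "%t" + std::to_string(next_++); }
    std::ostringstream out_;
    unsigned next_ = 0;
};
```

Calling `addi(1, 0, 5)` emits an `add i64 0, 5` followed by a store into `%x1`, while `addi(0, 1, 1)` emits the load and add but no store. Keeping registers in alloca slots avoids constructing SSA form by hand; LLVM's mem2reg pass can promote the slots to SSA values afterwards.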

LLVM JIT Execution

src/main.cpp demonstrates the JIT execution flow. It initializes LLVM's native target, creates an LLVMContext and Module, and then uses the Lifter to translate sample instructions. The generated Module is then passed to an llvm::ExecutionEngine (specifically MCJIT), which compiles the IR to native code. A function pointer to the JITted code is obtained and executed. This allows the translated RISC-V code to run directly on the host CPU.

Building the Project

To build the project, ensure you have CMake and LLVM (version 20) installed. Then, follow these steps:

mkdir build
cd build
cmake ..
cmake --build .

Running the Example

After building, you can run the example in main.cpp from the repository root:

./build/bt

This will print the generated LLVM IR and the result of the JIT execution.

Testing

To run the unit tests:

cd build
ctest
