This project demonstrates how a dynamic binary translator works: it lifts RISC-V RV64I user-space programs to LLVM IR at runtime, then compiles and executes that code natively on the host system via LLVM's ORC JIT.
Execution starts with a statically linked RV64I ELF binary, which is loaded by the ElfLoader. The ElfLoader parses the ELF file and loads the executable segments into the allocated guest memory space. The guest memory space is a 4GiB virtual memory region allocated using mmap, serving as the heap for the guest RISC-V program. Memory protections (mprotect) are applied dynamically based on ELF segment flags.
With the segments in memory, an instruction decoding pass takes each RISC-V instruction word and builds it into a structured C++ type (e.g., RType, IType, UType, etc.), extracting the opcode, register, and immediate fields. It uses std::variant to represent the different instruction formats.
The core translation engine is implemented in the Lifter class. It takes decoded RISC-V instructions and builds equivalent LLVM IR at basic-block granularity. The translated module behaves roughly like an emulator, with the guest RISC-V registers modelled as alloca instructions in LLVM IR.
Once all instructions are lifted, the in-memory representation of the translated guest code is built into an LLVM module, which is then passed to LLVM's ExecutionEngine. The ExecutionEngine compiles the module into native machine code for the host architecture at runtime, and the resulting code is then executed.
The decoder is implemented in src/Decoder.hpp and src/Decoder.cpp. It uses bitwise operations to extract fields from 32-bit instruction words based on the RV64I specification. Instruction types are represented by distinct structs (e.g., RType, IType) within src/Instruction.hpp, and std::variant is used in the decode method to return the appropriate instruction type.
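
The decoding approach described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the struct layouts, the Unknown alternative, and the helper names bits, signExtend, and decode are assumptions made for this example, and only three opcodes are handled.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Hypothetical layouts; the real structs in src/Instruction.hpp may differ.
struct RType { uint8_t opcode, rd, funct3, rs1, rs2, funct7; };
struct IType { uint8_t opcode, rd, funct3, rs1; int32_t imm; };
struct UType { uint8_t opcode, rd; int32_t imm; };
struct Unknown { uint32_t raw; };  // anything this sketch does not decode

using Instruction = std::variant<RType, IType, UType, Unknown>;

// Extract bits [hi:lo] of a 32-bit instruction word.
inline uint32_t bits(uint32_t word, unsigned hi, unsigned lo) {
    assert(hi >= lo && hi - lo < 31);
    return (word >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

// Sign-extend the low `width` bits of `value` to a 32-bit signed integer.
inline int32_t signExtend(uint32_t value, unsigned width) {
    const uint32_t sign = 1u << (width - 1);
    return static_cast<int32_t>((value ^ sign) - sign);
}

Instruction decode(uint32_t word) {
    const auto opcode = static_cast<uint8_t>(bits(word, 6, 0));
    const auto rd     = static_cast<uint8_t>(bits(word, 11, 7));
    const auto funct3 = static_cast<uint8_t>(bits(word, 14, 12));
    const auto rs1    = static_cast<uint8_t>(bits(word, 19, 15));

    switch (opcode) {
    case 0x33: {  // OP: register-register arithmetic such as ADD
        const auto rs2    = static_cast<uint8_t>(bits(word, 24, 20));
        const auto funct7 = static_cast<uint8_t>(bits(word, 31, 25));
        return RType{opcode, rd, funct3, rs1, rs2, funct7};
    }
    case 0x13:    // OP-IMM: register-immediate arithmetic such as ADDI
        return IType{opcode, rd, funct3, rs1, signExtend(bits(word, 31, 20), 12)};
    case 0x37:    // LUI: the immediate occupies bits 31..12 of the word
        return UType{opcode, rd, static_cast<int32_t>(word & 0xFFFFF000u)};
    default:
        return Unknown{word};
    }
}
```

A caller would typically dispatch on the result with std::visit or std::holds_alternative, for example to route an IType value to the ADDI lifting path.
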
The Memory class (src/Memory.hpp, src/Memory.cpp) uses mmap to reserve a large virtual memory block. This block serves as the guest's address space. mprotect is used to set read, write, and execute permissions on memory regions as required by the loaded ELF segments.
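
The reserve-then-protect pattern could look roughly like the sketch below. It assumes a Linux host (MAP_ANONYMOUS and MAP_NORESERVE), and the class name GuestMemory and its methods are hypothetical names chosen for this example, not the interface of the project's Memory class.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical wrapper: reserves address space up front with no access
// rights, then opens up individual ranges with mprotect.
class GuestMemory {
public:
    explicit GuestMemory(std::size_t size) : size_(size) {
        // PROT_NONE + MAP_NORESERVE reserves the range without committing memory.
        void* p = mmap(nullptr, size_, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        base_ = static_cast<uint8_t*>(p);
    }
    ~GuestMemory() { munmap(base_, size_); }
    GuestMemory(const GuestMemory&) = delete;
    GuestMemory& operator=(const GuestMemory&) = delete;

    // Apply `prot` to [guestAddr, guestAddr + len), widened to page boundaries
    // because mprotect requires a page-aligned start address.
    void protect(uint64_t guestAddr, std::size_t len, int prot) {
        assert(guestAddr + len <= size_);
        const auto page = static_cast<uint64_t>(sysconf(_SC_PAGESIZE));
        const uint64_t start = guestAddr & ~(page - 1);
        const uint64_t end = (guestAddr + len + page - 1) & ~(page - 1);
        if (mprotect(base_ + start, end - start, prot) != 0)
            throw std::runtime_error("mprotect failed");
    }

    // Translate a guest address into a host pointer inside the reservation.
    uint8_t* hostPtr(uint64_t guestAddr) {
        assert(guestAddr < size_);
        return base_ + guestAddr;
    }

private:
    uint8_t* base_ = nullptr;
    std::size_t size_;
};
```

With this sketch, a 4GiB guest space would be created as `GuestMemory mem(4ull << 30);` on a 64-bit host, and a segment would be made writable with `mem.protect(addr, len, PROT_READ | PROT_WRITE)` before its bytes are copied in.
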
The ElfLoader class (src/ElfLoader.hpp, src/ElfLoader.cpp) is a minimal implementation for parsing ELF64 files. It reads the ELF header (Elf64_Ehdr) and program headers (Elf64_Phdr), validates the ELF magic number, class (64-bit), data encoding (little-endian), type (executable), and machine (RISC-V). Loadable segments are then copied from the ELF file into the guest memory, and their permissions are set.
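
The header checks and the flag-to-permission mapping can be illustrated with the sketch below. It assumes a Linux host that provides <elf.h>; the function names isSupportedElf, protFromSegmentFlags, and zeroFillBytes are hypothetical and are not taken from the project's ElfLoader.

```cpp
#include <elf.h>
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#ifndef EM_RISCV
#define EM_RISCV 243  // fallback for older <elf.h> headers
#endif

// Check the same properties the text lists: magic, class, encoding, type, machine.
bool isSupportedElf(const Elf64_Ehdr& h) {
    return std::memcmp(h.e_ident, ELFMAG, SELFMAG) == 0
        && h.e_ident[EI_CLASS] == ELFCLASS64
        && h.e_ident[EI_DATA] == ELFDATA2LSB
        && h.e_type == ET_EXEC
        && h.e_machine == EM_RISCV;
}

// Translate ELF segment flags (PF_*) into mprotect flags (PROT_*).
int protFromSegmentFlags(uint32_t flags) {
    int prot = PROT_NONE;
    if (flags & PF_R) prot |= PROT_READ;
    if (flags & PF_W) prot |= PROT_WRITE;
    if (flags & PF_X) prot |= PROT_EXEC;
    return prot;
}

// A PT_LOAD segment may be larger in memory than in the file (e.g. .bss);
// the difference is the number of trailing bytes that must be zero-filled.
std::size_t zeroFillBytes(const Elf64_Phdr& ph) {
    assert(ph.p_type == PT_LOAD);
    assert(ph.p_filesz <= ph.p_memsz);
    return static_cast<std::size_t>(ph.p_memsz - ph.p_filesz);
}
```

A loader built on these helpers would reject the file when isSupportedElf returns false, then, for each PT_LOAD program header, copy p_filesz bytes to the guest address p_vaddr, zero the remainder, and apply the mapped protection.
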
The Lifter class (src/Lifter.hpp, src/Lifter.cpp) is responsible for the core translation. It takes an LLVMContext, Module, and IRBuilder as input. RISC-V general-purpose registers are mapped to alloca instructions in the LLVM IR, allowing them to be manipulated using load and store instructions. The special behavior of x0 (always zero, writes ignored) is handled during the load and store IR generation. Currently, it supports basic arithmetic (ADD, ADDI) and LUI instructions.
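
As a point of reference, the semantics that the lifted IR has to preserve for these three instructions can be written as a plain C++ model. This is not the Lifter's code and emits no LLVM IR; RegisterFile and the free functions below are hypothetical names. Reads and writes go through accessors that special-case x0, mirroring where the text says the Lifter handles it.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical reference model of the RV64I integer register file.
struct RegisterFile {
    std::array<uint64_t, 32> x{};

    // x0 always reads as zero.
    uint64_t read(unsigned r) const {
        assert(r < 32);
        return r == 0 ? 0 : x[r];
    }
    // Writes to x0 are silently discarded.
    void write(unsigned r, uint64_t value) {
        assert(r < 32);
        if (r != 0) x[r] = value;
    }
};

// ADD rd, rs1, rs2: 64-bit wrapping addition.
void add(RegisterFile& rf, unsigned rd, unsigned rs1, unsigned rs2) {
    rf.write(rd, rf.read(rs1) + rf.read(rs2));
}

// ADDI rd, rs1, imm: the 12-bit immediate is sign-extended to 64 bits first.
void addi(RegisterFile& rf, unsigned rd, unsigned rs1, int32_t imm12) {
    assert(imm12 >= -2048 && imm12 <= 2047);
    rf.write(rd, rf.read(rs1) + static_cast<uint64_t>(static_cast<int64_t>(imm12)));
}

// LUI rd, imm20: imm20 fills bits 31..12, the low 12 bits are zero, and on
// RV64 the 32-bit result is sign-extended to 64 bits.
void lui(RegisterFile& rf, unsigned rd, uint32_t imm20) {
    assert(imm20 < (1u << 20));
    const auto value32 = static_cast<int32_t>(imm20 << 12);
    rf.write(rd, static_cast<uint64_t>(static_cast<int64_t>(value32)));
}
```

A model like this can serve as an oracle in unit tests: run the same instruction sequence through the model and through the JIT-compiled code, then compare the resulting register values.
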
src/main.cpp demonstrates the JIT execution flow. It initializes LLVM's native target, creates an LLVMContext and Module, and then uses the Lifter to translate sample instructions. The generated Module is then passed to an llvm::ExecutionEngine (specifically MCJIT), which compiles the IR to native code. A function pointer to the JITted code is obtained and executed. This allows the translated RISC-V code to run directly on the host CPU.
To build the project, ensure you have CMake and LLVM (version 20) installed. Then, follow these steps:
mkdir build
cd build
cmake ..
cmake --build .
After building, you can run the example in main.cpp:
./build/bt
This will print the generated LLVM IR and the result of the JIT execution.
To run the unit tests:
cd build
ctest