Recently, Italiano and Cummins introduced an elegant methodology for uncovering performance bugs in compilers. Their approach uses a pre-trained large language model (LLM) to generate a seed program, followed by successive mutations designed to provoke unexpected behavior, even in mainstream compilers. This methodology is particularly appealing due to its language-agnostic nature: it can be adapted to different programming languages without the need to develop a dedicated fuzzer for each one. Moreover, it has proven highly effective, uncovering previously unknown (zero-day) performance bugs in widely used compilers such as Clang, ICC, and GCC. In an effort to reproduce the results reported by Italiano and Cummins, we confirm that their technique outperforms general-purpose LLMs, such as open-source versions of LLaMA and DeepSeek, in identifying compiler performance bugs. However, we also observe that while the LLM-based approach is commendable, it lags behind tools like Csmith in terms of throughput (the number of bugs found over time) and latency (the time to discover the first bug). LLMs also require significantly greater computational resources. Although this outcome may seem discouraging, it is important to note that we are comparing novel LLMs with a mature language-specific fuzzer. Nevertheless, as technology evolves, we expect the performance of LLM-based fuzzing to improve, potentially surpassing traditional methods in the future.
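At a high level, the technique can be pictured as a generate-mutate-measure loop. The sketch below is our illustrative reconstruction, not the authors' implementation: the LLM calls and the compile-and-time harness are replaced by stubs with made-up names and synthetic timings, and a program is flagged when a higher optimization level runs noticeably slower than a lower one.

```python
import random

random.seed(0)

# Stubs standing in for the real components (all hypothetical, for
# illustration): an LLM that emits a seed C program, an LLM mutation step,
# and a harness that would compile at a given -O level and time the binary.
def llm_seed():
    return "int main(void) { return 0; }"

def llm_mutate(program):
    return program + " /* mutated */"

def compile_and_time(program, opt_level):
    # A real harness would invoke e.g. `gcc-14 -O<opt_level>` and time the
    # resulting binary; here we fake runtimes so the sketch is runnable.
    return random.uniform(0.5, 1.5) / (1 + opt_level)

def is_perf_anomaly(t_low_opt, t_high_opt, threshold=1.5):
    # A higher optimization level should not be much slower than a lower one.
    return t_high_opt > threshold * t_low_opt

program = llm_seed()
findings = []
for _ in range(100):
    program = llm_mutate(program)
    if is_perf_anomaly(compile_and_time(program, 2), compile_and_time(program, 3)):
        findings.append(program)

print(f"{len(findings)} suspicious programs flagged out of 100")
```

The anomaly predicate is the only part that carries over unchanged to a real setup: replace the stubs with actual LLM calls and wall-clock measurements and the loop structure stays the same.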
- Python 3.10 or newer
- uv (Python package and environment manager): see the installation guide below
- gcc-14 and g++-14
- RAM: At least 16GB recommended
- GPU: Recommended for LLM generation (used via Google Colab Pro)
- OS: Linux (tested on Ubuntu 24.04)
- Fuzzers: Csmith and Yarpgen
This repository contains an artifact of the entire generation and testing process for the article "On the Practicality of LLM-Based Compiler Fuzzing". A technical report describing our methodology and results is available here.
This is a technical report under review. The final version will be linked here when available.
```
├── assets/        # Images, banners, PDF
├── notebooks/     # LLM code generation notebook
├── src/scripts/   # Scripts used in the experiments
├── LICENSE        # Open source license (MIT)
└── README.md      # This file
```
We use UV as our Python package and project manager. This ensures fast and reproducible environments with minimal configuration.
Follow the official installation instructions: 👉 UV Installation Guide
Install the traditional C code generators used for comparison:
Follow the respective installation instructions in each repository to build and set up the tools.
You may need GCC 14 to compile the generated programs. We recommend installing it via your system's package manager or from source.
On Ubuntu 24.04, you can use:
```bash
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt update
sudo apt install gcc-14 g++-14
```
If your distribution does not provide gcc-14 in its apt repositories, you can build GCC 14 from source following the steps below.
```bash
sudo apt install libmpfr-dev libgmp3-dev libmpc-dev -y
wget http://ftp.gnu.org/gnu/gcc/gcc-14.1.0/gcc-14.1.0.tar.gz
tar -xf gcc-14.1.0.tar.gz && cd gcc-14.1.0
./configure -v --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --prefix=/usr/local/gcc-14.1.0 --enable-checking=release --enable-languages=c,c++ --disable-multilib --program-suffix=-14
make -j$(nproc)
sudo make install
export PATH=$PATH:/usr/local/gcc-14.1.0/bin
```
Make sure `gcc-14` and `g++-14` are available in your `PATH`.
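To sanity-check that both compilers are reachable, a small Python helper can query the `PATH`. This snippet is ours, not part of the artifact's scripts:

```python
import shutil

def missing_tools(names):
    """Return the subset of `names` not found on PATH."""
    return [n for n in names if shutil.which(n) is None]

missing = missing_tools(["gcc-14", "g++-14"])
if missing:
    print("Not on PATH:", ", ".join(missing))
else:
    print("gcc-14 and g++-14 found")
```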
Create and activate a virtual environment with UV:
```bash
uv venv
uv pip install -e .
```
This will install the package in editable mode along with its dependencies.
For the generation process using Large Language Models (LLMs), we used Google Colab Pro, which provided the GPU access essential for the computations. To generate code samples with LLMs, open the following notebook and run it step by step (you may need to upload it to Google Colab):

`notebooks/Gagana.ipynb`
Or access the notebook on Google Colab using the following link:
Make sure the notebook dependencies are installed, including `ipykernel` if needed:
```bash
uv pip install notebook ipykernel
```
After installing Csmith or Yarpgen, run:
```bash
uv run gagana-traditional
```
This will execute the traditional code generation pipeline. You can pass arguments to configure the generation process as described in the script’s help section:
```bash
uv run gagana-traditional --help
```
- **Traditional Fuzzer Outputs:** a `.csv` file with performance metrics and evaluation data is generated automatically after code execution.
- **LLM Outputs:** to evaluate LLM-generated code, run:

  ```bash
  uv run gagana-llm
  ```
As with the traditional script, you can pass arguments as needed:
```bash
uv run gagana-llm --help
```
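The two comparison metrics from the report, throughput (bugs found per unit time) and latency (time to the first bug), can be computed from such a results file. The sketch below uses hypothetical column names (`bug_id`, `seconds_to_find`) and an assumed campaign length; the actual `.csv` schema produced by the scripts may differ:

```python
import csv
import io

# Hypothetical example data: one row per discovered bug, with the time
# (in seconds since the fuzzer started) at which it was found.
sample = """bug_id,seconds_to_find
1,12.5
2,40.0
3,95.0
"""

rows = list(csv.DictReader(io.StringIO(sample)))
times = sorted(float(r["seconds_to_find"]) for r in rows)

campaign_seconds = 120.0  # total fuzzing time (assumed for this example)
throughput = len(times) / (campaign_seconds / 3600)  # bugs per hour
latency = times[0]                                   # time to first bug

print(f"throughput: {throughput:.1f} bugs/hour, latency: {latency:.1f}s")
# → throughput: 90.0 bugs/hour, latency: 12.5s
```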
This project is licensed under the MIT License.
Copyright (c) 2025
Departamento de Ciência da Computação — Universidade Federal de Minas Gerais