KNighter is a checker synthesis tool that uses Large Language Models (LLMs) to automatically generate static analysis checkers from historical patch commits.
- 🤖 LLM-Powered Generation: Automatically synthesizes static analysis checkers using state-of-the-art language models
- 📊 Multi-step Pipeline: Employs a sophisticated generation → refinement → triage workflow for high-quality results
- 🔍 Historical Learning: Learns from real-world patch commits to understand common bug patterns
- ⚡ LLVM Integration: Built on top of LLVM for robust static analysis capabilities
- 🐧 Linux Kernel Focus: Specialized for finding bugs in large-scale C/C++ codebases like the Linux kernel
The detected bugs 🐛 can be found here.
Important
We are continuously improving the documentation and adding new features. Please stay tuned for updates.
🐳 Docker Installation Options
# Option 1: Pull the prebuilt image from Docker Hub
docker pull knighterhub/knighter
# Option 2: Build the image from source
git clone https://github.com/ise-uiuc/KNighter.git KNighter
cd KNighter
docker build -t knighter .
🚀 Running the Container
# If you pulled the image from Docker Hub
docker run -it knighterhub/knighter
# If you built the image from source
docker run -it knighter
⚙️ Environment Initialization
When running the container for the first time, initialize the environment:
cd /app
# This takes a while: it downloads the dependencies and compiles LLVM
python3 scripts/init_docker.py
This downloads the LLVM and Linux kernel source code into /data/llvm and /data/linux.
API Key Configuration:
echo 'openai_key: "YOUR_OPENAI_API_KEY"' > /app/llm_keys.yaml
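To keep the key out of your shell history, the file can be written from an environment variable instead. This is a minimal sketch, not part of KNighter: the helper name write_key_file is hypothetical, and OPENAI_API_KEY is an assumed variable name (KNighter itself reads the key from the YAML file).

```python
import os
from pathlib import Path


def write_key_file(path: str, env_var: str = "OPENAI_API_KEY") -> Path:
    """Write a one-line llm_keys.yaml from an environment variable.

    Hypothetical helper: `env_var` is an assumed name, not something
    KNighter reads on its own.
    """
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    target = Path(path)
    # Same single line as the echo command above.
    target.write_text(f'openai_key: "{key}"\n', encoding="utf-8")
    # Restrict the file to the current user, since it holds a secret.
    target.chmod(0o600)
    return target
```

Inside the container you would call `write_key_file("/app/llm_keys.yaml")` after exporting the variable.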
Note: For detailed setup steps, refer to scripts/init_docker.py, which contains the complete initialization process.
🔧 Manual Installation Steps
Step 1: Install Dependencies
Download and build LLVM-18.1.8:
wget https://github.com/llvm/llvm-project/archive/refs/tags/llvmorg-18.1.8.zip
unzip llvmorg-18.1.8.zip
Clone the Linux kernel source code:
git clone https://github.com/torvalds/linux.git
Install Python dependencies:
# Option 1: Using uv (recommended for faster installs)
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.cargo/env
uv pip install -r requirements.txt
# Option 2: Using regular pip
pip3 install -r requirements.txt
git submodule update --init --recursive
Step 2: Configuration Files
Set up your config.yaml (see scripts/init_docker.py for reference):
result_dir: "result-checkers"
LLVM_dir: "/PATH/TO/LLVM_DIR"
checker_nums: 10
linux_dir: "/PATH/TO/LINUX_DIR"
key_file: "llm_keys.yaml"
model: "o3-mini"
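If you would rather generate config.yaml than edit it by hand, the following is a minimal sketch and not part of KNighter. The helper names render_config and missing_dirs are hypothetical; the keys and default values are copied from the template above, and the text is produced with plain string formatting so no YAML library is needed.

```python
from __future__ import annotations

from pathlib import Path


def render_config(
    llvm_dir: str,
    linux_dir: str,
    *,
    result_dir: str = "result-checkers",
    checker_nums: int = 10,
    key_file: str = "llm_keys.yaml",
    model: str = "o3-mini",
) -> str:
    """Return config.yaml text with the six keys from the template above."""
    lines = [
        f'result_dir: "{result_dir}"',
        f'LLVM_dir: "{llvm_dir}"',
        f"checker_nums: {checker_nums}",
        f'linux_dir: "{linux_dir}"',
        f'key_file: "{key_file}"',
        f'model: "{model}"',
    ]
    return "\n".join(lines) + "\n"


def missing_dirs(*dirs: str) -> list[str]:
    """Return the given paths that are not existing directories."""
    return [d for d in dirs if not Path(d).is_dir()]


if __name__ == "__main__":
    # Placeholder paths from the template; replace them with real ones.
    llvm_dir, linux_dir = "/PATH/TO/LLVM_DIR", "/PATH/TO/LINUX_DIR"
    for d in missing_dirs(llvm_dir, linux_dir):
        print(f"warning: {d} is not a directory")
    print(render_config(llvm_dir, linux_dir), end="")
```

Redirect the printed text into config.yaml once the warnings are gone.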
Set up the llm_keys.yaml file (see llm_keys_example.yaml for reference):
openai_key: "sk-..."
claude_key: "sk-ant-..."
google_key: "AIza..."
deepseek_key: "sk-..."
# For local models (optional)
# In config, use "local:model_name" format to use local models
# Like "local:openai/gpt-oss-120b"
base_url: "http://localhost:8000/v1"
api_key: "dummy"
Step 3: LLVM Setup
python3 scripts/setup_llvm.py LLVM_PATH
For rapid evaluation, use the debug dataset:
cd /app/src
# Step 1: Generate checkers for debug commits
python3 main.py gen --config_file /app/config-generate.yaml --commit_file=/app/commits/commits-debug.txt
# Step 2: Refine generated checkers
python3 main.py refine --config_file /app/config-refine-debug.yaml /app/result-generate
# Step 3: Triage and analyze results
python3 main.py triage --config_file /app/config-triage-debug.yaml /app/result-refine-debug
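The three debug steps above can also be chained from one script. This is a minimal sketch, not a script shipped with KNighter: debug_pipeline_commands and run_pipeline are hypothetical helper names, the arguments are copied from the commands above, and dry_run defaults to True so nothing is executed until you opt in.

```python
from __future__ import annotations

import shlex
import subprocess


def debug_pipeline_commands(app_dir: str = "/app") -> list[list[str]]:
    """Return the gen, refine and triage commands for the debug dataset."""
    return [
        ["python3", "main.py", "gen",
         "--config_file", f"{app_dir}/config-generate.yaml",
         f"--commit_file={app_dir}/commits/commits-debug.txt"],
        ["python3", "main.py", "refine",
         "--config_file", f"{app_dir}/config-refine-debug.yaml",
         f"{app_dir}/result-generate"],
        ["python3", "main.py", "triage",
         "--config_file", f"{app_dir}/config-triage-debug.yaml",
         f"{app_dir}/result-refine-debug"],
    ]


def run_pipeline(commands: list[list[str]], cwd: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, run it in `cwd`."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    # Dry run only prints the commands; pass dry_run=False to execute them.
    run_pipeline(debug_pipeline_commands(), cwd="/app/src", dry_run=True)
```

Because each step consumes the previous step's output directory, stopping on the first failure avoids refining or triaging stale results.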
📋 Pipeline Modes & Usage
Available Operation Modes:
Mode | Purpose | Description |
---|---|---|
gen | Generation | Generate new checkers from commit patches |
refine | Refinement | Improve and validate generated checkers |
scan | Scanning | Scan the kernel with validated checkers |
triage | Analysis | Analyze and categorize scan results |
Basic Usage (Manual Setup):
cd src
python3 main.py <mode> --commit_file=<commits.txt> --config_file=<config.yaml>
Example:
python3 main.py gen --commit_file=../commits/commits-selected.txt --config_file=config.yaml
⚙️ Configuration Files
File | Purpose | Key Parameters |
---|---|---|
config-generate.yaml | Checker generation | model, checker_nums, result_dir |
config-refine.yaml | Refinement process | jobs, scan_timeout, scan_commit |
config-triage.yaml | Result analysis | Analysis parameters |
Modify these files to experiment with different parameters from the paper evaluation.
🏗️ System Architecture Overview
KNighter implements a multi-stage pipeline for automated checker synthesis:
- Commit Analysis: Extract bug patterns from historical patches
- Checker Generation: Use LLMs to synthesize static analysis checkers
- Refinement: Validate and improve generated checkers through compilation and testing
- Deployment: Apply refined checkers to target codebases
- Triage: Analyze and categorize detected issues
For comprehensive architecture documentation, see ARCHITECTURE.md.
Citation: If you use KNighter in your research, please cite our paper:
@inproceedings{knighter,
title = {KNighter: Transforming Static Analysis with LLM-Synthesized Checkers},
author = {Yang, Chenyuan and Zhao, Zijie and Xie, Zichen and Li, Haoyu and Zhang, Lingming},
year = {2025},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3731569.3764827},
doi = {10.1145/3731569.3764827},
booktitle = {Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles},
location = {Seoul, Republic of Korea},
series = {SOSP '25}
}