RepoAudit is a repository-level bug detector for general bugs. It currently supports detecting diverse bug types (such as Null Pointer Dereference, Memory Leak, and Use After Free) in multiple programming languages (including C/C++, Java, Python, and Go). It leverages LLMSCAN to parse the codebase and uses an LLM to mimic the process of manual code auditing. Compared with existing code auditing tools, RepoAudit offers the following advantages:
- 🛡️ Compilation-Free Analysis
- 🌍 Multi-Lingual Support
- 🐞 Multiple Bug Type Detection
- ⚙️ Customization Support
[May 2025] 🎉 Our paper "RepoAudit: Automated Code Auditing with Multi-Agent LLM Framework" has been accepted at ICML 2025! 🏆
[March 2025] RepoAudit has helped identify over 100 bugs in open-source projects this quarter!
RepoAudit is a multi-agent framework for code auditing. We offer two agent instances in our current version:
- MetaScanAgent in `metascan.py`: Scans the project using tree-sitter-powered, parsing-based analyzers and obtains the basic syntactic properties of the program.
- DFBScanAgent in `dfbscan.py`: Performs inter-procedural data-flow analysis as described in this preprint. It detects data-flow bugs, including source-must-not-reach-sink bugs (e.g., Null Pointer Dereference) and source-must-reach-sink bugs (e.g., Memory Leak).
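The two bug classes can be illustrated on a toy data-flow graph. The sketch below is not RepoAudit's implementation: the function names, the graph encoding, and the statement labels are all hypothetical, and the "must reach" check is simplified to "some flow to a sink exists" rather than a path-sensitive "on every path" check.

```python
from collections import deque


def reachable(graph, source):
    """Return the set of nodes reachable from `source` via breadth-first search."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def violates_must_not_reach(graph, source, sinks):
    """Bug if the source value can flow to any sink (e.g. NULL reaching a dereference)."""
    return bool(reachable(graph, source) & set(sinks))


def violates_must_reach(graph, source, sinks):
    """Bug if the source value flows to no sink (e.g. an allocation that is never freed).

    Simplification: this checks only that some flow exists, not that every path frees.
    """
    return not (reachable(graph, source) & set(sinks))


# Hypothetical graphs: edges mean "the value flows from this statement to that one".
npd_graph = {"p = NULL": ["q = p"], "q = p": ["*q"]}
leak_graph = {"buf = malloc(n)": ["use(buf)"], "use(buf)": []}

print(violates_must_not_reach(npd_graph, "p = NULL", ["*q"]))          # NULL reaches a dereference
print(violates_must_reach(leak_graph, "buf = malloc(n)", ["free(buf)"]))  # allocation never freed
```

Both calls report a violation: in the first graph the null value reaches a dereference, and in the second the allocated buffer never reaches a `free`.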
We continue to implement more agents and will open-source them soon. Using DFBScanAgent and other agents, we have discovered hundreds of bugs in open-source projects that have since been confirmed and fixed. You can refer to this bug list.
- Create and activate a conda environment with Python 3.9.18:

      conda create -n repoaudit python=3.9.18
      conda activate repoaudit
- Install the required dependencies:

      cd RepoAudit
      pip install -r requirements.txt
- Ensure you have the Tree-sitter library and language bindings installed:

      cd lib
      python build.py
- Configure the OpenAI API key. Append the export line to `~/.bashrc` and reload it so the key persists across shells:

      echo 'export OPENAI_API_KEY=xxxxxx' >> ~/.bashrc
      source ~/.bashrc

For Claude 3.5, we use the model hosted on Amazon Bedrock. If you want to use Claude 3.5 or Claude 3.7, you may need to set up the environment first.
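Before starting a scan, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch, not part of RepoAudit; the helper name `missing_env_vars` is hypothetical, and only the variable name `OPENAI_API_KEY` comes from the step above.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty in `environ`."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(["OPENAI_API_KEY"])
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

If the variable is reported missing in a new terminal, the export line most likely never reached `~/.bashrc` or the file has not been reloaded.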
- We have prepared several benchmark programs in the `benchmark` directory for a quick start. Some of these are submodules, so you may need to initialize them using the following commands:

      cd RepoAudit
      git submodule update --init --recursive
- We provide the script `src/run_repoaudit.sh` to scan files in the `benchmark/Java/toy/NPD` directory. You can run the following commands:

      cd src
      sh run_repoaudit.sh # Run the agent DFBScanAgent
- After the scan completes, you can check the resulting JSON and log files.
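To get a quick overview of the JSON output, a short script can load every report in a directory. This is a sketch under assumptions: the location and schema of the result files are not specified here, so the directory is passed in by the caller, and the helper names `load_json_reports` and `summarize` are hypothetical.

```python
import json
from pathlib import Path


def load_json_reports(result_dir):
    """Load every *.json file under `result_dir` (recursively), keyed by file path."""
    reports = {}
    for path in sorted(Path(result_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            reports[str(path)] = json.load(handle)
    return reports


def summarize(reports):
    """Return (path, top-level type name, number of top-level entries) per report."""
    rows = []
    for path, data in reports.items():
        size = len(data) if isinstance(data, (dict, list)) else 1
        rows.append((path, type(data).__name__, size))
    return rows
```

Calling `summarize(load_json_reports(result_dir))` with the directory that holds your results lists each file with the type and size of its top-level value, which is usually enough to decide which report to open first.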
For a large repository, sequential analysis can be quite time-consuming. To speed it up, you can enable parallel auditing by setting the `--max-neural-workers` option to a larger value. By default, this option is set to 6.
The parsing-based analysis also runs in parallel by default, with a maximum of 10 workers.
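The effect of a worker limit can be sketched with a bounded pool from the Python standard library. This is an illustration only: whether RepoAudit uses threads or processes internally is an assumption, and `audit_in_parallel` and `fake_audit` are hypothetical names. The default of 6 mirrors the documented default of `--max-neural-workers`.

```python
from concurrent.futures import ThreadPoolExecutor


def audit_in_parallel(tasks, audit_one, max_workers=6):
    """Apply `audit_one` to each task with at most `max_workers` running at once.

    Results are returned in the same order as `tasks`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(audit_one, tasks))


def fake_audit(function_name):
    """Stand-in for one auditing step, such as a single LLM query."""
    return f"audited {function_name}"


if __name__ == "__main__":
    for line in audit_in_parallel(["parse", "load", "close"], fake_audit):
        print(line)
```

Raising the limit helps most when each task spends its time waiting on a remote model, but API rate limits may cap the useful number of workers.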
The implementation of dfbscan is currently open source. We plan to release more technical reports and research papers and to open-source other RepoAudit agents soon. For more information, please refer to our website: RepoAudit: Auditing Code As Human.
For more details about the usage, architecture, and extension of RepoAudit, please refer to the following documents:
- User Guide: Detailed instructions on installing, configuring, and using RepoAudit, including CLI and web UI usage.
- Tool Architecture: In-depth explanation of RepoAudit's multi-agent framework, including parsing-based analyzers and tools, LLM-driven tools, and the memory designs of the agents.
- Extension: Guidelines for customizing RepoAudit for new bug types and supporting more programming languages.
This project is licensed under the GNU General Public License v2.0 (GPLv2). You are free to use, modify, and distribute the software under the terms of this license, provided that derivative works are also distributed under the same license.
For full details, see the LICENSE file or visit the official license page: https://www.gnu.org/licenses/old-licenses/gpl-2.0.html