PurCL/RepoAudit

RepoAudit

RepoAudit is a repository-level bug detector for general bugs. It currently detects diverse bug types (such as Null Pointer Dereference, Memory Leak, and Use After Free) in multiple programming languages (including C/C++, Java, Python, and Go). It uses LLMSCAN to parse the codebase and LLMs to mimic the process of manual code auditing. Compared with existing code auditing tools, RepoAudit offers the following advantages:

  • 🛡️ Compilation-Free Analysis
  • 🌍 Multi-Lingual Support
  • 🐞 Multiple Bug Type Detection
  • ⚙️ Customization Support

News 📰

[May 2025] 🎉 Our paper "RepoAudit: Automated Code Auditing with Multi-Agent LLM Framework" has been accepted at ICML 2025! 🏆

[March 2025] RepoAudit has helped identify over 100 bugs in open-source projects this quarter!

Agents in RepoAudit

RepoAudit is a multi-agent framework for code auditing. We offer two agent instances in our current version:

  • MetaScanAgent in metascan.py: Scans the project using tree-sitter–powered parsing-based analyzers and obtains the basic syntactic properties of the program.

  • DFBScanAgent in dfbscan.py: Performs inter-procedural data-flow analysis as described in this preprint. It detects data-flow bugs, including source-must-not-reach-sink bugs (e.g., Null Pointer Dereference) and source-must-reach-sink bugs (e.g., Memory Leak).
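
The two bug categories above can be illustrated with a toy reachability check. This is a minimal sketch under stated assumptions, not RepoAudit's implementation: the graph encoding, the node names, and both helper functions are hypothetical, and "must reach" is simplified to "reaches at least one sink", whereas a real analysis would reason about individual paths.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical data-flow graph: an edge a -> b means "the value at a may flow to b".
# This encoding is an illustration only, not RepoAudit's internal representation.
Graph = Dict[str, List[str]]


def reachable(graph: Graph, start: str) -> Set[str]:
    """Return every node reachable from `start` using breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def violates_must_not_reach(graph: Graph, source: str, sinks: Set[str]) -> bool:
    """Bug if the source reaches ANY sink (e.g. a null value reaching a dereference)."""
    return bool(reachable(graph, source) & sinks)


def violates_must_reach(graph: Graph, source: str, sinks: Set[str]) -> bool:
    """Bug if the source reaches NO sink (e.g. an allocation that never reaches a free).

    Simplification: this ignores paths on which the sink is skipped.
    """
    return not (reachable(graph, source) & sinks)


# Null Pointer Dereference: the null value flows into a dereference.
npd_graph = {"null_ret": ["p"], "p": ["deref_p"]}
npd_bug = violates_must_not_reach(npd_graph, "null_ret", {"deref_p"})

# Memory Leak: the allocation is used but never flows into a free.
leak_graph = {"malloc_buf": ["buf"], "buf": ["use_buf"]}
leak_bug = violates_must_reach(leak_graph, "malloc_buf", {"free_buf"})
```

In both toy graphs the check reports a bug; adding an edge from `buf` to `free_buf` would clear the leak report.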

We continue to implement more agents and will open-source them very soon. Using DFBScanAgent and other agents, we have discovered hundreds of bugs in the open-source community that have been confirmed and fixed. You can refer to this bug list.

Installation

  1. Create and activate a conda environment with Python 3.9.18:

    conda create -n repoaudit python=3.9.18
    conda activate repoaudit
  2. Install the required dependencies:

    cd RepoAudit
    pip install -r requirements.txt
  3. Build the Tree-sitter library and language bindings:

    cd lib
    python build.py
  4. Configure the OpenAI API key.

    echo 'export OPENAI_API_KEY=xxxxxx' >> ~/.bashrc
    source ~/.bashrc

    For Claude 3.5, we use the model hosted on Amazon Bedrock. If you want to use Claude 3.5 or Claude 3.7, you may need to set up the Amazon Bedrock environment first.
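
Before running a scan, it can help to confirm that the key is visible to Python processes. The following is a minimal sketch; the helper name `api_key_is_set` is hypothetical and is not part of RepoAudit.

```python
import os
from typing import Mapping, Optional


def api_key_is_set(env: Optional[Mapping[str, str]] = None,
                   name: str = "OPENAI_API_KEY") -> bool:
    """Return True if `name` is present in the environment and not blank."""
    env = os.environ if env is None else env  # default to the real environment
    return bool(env.get(name, "").strip())


if __name__ == "__main__":
    if api_key_is_set():
        print("OPENAI_API_KEY is set")
    else:
        print("OPENAI_API_KEY is missing; open a new shell or source ~/.bashrc")
```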

Quick Start

  1. We have prepared several benchmark programs in the benchmark directory for a quick start. Some of these are submodules, so you may need to initialize them using the following commands:

    cd RepoAudit
    git submodule update --init --recursive
  2. We provide the script src/run_repoaudit.sh to scan files in the benchmark/Java/toy/NPD directory. You can run the following commands:

    cd src
    sh run_repoaudit.sh  # Run the agent DFBScanAgent
  3. After the scan completes, you can check the resulting JSON and log files.
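
To get a quick overview of the JSON output, a small script such as the one below may be useful. It is a sketch under assumptions: the result directory is whatever path your run wrote to, the function name `summarize_json_reports` is hypothetical, and no particular report schema is assumed, so it only counts top-level entries per file.

```python
import json
from pathlib import Path
from typing import Dict


def summarize_json_reports(result_dir: str) -> Dict[str, int]:
    """Map each JSON file under `result_dir` to its number of top-level entries."""
    summary: Dict[str, int] = {}
    for path in sorted(Path(result_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Lists and objects report their length; any other JSON value counts as 1.
        size = len(data) if isinstance(data, (list, dict)) else 1
        summary[str(path.relative_to(result_dir))] = size
    return summary
```

Calling `summarize_json_reports("<your result directory>")` returns a dictionary keyed by relative file path, which can be printed or compared across runs.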

Parallel Auditing Support

For a large repository, sequential analysis can be quite time-consuming. To accelerate it, you can use parallel auditing by setting the option --max-neural-workers to a larger value; its default is 6. The parsing-based analysis also runs in parallel by default, with a default maximum of 10 workers.
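
The effect of a worker cap can be sketched with Python's standard thread pool. This is an illustration only, not RepoAudit's scheduler: `audit_all`, `audit_one`, and the sleep that stands in for an LLM call are hypothetical, and only the default value of 6 comes from the text above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

MAX_NEURAL_WORKERS = 6  # mirrors the documented default of --max-neural-workers


def audit_all(functions: List[str],
              max_workers: int = MAX_NEURAL_WORKERS) -> Tuple[List[str], int]:
    """Audit every function with at most `max_workers` tasks in flight.

    Returns the results in input order and the peak observed concurrency.
    """
    active = 0
    peak = 0
    lock = threading.Lock()

    def audit_one(name: str) -> str:
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for a slow LLM request
        with lock:
            active -= 1
        return f"audited:{name}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        results = list(pool.map(audit_one, functions))
    return results, peak


results, peak = audit_all([f"func_{i}" for i in range(20)])
```

Raising the cap shortens wall-clock time only while the backing LLM service accepts that many concurrent requests, so rate limits are worth checking before increasing it.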

Website, Paper, and Docs

We currently open-source the implementation of dfbscan. We will release more technical reports/research papers and open-source other agents in RepoAudit very soon. For more information, please refer to our website: RepoAudit: Auditing Code As Human.

If you want to know more details about the tool usage, project architecture, and extensions of RepoAudit, please refer to the following documents:

  • User Guide: Detailed instructions on installation, configuration, and usage of RepoAudit, particularly including the instructions on CLI and webUI usage.

  • Tool Architecture: In-depth explanation of RepoAudit's multi-agent framework, including parsing-based analyzer/tools, LLM-driven tools, and the memory designs of the agents.

  • Extension: Guidelines for customizing RepoAudit for new bug types and supporting more programming languages.

  • DeepWiki: All-in-one doc generated by Devin.

License

This project is licensed under the GNU General Public License v2.0 (GPLv2). You are free to use, modify, and distribute the software under the terms of this license, provided that derivative works are also distributed under the same license.

For full details, see the LICENSE file or visit the official license page: https://www.gnu.org/licenses/old-licenses/gpl-2.0.html
