CTX-Coder: A Long-Context Enhanced LLM for Vulnerability Detection
Note
CTX-Coder is a modified version of Llama-3.2-11B-Vision-Instruct: we remove the vision encoder and use Llama's last hidden layers.
# Install
```shell
pip3 install -r requirements.txt
```
If you want to collect your own call-graph dataset, follow these steps:
- Download the GitHub projects into a directory `root`.
- Generate the call graph using the following commands:

  ```shell
  cd doxygen
  bash doxygen.sh
  ```

  Note: please replace the root directory in `doxygen.sh` with your own.
- Extract the functions and format them into JSON strings using the Python script:

  ```shell
  python extract_doxygen.py
  ```

  It will output JSON files.
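The three collection steps above can also be driven from one script. The sketch below is an assumption about how you might wire them together (the repository list and `collect_call_graphs` helper are hypothetical; `doxygen.sh` and `extract_doxygen.py` are assumed to behave as described above):

```python
import subprocess
from pathlib import Path

ROOT = Path("root")  # directory that will hold the downloaded projects


def clone_cmd(url: str) -> list[str]:
    # Shallow clone: full history is not needed for call-graph extraction.
    return ["git", "clone", "--depth", "1", url]


def collect_call_graphs(repo_urls: list[str]) -> None:
    # Step 1: download the GitHub projects into ROOT.
    ROOT.mkdir(exist_ok=True)
    for url in repo_urls:
        subprocess.run(clone_cmd(url), cwd=ROOT, check=True)
    # Step 2: generate the call graphs (doxygen.sh must be edited to point at ROOT).
    subprocess.run(["bash", "doxygen.sh"], cwd="doxygen", check=True)
    # Step 3: extract the functions and emit the JSON files.
    subprocess.run(["python", "extract_doxygen.py"], check=True)
```

Call `collect_call_graphs([...])` with your own repository URLs after editing `doxygen.sh`.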
The CTX-Vul dataset contains the contextual functions of each vulnerable function. Each sample is formatted as the following JSON string:

```json
{
  "index_to_funcname": {"0": "<func1_name>", "1": "<func2_name>"},
  "adj": ["# an n*n matrix of the call relationships; A_{ij} = 1 means function i is called by function j"],
  "index_to_code": {"0": "<func1_code>", "1": "<func2_code>"},
  "vul_type": "Vulnerable/Not Vulnerable"
}
```
Note
Function 0 is the target function. The dataset and checkpoint are coming soon!
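Given the format above, one sample can be parsed into the target function plus an explicit call-graph edge list. This loader is a minimal sketch, not part of the repository; it assumes `adj` arrives as an n×n nested list and follows the convention that `A_{ij} = 1` means function i is called by function j:

```python
import json


def load_record(json_str: str):
    """Parse one CTX-Vul record into (target_code, call_edges)."""
    rec = json.loads(json_str)
    n = len(rec["index_to_funcname"])
    # A_{ij} = 1 means function i is called by function j,
    # i.e. the call graph contains an edge j -> i.
    edges = [(j, i)
             for i in range(n)
             for j in range(n)
             if rec["adj"][i][j] == 1]
    target_code = rec["index_to_code"]["0"]  # function 0 is the target function
    return target_code, edges
```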
We provide the training script in `ctx_coder/train_ctxcoder.py`; before using it, please fill in `MODEL_PATH`, `LLAMA_3_PATH`, and `OUTPUT_PATH`.
You can train the model using the following command:
```shell
deepspeed ctx_coder/train_ctxcoder.py
```
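`deepspeed` can also take a JSON config via its `--deepspeed` flag. The repository may ship its own settings, so treat the values below as a hypothetical starting point (ZeRO stage 2 with bf16 on a single GPU), not the project's actual configuration:

```python
import json

# Hypothetical DeepSpeed settings; adjust to your hardware and the repo's defaults.
DS_CONFIG = {
    "train_batch_size": 16,  # = micro_batch * grad_accum * n_gpus (here 1 GPU)
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 2},
}


def write_ds_config(path: str = "ds_config.json") -> None:
    # Write the config where the deepspeed launcher can pick it up.
    with open(path, "w") as f:
        json.dump(DS_CONFIG, f, indent=2)
```

If the training script forwards a `--deepspeed` argument (an assumption), the launch becomes `deepspeed ctx_coder/train_ctxcoder.py --deepspeed ds_config.json`.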
We provide an inference pipeline; simply point it at your trained checkpoint and dataset, then run:

```shell
python ctx_coder/pipeline.py
```
- To evaluate CTX-Coder, first generate the results using `pipeline.py`, then evaluate them with `evaluation/test.py`.
- For code document generation, we use the default dataset of CodeBert and the official code of Big-Code.
- CrossCodeEval: project url https://github.com/amazon-science/cceval.
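For the vulnerability-detection task, evaluation ultimately reduces to comparing predicted labels against the `vul_type` field. The helper below is a self-contained sketch of the usual binary metrics, not the actual logic of `evaluation/test.py`:

```python
def binary_metrics(golds: list[str], preds: list[str], positive: str = "Vulnerable"):
    """Accuracy, precision, recall, and F1 over Vulnerable / Not Vulnerable labels."""
    tp = sum(g == positive and p == positive for g, p in zip(golds, preds))
    fp = sum(g != positive and p == positive for g, p in zip(golds, preds))
    fn = sum(g == positive and p != positive for g, p in zip(golds, preds))
    acc = sum(g == p for g, p in zip(golds, preds)) / len(golds)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}
```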