This is the repository for the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE'24) paper "Mutual Learning-Based Framework for Enhancing Robustness of Code Models via Adversarial Training". It contains the code of our framework for all tasks and models.
The folder structure is as follows.
.
├─language_parser
│ └─parser_folder
├─marvel
│ ├─authorshipAttribution
│ │ ├─code
│ │ ├─dataset
│ │ └─script
│ ├─defectPrediction
│ │ ├─code
│ │ ├─dataset
│ │ └─script
│ ├─Java250
│ │ ├─code
│ │ ├─dataset
│ │ └─script
│ ├─Python800
│ │ ├─code
│ │ ├─dataset
│ │ └─script
│ └─vulnerabilityDetection
│ ├─code
│ ├─dataset
│ └─script
└─utils
Under each subject's folder in marvel/ (authorshipAttribution/, defectPrediction/, Java250/, Python800/, and vulnerabilityDetection/), there are three folders (code/, dataset/, and script/) and one README.md file. The dataset/ directory stores the original dataset and some data-processing programs (for generating the substitutions used by the attack algorithms). The code/ directory contains our MARVEL code as well as the attack code (provided by ALERT and CODA), and the script/ directory contains commands for training and attacking.
The experiments in our artifact use five open-source datasets, all of which are provided in their respective folders. The dataset files can be accessed from the following links: Authorship Attribution, Defect Prediction, Vulnerability Detection, Java250, and Python800.
In our artifact, we have applied MARVEL to three code models - CodeBERT, GraphCodeBERT and UniXCoder - to improve their robustness. We have also used three adversarial attack methods - ALERT, MHM and CODA - to attack these models and evaluate their robustness.
The experiments related to CodeBERT and GraphCodeBERT were conducted on an Ubuntu 16.04 server with an Intel(R) Xeon(R) E5-2683 v4 @ 2.10GHz CPU and an NVIDIA TITAN RTX GPU; the experiments related to UniXCoder were conducted on an Ubuntu 20.04 server with an Intel(R) Xeon(R) Platinum 8352V CPU @ 2.10GHz and an NVIDIA RTX 4090 GPU.
We recommend using a similar machine with at least one NVIDIA GPU, and the experiments should be carried out on a Linux system.
We recommend using Anaconda to set up the experimental environment. Set up your environment with the following commands:
conda env create -n marvel -f environment.yaml
conda activate marvel
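After activating the environment, you can optionally verify that PyTorch (which the training and attack scripts rely on) can see your GPU. This is a generic sanity check, not part of the repository:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"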
We use tree-sitter to parse code snippets and extract variable names. You need to go to ./language_parser/parser_folder and build tree-sitter using the following commands:
cd ./language_parser/parser_folder
bash build.sh
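If build.sh does not work on your platform, the build step can be reproduced manually. The following is a rough sketch (assuming the pre-0.22 tree_sitter Python bindings; the exact grammar list and output path used by this repository may differ, so check parser_folder/):
# Sketch of what build.sh effectively does; the grammar repository
# names and the output path here are assumptions.
from tree_sitter import Language

Language.build_library(
    # output shared library that the parsers load
    'my-languages.so',
    # vendored grammar repositories, one per supported language
    ['tree-sitter-c', 'tree-sitter-java', 'tree-sitter-python'],
)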
We applied MARVEL to three code models on five datasets and evaluated them with three attack algorithms. You can perform experiments on the corresponding downstream task in the ./marvel directory.
You can refer to the README.md file under each folder to train models on different tasks. Let's take the CodeBERT model, the Authorship Attribution task, and the ALERT attack algorithm as an example. First, go to ./marvel/authorshipAttribution:
cd ./marvel/authorshipAttribution
Make sure the dataset is already in the dataset/ folder, then go to ./code and run run_mutual.py to execute the MARVEL framework as follows:
cd ./code
CUDA_VISIBLE_DEVICES=0 python run_mutual.py \
--do_train \
--do_test \
--epochs=30 \
--model_type codebert \
--save_name marvel \
--train_batch_size=8 \
--eval_batch_size=8 \
--alpha=0.3 \
--max_adv_step=3
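For orientation, the core idea behind run_mutual.py (mutual learning combined with adversarial training) can be sketched as follows. This is an illustrative simplification, not the repository's actual code: the toy models, the alpha-weighted KL term, and the embedding-space inner loop bounded by max_adv_step are assumptions based on the flags above.
# Illustrative sketch of mutual learning with adversarial training.
# Not the repository's code: models and data here are toy stand-ins.
import torch
import torch.nn.functional as F

alpha, max_adv_step, eps = 0.3, 3, 0.01

# Stand-ins for the two peer code models that teach each other.
model_a = torch.nn.Linear(16, 2)
model_b = torch.nn.Linear(16, 2)
opt = torch.optim.Adam(list(model_a.parameters()) + list(model_b.parameters()))

def mutual_loss(logits_self, logits_peer, labels):
    # Task loss plus an alpha-weighted KL term pulling the model
    # toward its peer's (detached) predictive distribution.
    ce = F.cross_entropy(logits_self, labels)
    kl = F.kl_div(F.log_softmax(logits_self, dim=-1),
                  F.softmax(logits_peer.detach(), dim=-1),
                  reduction='batchmean')
    return (1 - alpha) * ce + alpha * kl

x = torch.randn(8, 16)             # toy inputs (real code uses token embeddings)
y = torch.randint(0, 2, (8,))

# Inner loop: craft a perturbation for up to max_adv_step steps
# (the real attacks rename identifiers rather than perturbing embeddings).
delta = torch.zeros_like(x, requires_grad=True)
for _ in range(max_adv_step):
    adv_loss = F.cross_entropy(model_a(x + delta), y)
    grad, = torch.autograd.grad(adv_loss, delta)
    delta = (delta + eps * grad.sign()).detach().requires_grad_(True)

# Outer step: both models learn from the adversarial data and from each other.
opt.zero_grad()
x_adv = x + delta.detach()
la = mutual_loss(model_a(x_adv), model_b(x_adv), y)
lb = mutual_loss(model_b(x_adv), model_a(x_adv), y)
(la + lb).backward()
opt.step()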
The trained model is saved in the ./model/codebert folder. Next, execute the ALERT attack against the MARVEL-enhanced model:
CUDA_VISIBLE_DEVICES=0 python attack.py \
--attack_name=alert \
--model_type=codebert \
--eval_batch_size=8 \
--save_name=marvel
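Conceptually, ALERT and MHM attack by renaming identifiers in the input program and count an attack as successful when the victim model's prediction flips. A toy sketch of that success criterion (the helper names below are hypothetical, not this repository's API):
import re

def rename_identifier(code: str, old: str, new: str) -> str:
    # Whole-word rename of one variable; the real attacks use the
    # tree-sitter parse instead, to avoid touching strings and comments.
    return re.sub(rf'\b{re.escape(old)}\b', new, code)

def attack_succeeds(model_predict, code, label, old, candidates):
    # model_predict is a hypothetical callable: source string -> class id.
    for new in candidates:
        if model_predict(rename_identifier(code, old, new)) != label:
            return True   # prediction flipped: adversarial example found
    return False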
Note that before running a given attack algorithm (e.g., ALERT or CODA), the substitutions must be generated following the settings of that algorithm's original paper. Please see the individual folders under ./marvel for more details :)
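For reference, ALERT derives naturalness-aware substitutes from a masked language model. A minimal sketch of that idea using the Hugging Face fill-mask pipeline (the model choice and this exact procedure are assumptions; follow each task's README for the real substitution scripts):
from transformers import pipeline

# Hypothetical illustration of naturalness-aware substitute generation:
# ask a code MLM for plausible replacements at a masked identifier
# position. ALERT's real pipeline aggregates predictions over all
# occurrences of the variable.
fill = pipeline('fill-mask', model='microsoft/codebert-base-mlm')
candidates = fill('int <mask> = a + b;', top_k=10)
print([c['token_str'].strip() for c in candidates])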
We are very grateful to the authors of CodeBERT, GraphCodeBERT, UniXCoder, ALERT, MHM, and CODA for making their code publicly available so that we could build this repository on top of it.