Language models of code have demonstrated state-of-the-art performance across various software engineering and source code analysis tasks. However, their demanding computational resource requirements and the resulting environmental footprint remain significant challenges. This work introduces ALPINE, an adaptive, programming-language-agnostic pruning technique designed to substantially reduce the computational overhead of these models. The proposed method offers a pluggable layer that can be integrated with all Transformer-based models. With ALPINE, input sequences undergo adaptive compression throughout the pipeline, shrinking to roughly a third of their initial size and thereby significantly reducing the computational load. Our experiments on two software engineering tasks, defect prediction and code clone detection, across three language models of code, CodeBERT, GraphCodeBERT, and UniXCoder, show that ALPINE achieves up to a 50% reduction in FLOPs, a 58.1% decrease in memory footprint, and a 28.1% improvement in throughput on average. Importantly, it achieves this reduction in computational resources while maintaining up to 98.1% of the original predictive performance. These findings highlight the potential of ALPINE to make language models of code more resource-efficient and accessible while preserving their performance, contributing to the overall sustainability of adopting language models in software development.
- `pruning`: Folder that contains the RoBERTa-based model that supports pruning, including the training script.
  - `AttenPruner.py`: Contains the implementation of the `IQPruner` layer. It takes as input the attention probabilities of each attention head, the final output of the MHA, and the attention mask. First, it takes the mean across all heads and tokens to obtain a score distribution. Then, it creates a new mask that indicates which tokens are kept (a minimal sketch of this logic is shown after this list).
  - `PrunableEncoderLayer.py`: Contains the class `PrunableEncoderLayer` that implements the `forward` function of the Transformer layer.
  - `PrunableModel.py`: Inherits from `polp.nn.models.roberta.encoder.RoBERTaEncoder`, and overrides the `layers` attribute with `PrunableEncoderLayer`.
  - `classifier.py`: Contains a classification head on top of `PrunableModel` for the defect prediction task.
  - `bcb_classifier.py`: Contains a classification head on top of `PrunableModel` for the code clone detection task.
  - `utils.py`: Holds the implementation of utility functions, notably the `repack_tensor_and_create_mask` function (referred to as `RepackTensor` in Algorithm 2 in the paper), which completely removes tokens from the output of the MHA, or merges them.
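For readers unfamiliar with the repository internals, the following is a minimal, self-contained sketch of the mechanism described above, not the actual implementation: the score computation and keep-mask construction mirror the `IQPruner` description, while `repack` only illustrates the token-removal part of `repack_tensor_and_create_mask`. The `keep_ratio` threshold rule, tensor shapes, and function names are assumptions made for illustration.

```python
import torch


def score_tokens(attn_probs: torch.Tensor, attn_mask: torch.Tensor) -> torch.Tensor:
    """Average attention each token receives, across heads and query positions.

    attn_probs: (batch, num_heads, seq_len, seq_len) attention probabilities.
    attn_mask:  (batch, seq_len), 1 for real tokens, 0 for padding.
    Returns a (batch, seq_len) score distribution over tokens.
    """
    scores = attn_probs.mean(dim=1).mean(dim=1)     # mean over heads, then over query tokens
    return scores.masked_fill(attn_mask == 0, 0.0)  # padding tokens can never be kept


def build_keep_mask(scores: torch.Tensor, keep_ratio: float = 0.7) -> torch.Tensor:
    """Mark the tokens to keep (here: the top `keep_ratio` fraction per sequence, an assumed rule)."""
    _, seq_len = scores.shape
    k = max(1, int(seq_len * keep_ratio))
    top_indices = scores.topk(k, dim=-1).indices
    keep = torch.zeros_like(scores, dtype=torch.bool)
    keep.scatter_(1, top_indices, True)
    return keep


def repack(hidden: torch.Tensor, keep: torch.Tensor):
    """Physically drop pruned tokens from the MHA output and build the new attention mask.

    hidden: (batch, seq_len, hidden_dim). For simplicity this sketch assumes batch size 1;
    the repository's RepackTensor additionally supports merging pruned tokens instead of
    dropping them outright.
    """
    new_hidden = hidden[:, keep[0], :]
    new_mask = torch.ones(new_hidden.shape[:2], dtype=torch.long, device=hidden.device)
    return new_hidden, new_mask
```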
- `extracted_data`:
  - `bcb`: Contains the average sequence length when progressing throughout the layers of the pruned models fine-tuned on the BigCloneBenchmark dataset.
  - `devign`: Contains the average sequence length when progressing throughout the layers of the pruned models fine-tuned on the Devign dataset.
  - `gpu_mem_cons.xlsx`: Excel file that contains the results of memory consumption of all models across the three tasks.
  - `impact_token_merging.xlsx`: Results of the ablation study that investigates the impact of token merging.
  - `running_time.xlsx`: Duration of fine-tuning each model on each task when using ALPINE versus when pruning is not used.
- `notebooks`:
  - `GPU_Memory_Consumption.ipynb`: Jupyter notebook used to draw the barplot for GPU memory consumption.
  - `running_time.ipynb`: Jupyter notebook used to draw the barplot for fine-tuning time.
  - `theoretical_flops_analysis.ipynb`: Used to plot the FLOPs counting formulae of the MHA and FFNN layers.
  - `visualize_sequences.ipynb`: Plots the sequence length progression for pruned and non-pruned models. It uses the data located in the `extracted_data` folder.
- `scripts`:
  - `train_*.py`: Training scripts for defect prediction and code clone detection.
  - `speed_test.py`: Calculates the throughput of a model using the test set.
  - `flops_analysis.py`: Calculates the number of FLOPs of a model.
This work uses `polp`, a library for source code intelligence. It is currently under development, but it includes all the models used in this study. To install it in editable mode from the `lib` directory:
pip3 install -e lib
The remaining dependencies can be installed with:
pip3 install -r requirements.txt
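As a quick sanity check that the editable install worked, you can import the encoder class that `PrunableModel.py` builds on (assuming the module path listed in the repository structure above):

```python
# Should print the class without raising ImportError if polp is installed correctly.
from polp.nn.models.roberta.encoder import RoBERTaEncoder

print(RoBERTaEncoder)
```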
To fine-tune the models using the different pruning strategies, execute the following command:
python3 train.py \
--do_train \
--epoch 5 \
--train_batch_size 32 \
--eval_batch_size 64 \
--learning_rate 2e-5 \
--max_grad_norm 1.0 \
--evaluate_during_training \
--seed 123456 \
--alpha 1. \
--layers {PRUNING-STRAT} \
--model_name {MODEL}
The `layers` argument specifies the layers where ALPINE will be added.
| Value | Description |
|---|---|
| `none` | No layer will have ALPINE. |
| `all` | All layers will have ALPINE enabled. |
| `odd` | Only odd-indexed layers will have ALPINE. |
| `even` | Only even-indexed layers will have ALPINE. |
The `model_name` argument specifies the language model of code to be fine-tuned. We use model weights that are available on the Hugging Face Hub.
| Value | Description |
|---|---|
| `microsoft/codebert-base` | CodeBERT. |
| `microsoft/graphcodebert-base` | GraphCodeBERT. |
| `microsoft/unixcoder-base` | UniXCoder. |
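For example, a concrete run that fine-tunes CodeBERT with ALPINE enabled on all layers (all other hyperparameters as in the template above) would be:

```bash
python3 train.py \
    --do_train \
    --epoch 5 \
    --train_batch_size 32 \
    --eval_batch_size 64 \
    --learning_rate 2e-5 \
    --max_grad_norm 1.0 \
    --evaluate_during_training \
    --seed 123456 \
    --alpha 1. \
    --layers all \
    --model_name microsoft/codebert-base
```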
Once the fine-tuning process is finished, a checkpoint of the model from its best-performing epoch will be saved. This checkpoint file will be used by the scripts below.
Both `flops_analysis.py` and `speed_test.py` share the same set of arguments:
python3 {flops_analysis.py|speed_test.py} \
--checkpoint ... \
--task {TASK} \
--model_name {MODEL} \
--eval_batch_size 1
The `task` argument specifies which SE task the model was fine-tuned on.
| Value | Description |
|---|---|
| `defect_pred` | Defect prediction (Devign dataset). |
| `code_clone` | Code clone detection (BigCloneBenchmark dataset). |
The `model_name` argument is the same as the one described in the fine-tuning instructions above.
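For instance, to compute the FLOPs of a CodeBERT model fine-tuned on defect prediction (the checkpoint path below is only a placeholder for the file saved at the end of fine-tuning):

```bash
python3 flops_analysis.py \
    --checkpoint path/to/best-checkpoint.bin \
    --task defect_pred \
    --model_name microsoft/codebert-base \
    --eval_batch_size 1
```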
NB: For the `odd`/`even` values of `--layers`, even-indexed layers = [0, 2, 4, 6, 8, 10] and odd-indexed layers = [1, 3, 5, 7, 9, 11].