
VeLoRA : Memory Efficient Training using Rank-1 Sub-Token Projections

Roy Miles, Pradyumna Reddy, Ismail Elezi, Jiankang Deng

Huawei Noah's Ark Lab

arXiv preprint (arXiv:2405.17991)

Figure: VeLoRA overview.

This is the official implementation for reproducing the results in the VeLoRA NeurIPS 2024 paper.

  • 🤗 Planning to integrate into the HuggingFace PEFT library soon

Structure

We provide the training code and weights for the two main sets of experiments: pre-training LLaMA on C4 and fine-tuning LLaMA on Alpaca.

.
├── configs/             # LLaMA architecture configs for the c4 experiments.
├── results/             # By default, all training runs will save results here.
├── data/mmlu/           # MMLU evaluation data.
├── data/alpaca/         # Alpaca fine-tuning data.
├── scripts/             # Example scripts for reproducing results.
├── velora/              # All VeLoRA-specific code. velora.py ...
├── velora/peft_zoo/     # Implementations of other PEFT methods combined with VeLoRA.

Installation

Create the environment using conda and install the dependencies:

conda create --name velora python=3.10
conda activate velora
pip install -r requirements.txt

Simple example code

Here we give a simple example of applying VeLoRA to the value and down projection layers; see set_layers() for more details. We use 32 groups for both the value and down projections and initialise v with an average taken from the first batch during training. Setting method = "velora+full" indicates that no other PEFT method is combined with VeLoRA. Finally, the rank parameter is always 1; using a rank other than 1 would require a different initialisation strategy.

from transformers import LlamaForCausalLM

from velora import set_layers

model = LlamaForCausalLM.from_pretrained("/output/path")
method = "velora+full"  # use VeLoRA on its own, without another PEFT method

velora_config = {
    'rank': 1,                          # VeLoRA always uses rank 1
    'num_groups': 32,                   # sub-token groups per projection
    'init_type': 'batch_average_once',  # initialise v from the first batch
    'layers': 'vd'                      # value ('v') and down ('d') projections
}

# Wraps the selected layers in place so training can proceed as normal.
set_layers(model, method, velora_config)
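
After set_layers has wrapped the chosen projections, the model trains as normal. Below is a minimal sketch of continuing with the HuggingFace Trainer; the tokenizer path, dataset slice and hyper-parameters are illustrative placeholders rather than values from the paper.

from datasets import load_dataset
from transformers import (AutoTokenizer, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

# Placeholder tokenizer path, matching the model path above.
tokenizer = AutoTokenizer.from_pretrained("/output/path")
tokenizer.pad_token = tokenizer.eos_token

# Small slice of Alpaca purely for illustration.
dataset = load_dataset("tatsu-lab/alpaca", split="train[:1%]")
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True, remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,  # modified in place by set_layers above
    args=TrainingArguments(output_dir="./results/velora-example",
                           per_device_train_batch_size=16,
                           num_train_epochs=1,
                           learning_rate=4e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()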

Pre-training LLaMA on C4

Tuning the num_groups parameter provides a memory/performance trade-off; a back-of-the-envelope sketch of this trade-off follows the table below.

| Model | Validation Perplexity (↓) | Weights / Log |
|-------|---------------------------|---------------|
| 60M   | 33.28                     | link          |
| 130M  | Coming soon.              | Coming soon.  |
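
The trade-off can be seen with some simple arithmetic: VeLoRA splits each token into num_groups sub-tokens and stores only one rank-1 coefficient per sub-token for the backward pass, so the saved activations shrink by roughly a factor of hidden_dim / num_groups. The sketch below is a rough estimate only; the layer size and dtype are illustrative assumptions, not code from this repository.

# Rough activation memory per token for one VeLoRA-wrapped projection.
hidden_dim = 4096        # illustrative LLaMA-7B-sized hidden dimension
num_groups = 32          # as in the example config above
bytes_per_value = 2      # bf16 / fp16

full_per_token = hidden_dim * bytes_per_value    # dense activation saved for backward
velora_per_token = num_groups * bytes_per_value  # one rank-1 coefficient per sub-token

print(f"full: {full_per_token} B/token, velora: {velora_per_token} B/token, "
      f"~{full_per_token // velora_per_token}x smaller")
# Increasing num_groups retains more information (better final perplexity)
# at the cost of storing more coefficients per token.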

Fine-tuning LLaMA on Alpaca

| Model         | Mean 5-shot MMLU Test Accuracy (↑) | Weights / Log |
|---------------|------------------------------------|---------------|
| 7B [3 epochs] | 38.6                               | link          |
| 13B           | Coming soon.                       | Coming soon.  |

Training and Evaluation

We provide some example scripts in scripts/ for both the training and evaluation stages.

Run this command to fine-tune the LLaMA model on the Alpaca dataset:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python finetune_llama.py \
    --base_model 'huggyllama/llama-7b' \
    --data_path 'tatsu-lab/alpaca' \
    --output_dir './results/lora-alpaca-7b' \
    --batch_size 16 \
    --micro_batch_size 16 \
    --num_epochs 4 \
    --learning_rate 4e-4 \
    --cutoff_len 512 \
    --val_set_size 1024 \
    --lora_r 32 \
    --lora_alpha 16 \
    --lora_dropout 0.1 \
    --lora_target_modules '[down_proj,up_proj,gate_proj,q_proj,k_proj,v_proj,o_proj]' \
    --train_on_inputs \
    --group_by_length \
    --velora_r 1 \
    --velora_layers 'vd' \
    --num_groups 32 \
    --init_type 'batch_average_once' \
    --velora_scale 0.1
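
The run above saves its outputs to ./results/lora-alpaca-7b. Assuming it writes a standard PEFT adapter there (as in Alpaca-LoRA, which this script builds on; this layout is an assumption, not something verified here), a minimal sketch for loading the fine-tuned model back for generation looks like this. The prompt is just a placeholder.

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

base = LlamaForCausalLM.from_pretrained(
    "huggyllama/llama-7b", torch_dtype=torch.float16, device_map="auto")
tokenizer = LlamaTokenizer.from_pretrained("huggyllama/llama-7b")

# Assumes the fine-tuning run saved a PEFT adapter to this directory.
model = PeftModel.from_pretrained(base, "./results/lora-alpaca-7b")
model.eval()

prompt = "List three benefits of memory-efficient training."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
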
Run this command to pre-train the small LLaMA models on the C4 dataset:
torchrun --standalone --nproc_per_node 4 pretrain_llama.py \
    --model_config configs/llama_60m.json \
    --lr 0.01 \
    --velora_r 1 \
    --velora_layers 'vd' \
    --num_groups 64,86 \
    --init_type batch_average_once \
    --peft_type velora+full \
    --batch_size 128 \
    --total_batch_size 512 \
    --num_training_steps 10000 \
    --warmup_steps 1000 \
    --weight_decay 0 \
    --dtype float32 \
    --eval_every 1000 \
    --optimizer velora \
    --velora_scale 1.
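
A note on the batch-size flags above: assuming the convention of the GaLore training loop that this codebase builds on, batch_size is the per-device batch and total_batch_size is the effective global batch, with gradient accumulation making up any difference. The arithmetic for this particular command is sketched below.

# Gradient-accumulation arithmetic for the pre-training command above,
# assuming per-device batch_size and global total_batch_size (GaLore convention).
world_size = 4             # --nproc_per_node 4
batch_size = 128           # per device
total_batch_size = 512     # effective batch per optimiser step

grad_accumulation = total_batch_size // (world_size * batch_size)
assert grad_accumulation * world_size * batch_size == total_batch_size
print(f"gradient accumulation steps: {grad_accumulation}")  # 1 for this run
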

✏ Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝.

@inproceedings{miles2024velora,
    title={VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections}, 
    author={Roy Miles and Pradyumna Reddy and Ismail Elezi and Jiankang Deng},
    year={2024},
    booktitle={NeurIPS}
}

❤️ Acknowledgements

Our codebase is built upon the GaLore and Alpaca-LoRA projects. Great work!
