LoRAM is a memory-efficient LoRA training scheme: it trains low-rank matrices on a pruned model, then merges them into the original model for inference.
Jun Zhang¹, Jue Wang¹, Huan Li¹, Lidan Shou¹, Ke Chen¹, Yang You², Guiming Xie³, Xuejian Gong³, Kunlong Zhou³

¹ Zhejiang University, ² National University of Singapore, ³ OPPO AI Center
✅ Train LoRA on a pruned model to reduce memory footprint
✅ Recover LoRA for high-quality full model inference
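The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not the repository's implementation: the dimensions, the kept-index sets, and the randomly initialized "trained" factors are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 8, 8, 2            # toy full-model dims and LoRA rank
W = rng.normal(size=(d_out, d_in))  # frozen full-model weight

# Structured pruning: keep only a subset of rows/columns (indices illustrative).
keep_out = np.array([0, 1, 2, 3, 5, 6])
keep_in = np.array([0, 2, 3, 4, 6, 7])
W_pruned = W[np.ix_(keep_out, keep_in)]  # the smaller weight LoRA trains against

# "Train" LoRA factors at the pruned shape (random stand-ins for trained values).
A_p = rng.normal(size=(r, keep_in.size)) * 0.01   # (r, pruned_in)
B_p = rng.normal(size=(keep_out.size, r)) * 0.01  # (pruned_out, r)

# Recovery: zero-fill the pruned coordinates so the low-rank update
# aligns with the original weight's dimensions.
A = np.zeros((r, d_in))
A[:, keep_in] = A_p
B = np.zeros((d_out, r))
B[keep_out, :] = B_p

# Merge for inference on the ORIGINAL (unpruned) model.
W_merged = W + B @ A

# The recovered update only touches coordinates that survived pruning.
delta = W_merged - W
assert np.allclose(delta[np.ix_(keep_out, keep_in)], B_p @ A_p)
```

The zero-filled rows and columns mean the merged update leaves pruned coordinates of `W` untouched, which is what lets low-rank matrices trained on the small model be applied to the large one.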
Clone the repository and install dependencies:
```shell
git clone https://github.com/your-repo/LoRAM.git
cd LoRAM/loram
```
This project builds on the excellent work of LLM-Pruner and SparseGPT. LoRAM leverages these tools, and we appreciate their contributions to the research community.
If you find the resources in this repository useful, please cite our paper:
```bibtex
@inproceedings{zhang2025train,
  title={Train Small, Infer Large: Memory-Efficient Lo{RA} Training for Large Language Models},
  author={Jun Zhang and Jue Wang and Huan Li and Lidan Shou and Ke Chen and Yang You and Guiming Xie and Xuejian Gong and Kunlong Zhou},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=s7DkcgpRxL}
}
```