Collage is a low-precision training strategy for large language models (LLMs). It uses multi-component floats to reduce the memory footprint during training, particularly in the optimizer, with purely low-precision (e.g., BFloat16) arithmetic and without resorting to Float32.
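For intuition, the sketch below illustrates the multi-component-float idea in plain PyTorch: a value is kept as an unevaluated sum of two BFloat16 components, so rounding error discarded by the leading component is retained in the trailing one. This is only an illustration of the underlying idea, not the implementation used in this repository; the `two_sum` helper and the toy accumulation loop are ours.

```python
import torch

def two_sum(a: torch.Tensor, b: torch.Tensor):
    """Knuth's TwoSum, evaluated entirely in BFloat16.

    Returns (s, e) where s = fl(a + b) and e approximates the rounding
    error of that addition, so a + b ~= s + e.
    """
    s = a + b
    bp = s - a                       # portion of b actually absorbed into s
    e = (a - (s - bp)) + (b - bp)    # recovered rounding error
    return s, e

# Toy accumulation: add a small update 1000 times in pure BF16.
update = torch.tensor(1e-3, dtype=torch.bfloat16)
naive = torch.tensor(1.0, dtype=torch.bfloat16)   # single BF16 value
hi = torch.tensor(1.0, dtype=torch.bfloat16)      # MCF leading component
lo = torch.tensor(0.0, dtype=torch.bfloat16)      # MCF trailing component

for _ in range(1000):
    naive = naive + update         # plain BF16: the update is rounded away
    hi, err = two_sum(hi, update)  # MCF: capture the rounding error ...
    lo = lo + err                  # ... and keep it in the trailing component

print("plain BF16 :", naive.item())                      # stuck at 1.0
print("MCF hi+lo  :", (hi.float() + lo.float()).item())  # close to 2.0
```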
Collage is simple to use: replace AdamW with our AdamW_collage optimizer and choose a Collage option, i.e., light or plus (see our paper for more details); a minimal usage sketch follows below. We provide Collage training for BERT & RoBERTa, as well as multi-size GPTs with the NeMo Megatron framework.
- python 3.8 or above
- transformers 4.31.0 or above
- pytorch 1.13.1 + CUDA 11.7 or above
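A quick way to confirm the environment meets these requirements (an illustrative check, not part of the repository):

```python
import torch
import transformers
from packaging import version  # installed alongside transformers

print("torch       :", torch.__version__, "| CUDA:", torch.version.cuda,
      "| GPU available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)

assert version.parse(transformers.__version__) >= version.parse("4.31.0")
assert version.parse(torch.__version__.split("+")[0]) >= version.parse("1.13.1")
```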
We recommend using NeMo r1.22.0 with the released container nemo:23.11:
docker pull nvcr.io/nvidia/nemo:23.11.framework
Please follow AWS-Neuron-Tutorials-BERT to download the tokenized wikicorpus files for BERT and RoBERTa:
mkdir -p ./examples_datasets/
pushd ./examples_datasets/
aws s3 cp s3://neuron-s3/training_datasets/bert_pretrain_wikicorpus_tokenized_hdf5/bert_pretrain_wikicorpus_tokenized_hdf5_seqlen128.tar . --no-sign-request
tar -xf bert_pretrain_wikicorpus_tokenized_hdf5_seqlen128.tar
rm bert_pretrain_wikicorpus_tokenized_hdf5_seqlen128.tar
aws s3 cp s3://neuron-s3/training_datasets/bert_pretrain_wikicorpus_tokenized_hdf5/bert_pretrain_wikicorpus_tokenized_hdf5_seqlen512.tar . --no-sign-request
tar -xf bert_pretrain_wikicorpus_tokenized_hdf5_seqlen512.tar
rm bert_pretrain_wikicorpus_tokenized_hdf5_seqlen512.tar
popd
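Once extracted, the shards can be inspected with h5py as a quick sanity check. The glob pattern below assumes the tarball unpacks into a directory named after the archive; adjust it to whatever the extraction actually produces.

```python
import glob
import h5py

# Assumed location of the seqlen-128 shards; adjust to the extracted directory name.
shards = sorted(glob.glob(
    "./examples_datasets/bert_pretrain_wikicorpus_tokenized_hdf5_seqlen128/*.hdf5"))
print(f"found {len(shards)} HDF5 shards")

with h5py.File(shards[0], "r") as f:
    for name, dset in f.items():
        print(f"{name}: shape={dset.shape}, dtype={dset.dtype}")
```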
Please follow AWS-Neuron-Examples-GPT to download the Wikipedia dataset stored in S3:
export DATA_DIR=./examples_datasets/gpt2
mkdir -p ${DATA_DIR} && cd ${DATA_DIR}
wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-vocab.json
wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-merges.txt
aws s3 cp s3://neuron-s3/training_datasets/gpt/wikipedia/my-gpt2_text_document.bin . --no-sign-request
aws s3 cp s3://neuron-s3/training_datasets/gpt/wikipedia/my-gpt2_text_document.idx . --no-sign-request
aws s3 cp s3://neuron-s3/training_datasets/gpt/wikipedia/license.txt . --no-sign-request
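As a sanity check, the downloaded vocabulary and merges files can be loaded with the Hugging Face GPT2Tokenizer (illustrative only; the .bin/.idx pair is a Megatron-style indexed dataset consumed directly by NeMo's data loaders during pre-training):

```python
import os
from transformers import GPT2Tokenizer

data_dir = os.environ.get("DATA_DIR", "./examples_datasets/gpt2")
tok = GPT2Tokenizer(vocab_file=os.path.join(data_dir, "gpt2-vocab.json"),
                    merges_file=os.path.join(data_dir, "gpt2-merges.txt"))

print("vocab size:", tok.vocab_size)
print(tok.tokenize("Collage trains LLMs in BFloat16."))

# The indexed-dataset files should be non-empty as well.
print(os.path.getsize(os.path.join(data_dir, "my-gpt2_text_document.bin")), "bytes")
print(os.path.getsize(os.path.join(data_dir, "my-gpt2_text_document.idx")), "bytes")
```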
Scripts for training BERT and RoBERTa are provided in the roBERTa/scripts folder.
Scripts for multi-size (125M, 1.3B, 2.7B, and 6.7B) GPTs can be found in the NeMo-GPT/scripts/nlp_language_modeling folder.
If you find our work helpful in your research, please consider citing the following paper:
@inproceedings{yu2024collage,
  title={Collage: Light-Weight Low-Precision Strategy for LLM Training},
  author={Yu, Tao and Gupta, Gaurav and Gopalswamy, Karthick and Mamidala, Amith and Zhou, Hao and Huynh, Jeffrey and Park, Youngsuk and Diamant, Ron and Deoras, Anoop and Huan, Luke},
  booktitle={Proceedings of the 41st International Conference on Machine Learning (ICML 2024)},
  year={2024},
  organization={PMLR}
}
NeMo-GPT is modified from NVIDIA NeMo, which is released under the Apache 2.0 license.
Modifications Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
This code is released solely for academic and scientific reproducibility, in support of the methods and findings described in the associated publication. Pull requests are not accepted, in order to keep the code exactly as it was used in the paper; interested parties are encouraged to open an issue requesting open-source community development.