Collage is a low-precision training strategy for large language models (LLMs). It uses multi-component floats to reduce the memory footprint during training, particularly in the optimizer, with purely low-precision (e.g., BFloat16) arithmetic and without resorting to Float32.
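For intuition, the sketch below illustrates the multi-component-float idea in plain PyTorch: a value is kept as an unevaluated sum of two BFloat16 components, so rounding error discarded by the leading component is retained in the trailing one. This is only an illustration of the underlying idea, not the implementation used in this repository; the `two_sum` helper and the toy accumulation loop are ours.

```python
import torch

def two_sum(a: torch.Tensor, b: torch.Tensor):
    """Knuth's TwoSum, evaluated entirely in BFloat16.

    Returns (s, e) where s = fl(a + b) and e approximates the rounding
    error of that addition, so a + b ~= s + e.
    """
    s = a + b
    bp = s - a                       # portion of b actually absorbed into s
    e = (a - (s - bp)) + (b - bp)    # recovered rounding error
    return s, e

# Toy accumulation: add a small update 1000 times in pure BF16.
update = torch.tensor(1e-3, dtype=torch.bfloat16)
naive = torch.tensor(1.0, dtype=torch.bfloat16)   # single BF16 value
hi = torch.tensor(1.0, dtype=torch.bfloat16)      # MCF leading component
lo = torch.tensor(0.0, dtype=torch.bfloat16)      # MCF trailing component

for _ in range(1000):
    naive = naive + update         # plain BF16: the update is rounded away
    hi, err = two_sum(hi, update)  # MCF: capture the rounding error ...
    lo = lo + err                  # ... and keep it in the trailing component

print("plain BF16 :", naive.item())                      # stuck at 1.0
print("MCF hi+lo  :", (hi.float() + lo.float()).item())  # close to 2.0
```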
Collage is simple to use: replace AdamW with our AdamW_collage optimizer and choose a Collage option, i.e., light or plus (see our paper for more details); a minimal usage sketch follows below. We provide Collage training for BERT & RoBERTa, as well as multi-size GPTs with the NeMo Megatron framework.
- python 3.8 or above
- transformers 4.31.0 or above
- pytorch 1.13.1 + CUDA 11.7 or above
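A quick way to confirm the environment meets these requirements (an illustrative check, not part of the repository):

```python
import torch
import transformers
from packaging import version  # installed alongside transformers

print("torch       :", torch.__version__, "| CUDA:", torch.version.cuda,
      "| GPU available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)

assert version.parse(transformers.__version__) >= version.parse("4.31.0")
assert version.parse(torch.__version__.split("+")[0]) >= version.parse("1.13.1")
```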
We recommend using NeMo r1.22.0 with the released container nemo:23.11:
docker pull nvcr.io/nvidia/nemo:23.11.framework
Please follow AWS-Neuron-Tutorials-BERT to download the tokenized wikicorpus files for BERT and RoBERTa:
mkdir -p ./examples_datasets/
pushd ./examples_datasets/
aws s3 cp s3://neuron-s3/training_datasets/bert_pretrain_wikicorpus_tokenized_hdf5/bert_pretrain_wikicorpus_tokenized_hdf5_seqlen128.tar . --no-sign-request
tar -xf bert_pretrain_wikicorpus_tokenized_hdf5_seqlen128.tar
rm bert_pretrain_wikicorpus_tokenized_hdf5_seqlen128.tar
aws s3 cp s3://neuron-s3/training_datasets/bert_pretrain_wikicorpus_tokenized_hdf5/bert_pretrain_wikicorpus_tokenized_hdf5_seqlen512.tar . --no-sign-request
tar -xf bert_pretrain_wikicorpus_tokenized_hdf5_seqlen512.tar
rm bert_pretrain_wikicorpus_tokenized_hdf5_seqlen512.tar
popd
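Once extracted, the shards can be inspected with h5py as a quick sanity check. The glob pattern below assumes the tarball unpacks into a directory named after the archive; adjust it to whatever the extraction actually produces.

```python
import glob
import h5py

# Assumed location of the seqlen-128 shards; adjust to the extracted directory name.
shards = sorted(glob.glob(
    "./examples_datasets/bert_pretrain_wikicorpus_tokenized_hdf5_seqlen128/*.hdf5"))
print(f"found {len(shards)} HDF5 shards")

with h5py.File(shards[0], "r") as f:
    for name, dset in f.items():
        print(f"{name}: shape={dset.shape}, dtype={dset.dtype}")
```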
Please follow AWS-Neuron-Examples-GPT to download the Wikipedia dataset stored in S3:
export DATA_DIR=./examples_datasets/gpt2
mkdir -p ${DATA_DIR} && cd ${DATA_DIR}
wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-vocab.json
wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-merges.txt
aws s3 cp s3://neuron-s3/training_datasets/gpt/wikipedia/my-gpt2_text_document.bin . --no-sign-request
aws s3 cp s3://neuron-s3/training_datasets/gpt/wikipedia/my-gpt2_text_document.idx . --no-sign-request
aws s3 cp s3://neuron-s3/training_datasets/gpt/wikipedia/license.txt . --no-sign-request
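As a sanity check, the downloaded vocabulary and merges files can be loaded with the Hugging Face GPT2Tokenizer (illustrative only; the .bin/.idx pair is a Megatron-style indexed dataset consumed directly by NeMo's data loaders during pre-training):

```python
import os
from transformers import GPT2Tokenizer

data_dir = os.environ.get("DATA_DIR", "./examples_datasets/gpt2")
tok = GPT2Tokenizer(vocab_file=os.path.join(data_dir, "gpt2-vocab.json"),
                    merges_file=os.path.join(data_dir, "gpt2-merges.txt"))

print("vocab size:", tok.vocab_size)
print(tok.tokenize("Collage trains LLMs in BFloat16."))

# The indexed-dataset files should be non-empty as well.
print(os.path.getsize(os.path.join(data_dir, "my-gpt2_text_document.bin")), "bytes")
print(os.path.getsize(os.path.join(data_dir, "my-gpt2_text_document.idx")), "bytes")
```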
Scripts for training BERT and RoBERTa are provided in the roBERTa/scripts folder.
Scripts for multi-size (125M, 1.3B, 2.7B, and 6.7B) GPTs can be found in the NeMo-GPT/scripts/nlp_language_modeling folder.
If you find our work helpful in your research, please consider citing the following paper:
@inproceedings{yu2024collage,
  title={Collage: Light-Weight Low-Precision Strategy for LLM Training},
  author={Yu, Tao and Gupta, Gaurav and Gopalswamy, Karthick and Mamidala, Amith and Zhou, Hao and Huynh, Jeffrey and Park, Youngsuk and Diamant, Ron and Deoras, Anoop and Huan, Luke},
  booktitle={Proceedings of the 41st International Conference on Machine Learning (ICML 2024)},
  year={2024},
  organization={PMLR}
}
NeMo-GPT is modified from NVIDIA NeMo, which is released under the Apache 2.0 license.
Modifications Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
This code is released solely for academic and scientific reproducibility, in support of the methods and findings described in the associated publication. Pull requests are not accepted, in order to keep the code exactly as it was used in the paper; interested parties are encouraged to open an issue requesting open-source community development.