1. This implementation is adapted from the code released with the GaLore paper.
pip install galore-torch
pip install -r exp_requirements.txt
Note: run everything in a Python 3.8 environment (the experiments used Python 3.8.20).
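Because the setup is pinned to Python 3.8, it can help to fail fast when a different interpreter is active. The following is a minimal sketch. `is_supported_python` is a hypothetical helper and is not part of this repo. It compares only the major and minor version, on the assumption that any 3.8.x patch release behaves like the 3.8.20 used in the experiments.

```python
import sys


def is_supported_python(version_info=sys.version_info, required=(3, 8)):
    """Return True if the (major, minor) part of version_info equals `required`.

    Hypothetical helper: only major.minor is checked, on the assumption that
    any 3.8.x patch release is acceptable for this setup.
    """
    return tuple(version_info[:2]) == tuple(required)
```

For example, you could place `if not is_supported_python(): raise SystemExit("Please use Python 3.8")` at the top of your own entry script.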
Tokenizer and model download
from transformers import AutoTokenizer, AutoModel
# Load the pretrained tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
model = AutoModel.from_pretrained("google-t5/t5-base")
# Save both to local directories
tokenizer.save_pretrained("./t5-base-tokenizer")
model.save_pretrained("./t5-base-model")
Dataset download
GIT_LFS_SKIP_SMUDGE=1 git clone https://hf-mirror.com/datasets/allenai/c4
cd c4
git lfs pull --include "en/*"
sh run.sh
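Before launching run.sh, you may want to confirm that the local tokenizer and model directories and the C4 shards are in place. The sketch below is a hypothetical pre-flight check and is not part of this repo. It makes two assumptions. First, `save_pretrained` wrote at least `tokenizer_config.json` (tokenizer) and `config.json` (model). Second, the `en/` files pulled from allenai/c4 are gzip-compressed JSON Lines shards named like `c4-validation.00000-of-00008.json.gz`, where each line is a JSON object with `text`, `timestamp` and `url` fields.

```python
import gzip
import json
import os
from itertools import islice


def missing_files(directory, required):
    """Return the names in `required` that are not regular files in `directory`.

    Hypothetical helper: an empty list means every expected file is present.
    """
    return [name for name in required if not os.path.isfile(os.path.join(directory, name))]


def read_c4_records(shard_path, limit=3):
    """Return up to `limit` parsed records from a gzip-compressed JSON Lines shard.

    Assumes one JSON object per line, as in the allenai/c4 `en/` shards.
    """
    with gzip.open(shard_path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in islice(f, limit)]
```

For example, `missing_files("./t5-base-tokenizer", ["tokenizer_config.json"])` and `missing_files("./t5-base-model", ["config.json"])` should both return `[]`. `read_c4_records("c4/en/c4-validation.00000-of-00008.json.gz")[0]["text"]` shows the first validation document. Adjust these paths to the directory where you ran the commands above.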