Example code and instructions to reproduce the single-GPU optimization demo on GPT-2 shown in the repo main page #5259
Unanswered · LCChen asked this question in Community | Q&A

Replies: 1 comment
Some updates: I have been able to run the 1.2B model (gpt2_xl) in the examples/language/gpt/gemini directory using two GPUs, but I have never been able to run the gpt2_10b model with the same example code. Here is the configuration (I have minimized the batch size to 1 to save memory):

root@a370836de98a:/workspace/ColossalAI/examples/language/gpt/gemini# bash run_gemini.sh
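For scale, a quick back-of-the-envelope calculation shows why shrinking the batch size alone cannot rescue the gpt2_10b run: under standard mixed-precision Adam training, the weights, gradients, and optimizer state alone cost about 16 bytes per parameter (the accounting used in the ZeRO paper), before counting any activations. The sketch below just evaluates that formula for the two model sizes mentioned above.

```python
# Rough training-state footprint under mixed-precision Adam:
#   2 B fp16 weights + 2 B fp16 gradients + 4 B fp32 master weights
#   + 8 B fp32 Adam momentum/variance = 16 B per parameter.
# Activations are excluded, so these are lower bounds.

def training_state_gib(n_params: float) -> float:
    """Weight + gradient + optimizer state in GiB (activations excluded)."""
    return n_params * 16 / 1024**3

for name, n_params in [("gpt2_xl (~1.2B)", 1.2e9), ("gpt2_10b (~10B)", 10e9)]:
    print(f"{name}: ~{training_state_gib(n_params):.0f} GiB of training state")

# gpt2_xl:  ~18 GiB  -> fits across two 24 GiB RTX 3090s
# gpt2_10b: ~149 GiB -> must spill to CPU RAM; Gemini can offload it, but
#                       only if the host has enough free memory.
```

So for the 10B run the limiting factor is usually host RAM and the Gemini placement policy rather than the per-GPU batch size; checking that the machine has well over 150 GiB of free CPU memory is a sensible first step.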
Original question (LCChen):
Dear experts,
I have two RTX 3090 cards locally and am excited about the data shown in the picture below (a single GPU can reportedly train an 18B model). I would like to try the same demo myself to better understand how Colossal-AI works. Could you help me find the example code and instructions to run the 18B GPT-2 model on one GPU (a 3090)?
Thank you very much for your help.
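The headline numbers come from Gemini, Colossal-AI's heterogeneous memory manager: parameter chunks are paged between GPU and CPU memory during training, so the 24 GB of a single 3090 only has to hold the chunks in active use. Below is a minimal single-GPU sketch using the booster API. It is an illustration, not the repo's exact script: the class and argument names (GeminiPlugin, placement_policy="auto", HybridAdam) follow recent Colossal-AI releases and may differ in the version the example pins; the runnable reference remains examples/language/gpt/gemini/train_gpt_demo.py, launched through run_gemini.sh.

```python
# Minimal single-GPU Gemini sketch. API names assume a recent Colossal-AI
# release and may differ in older versions; treat this as illustrative.
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam
from transformers import GPT2Config, GPT2LMHeadModel

colossalai.launch_from_torch()  # some releases require launch_from_torch(config={})

# Small illustrative config; the repo's demo instead scales n_embd/n_layer
# up to reach its multi-billion-parameter GPT-2 variants.
model = GPT2LMHeadModel(GPT2Config(n_embd=1024, n_layer=24, n_head=16))

# placement_policy="auto" lets Gemini decide, step by step, which parameter
# chunks stay in GPU memory and which are offloaded to CPU RAM -- this is
# the mechanism that lets one 24 GB card train a model far larger than 24 GB.
plugin = GeminiPlugin(placement_policy="auto")
booster = Booster(plugin=plugin)

optimizer = HybridAdam(model.parameters(), lr=1e-5)  # hybrid CPU/GPU Adam step
criterion = torch.nn.CrossEntropyLoss()
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)
# Training loop: loss = criterion(...); booster.backward(loss, optimizer);
# optimizer.step(); optimizer.zero_grad()
```

Run with torchrun --standalone --nproc_per_node=1. This is essentially what the example script does: run_gemini.sh wraps the same call and exposes the GPU count, model size, and batch size through environment variables, so a single-3090 attempt amounts to setting the GPU count to 1 and picking the largest model variant the host's CPU RAM can back.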
