Can Gemini's ZeRO scheme initialize a GPT-3-scale model? #2479
Unanswered
yhcc asked this question in Community | Q&A
Replies: 3 comments, 4 replies
-
How shard init works: ColoInitContext initializes one parameter at a time. It first allocates the global tensor on every process, then splits it into N shards, with each process keeping only 1/N of the data. Once all parameters are initialized, each process therefore holds only 1/N of the total memory.
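The shard-init idea described above can be sketched as follows. This is an illustrative simplification, not the actual ColoInitContext implementation: materialize the full ("global") tensor, keep only this rank's 1/N slice, and free the rest.

```python
import torch

def shard_init_param(shape, rank, world_size, dtype=torch.float16):
    """Sketch of shard init (hypothetical helper, not the ColossalAI API):
    allocate the global tensor, then retain only this rank's 1/N shard."""
    full = torch.empty(shape, dtype=dtype)
    torch.nn.init.normal_(full, std=0.02)   # initialize the global tensor
    flat = full.flatten()
    # Pad so the tensor splits evenly into world_size chunks.
    pad = (-flat.numel()) % world_size
    if pad:
        flat = torch.cat([flat, flat.new_zeros(pad)])
    shard = flat.chunk(world_size)[rank].clone()  # keep 1/N of the data
    del full, flat                                # release the global tensor
    return shard

# Single-process illustration: 4-way sharding of a 10x10 parameter.
shard = shard_init_param((10, 10), rank=0, world_size=4)
print(shard.numel())  # 100 elements / 4 ranks = 25 per rank
```

In the real implementation this happens parameter by parameter, so the transient peak is one global tensor at a time rather than the whole model.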
-
Hi @yhcc @taishiciR, for shard init you can refer to here.
-
The example at https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/gpt/gemini/train_gpt_demo.py provides a Gemini-based ZeRO scheme. I tried adapting it to directly support a GPT-3-scale model, but this causes OOM errors. Looking at the GeminiDDP source code, there doesn't seem to be any logic for partitioning parameters across different GPUs (unless I've misunderstood). Is there a recommended way to train a GPT-3-scale model with this Gemini setup?
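A back-of-the-envelope estimate (assumed, widely cited figures, not measurements from the demo) shows why initialization OOMs if every rank materializes the full parameter set:

```python
# GPT-3 is commonly cited at ~175B parameters; figures below are estimates.
params = 175e9
param_mem_gb = params * 2 / 1024**3        # fp16 parameters, 2 bytes each
# Mixed-precision Adam: fp32 master weights + two fp32 optimizer states.
optim_mem_gb = params * 4 * 3 / 1024**3

print(f"fp16 params:      {param_mem_gb:.0f} GiB")
print(f"optimizer states: {optim_mem_gb:.0f} GiB")

# If parameters and states are sharded across N ranks, each rank holds ~1/N.
for n in (8, 64, 512):
    print(f"{n:4d} ranks -> {(param_mem_gb + optim_mem_gb) / n:.1f} GiB/rank")
```

The fp16 parameters alone are far beyond a single GPU's memory, so without sharding at initialization time (as in shard init), allocating the global tensors on every process fails before training even starts.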