How to combine ZeRO optimization with TP and PP #2181

Unanswered

yhcc asked this question in Community | Q&A

yhcc
Dec 22, 2022

Hello. According to #1839, ZeroInitContext no longer seems to be recommended, and code should instead follow the approach in https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/gpt/train_gpt_demo.py. Also, as mentioned in https://colossalaiworkspace.slack.com/archives/C02NAJARJ9Y/p1665482115336439?thread_ts=1665444351.685429&cid=C02NAJARJ9Y, ZeroDDP has not been tested with PP. However, https://github.com/hpcaitech/ColossalAI-Examples/blob/main/language/gpt/train_gpt.py contains an example that uses ZeroInitContext together with TP+PP. If ZeroInitContext is no longer recommended, is there an example of the updated approach that I can refer to?

Replies: 1 comment 5 replies

ver217
Dec 23, 2022

We are currently developing ZeRO2 and will add a ZeRO2+PP example later. ZeRO3+PP is not recommended because it is relatively inefficient.

5 replies

yhcc Dec 23, 2022
Author

Is there a timeline for the ZeRO2 work? Roughly how long until it is ready?

ver217 Dec 23, 2022

ZeRO2 can be used on its own, but ZeRO2+PP has not been verified.

binmakeswell Dec 23, 2022
Maintainer

@1SAA will update it soon.

yhcc Dec 23, 2022
Author

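Thanks for the reply. How do I use ZeRO2 on its own? My understanding is that using ZeRO currently involves two things: first, wrapping the module in ZeroDDP, and second, using ZeroOptimizer. I understand that ZeRO1 is implemented through ZeroOptimizer, and that ZeRO2 presumably relies on ZeroDDP. If so, when only ZeRO2 is needed, how should ZeroDDP be configured so that it shards only gradients and not parameters? Also, is it possible to use only ZeRO1? Thanks!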
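
For readers comparing the stages discussed here, the sketch below estimates per-GPU model-state memory for each ZeRO stage. It is not ColossalAI code and does not configure ZeroDDP or ZeroOptimizer. The function name `zero_model_state_gb` is hypothetical, and the byte counts (2 bytes per fp16 parameter, 2 per fp16 gradient, 12 for fp32 Adam states) are the mixed-precision Adam assumptions used in the ZeRO paper. Activations and buffers are ignored.

```python
def zero_model_state_gb(num_params: float, num_gpus: int, stage: int) -> float:
    """Approximate model-state memory per GPU, in GB (1e9 bytes).

    Assumes mixed-precision Adam as in the ZeRO paper (an assumption,
    not a ColossalAI setting): fp16 params and grads, fp32 optimizer states.
    stage 0 = plain data parallelism (nothing sharded).
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2 or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    params = 2.0 * num_params   # fp16 parameters
    grads = 2.0 * num_params    # fp16 gradients
    optim = 12.0 * num_params   # fp32 master params + Adam momentum + variance

    if stage >= 1:              # ZeRO1: shard optimizer states
        optim /= num_gpus
    if stage >= 2:              # ZeRO2: additionally shard gradients
        grads /= num_gpus
    if stage >= 3:              # ZeRO3: additionally shard parameters
        params /= num_gpus

    return (params + grads + optim) / 1e9


if __name__ == "__main__":
    # Illustrative sizes only: a 7.5B-parameter model on 64 GPUs.
    for stage in range(4):
        gb = zero_model_state_gb(7.5e9, 64, stage)
        print(f"stage {stage}: {gb:.1f} GB per GPU")
```

With these illustrative sizes, the estimate falls from about 120 GB without sharding to roughly 31.4 GB, 16.6 GB and 1.9 GB for stages 1, 2 and 3. ZeRO2 keeps a full copy of the parameters on every GPU and shards only gradients and optimizer states, which is the behavior asked about above.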

binmakeswell Jan 5, 2023
Maintainer

Hello,
For using ZeRO2, see https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/gpt/train_gpt_demo.py#L287
For combining it with PP, see https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/gpt/train_gpt_pp_demo.py
