Query Regarding ZeRO-1 in ColossalAI Not Sharding Optimizer State #4328
Unanswered · yhna940 asked this question in Community | Q&A · Replies: 0 comments
I have recently been studying the ZeRO-1 strategy implemented by ColossalAI and have noticed something that seems unusual. As I understand it, ColossalAI uses the LowLevelZeroOptimizer for its ZeRO-1 strategy.
According to the relevant literature, ZeRO-1 should shard the optimizer state, akin to what is done in fairscale's OSS or torch's ZeroRedundancyOptimizer. However, as I was reading through the inner workings of the LowLevelZeroOptimizer, I could not find any section where the optimizer's state is sharded. I was able to confirm that it shards the gradients and parameters, but not the optimizer state.
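For concreteness, here is a minimal sketch of the kind of optimizer-state sharding I have in mind, written with plain torch's ZeroRedundancyOptimizer rather than ColossalAI code; the module, sizes, and hyperparameters are just placeholders, and it assumes launch via torchrun with CUDA available:

```python
# Sketch only: each rank builds Adam over its own shard of the parameters,
# so exp_avg/exp_avg_sq are partitioned across ranks instead of replicated.
import os
import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = DDP(torch.nn.Linear(1024, 1024).cuda(), device_ids=[local_rank])
optimizer = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.Adam,  # each rank instantiates Adam over its shard only
    lr=1e-3,
)

loss = model(torch.randn(8, 1024, device=local_rank)).sum()
loss.backward()    # gradients remain fully replicated, as in ZeRO-1
optimizer.step()   # each rank updates its shard, then the updated params are broadcast
```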
I am seeking verification of my understanding here. Is it indeed the case that ColossalAI's ZeRO-1 does not shard the optimizer state, or am I missing something? I would appreciate any insights or clarifications you can provide.
Thank you!