Why "No code changes are needed" with zero-offload? What is the most basic principle here? #4342
Unanswered
chansonzhang
asked this question in
Q&A
Replies: 1 comment
In its most basic form, ZeRO-Offload offloads the entire optimizer state to CPU memory. Optimizer state is just a set of tensors, one group per parameter, whatever the optimizer (e.g., Adam's momentum and variance), so it does not depend on the model structure. For more details, please see the paper: https://arxiv.org/abs/2101.06840.
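To make that concrete, here is a minimal sketch in plain Python. It is a toy illustration only, not DeepSpeed's implementation: `HostAdam` and `flatten` are hypothetical names, and "tensors" are stood in for by lists of floats. The point it tries to show is that the optimizer only ever sees a flat list of parameter tensors, so its state can be kept anywhere (here, ordinary host-side lists) without knowing anything about layers.

```python
import math


def flatten(model):
    """Collect parameter tensors from a nested dict, ignoring its structure."""
    out = []
    for value in model.values():
        if isinstance(value, dict):
            out.extend(flatten(value))
        else:
            out.append(value)
    return out


class HostAdam:
    """Toy Adam whose state lives in host-side buffers (hypothetical, for illustration)."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        self.params = params  # flat list of "tensors" (lists of floats)
        self.lr, self.betas, self.eps = lr, betas, eps
        self.step_count = 0
        # One momentum and one variance buffer per parameter tensor, sized
        # like that tensor. No layer or graph information is needed.
        self.host_m = [[0.0] * len(p) for p in params]
        self.host_v = [[0.0] * len(p) for p in params]

    def step(self, grads):
        """Apply one bias-corrected Adam update in place, given per-tensor grads."""
        self.step_count += 1
        b1, b2 = self.betas
        c1 = 1 - b1 ** self.step_count
        c2 = 1 - b2 ** self.step_count
        for p, g, m, v in zip(self.params, grads, self.host_m, self.host_v):
            for i in range(len(p)):
                m[i] = b1 * m[i] + (1 - b1) * g[i]
                v[i] = b2 * v[i] + (1 - b2) * g[i] * g[i]
                p[i] -= self.lr * (m[i] / c1) / (math.sqrt(v[i] / c2) + self.eps)


# A made-up "model": the nesting stands in for layers.
mlp = {"fc1": {"w": [0.5, -0.5], "b": [0.0]}, "fc2": {"w": [1.0]}}
params = flatten(mlp)
opt = HostAdam(params, lr=0.1)

# Pretend these gradients came from a backward pass.
grads = [[1.0, -1.0], [1.0], [1.0]]
opt.step(grads)
```

The training loop that produces `grads` never touches `host_m` or `host_v`, which is the sense in which the user's model code does not have to change. In the real system the state buffers sit in CPU memory and the updated parameters are copied back to the GPU, as described in the paper.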
For example, how does ZeRO-Offload know the model structure, and which parameters or memory to offload?