Support gradient accumulation #229
Closed
btbujiangjun
started this conversation in
Ideas
Replies: 1 comment
That's something you can definitely do on a case-by-case basis, but we likely won't add it to the core library. We also have plans for gradient checkpointing and a few other levers we are exploring to reduce memory pressure.
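For anyone wanting to try this in their own training loop, here is a minimal, library-agnostic sketch of gradient accumulation in plain Python. It does not use this project's API: the toy linear model and the names `grad_sum`, `accumulated_step`, and `full_batch_step` are hypothetical and exist only for illustration. The idea is to sum gradients over several small micro-batches and apply a single optimizer update, so that only one micro-batch needs to be processed at a time.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair for a toy model y ~ w * x + b


def grad_sum(w: float, b: float, batch: Sequence[Sample]) -> Tuple[float, float]:
    """Return the SUM (not mean) of squared-error gradients over a batch."""
    gw = 0.0
    gb = 0.0
    for x, y in batch:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb


def accumulated_step(
    w: float, b: float, data: Sequence[Sample], micro_batch_size: int, lr: float
) -> Tuple[float, float]:
    """One SGD update whose gradient is accumulated across micro-batches."""
    acc_w = 0.0
    acc_b = 0.0
    for start in range(0, len(data), micro_batch_size):
        micro = data[start:start + micro_batch_size]
        gw, gb = grad_sum(w, b, micro)  # only one micro-batch is processed at a time
        acc_w += gw
        acc_b += gb
    n = len(data)
    # Divide by the total sample count so the update uses the mean gradient
    # of the effective (large) batch, then apply a single parameter update.
    return w - lr * acc_w / n, b - lr * acc_b / n


def full_batch_step(
    w: float, b: float, data: Sequence[Sample], lr: float
) -> Tuple[float, float]:
    """Reference: the same update computed from the whole batch at once."""
    gw, gb = grad_sum(w, b, data)
    n = len(data)
    return w - lr * gw / n, b - lr * gb / n
```

Because the gradients are summed and divided by the total sample count only once, the accumulated update matches the full-batch update up to floating-point rounding, even when the last micro-batch is smaller than the others. In a real framework the same pattern usually means calling the backward pass on each micro-batch and stepping the optimizer only every few micro-batches.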
Due to limited memory, we have to use a small batch size, so gradient accumulation may be a hard requirement.