How long does it take to train a sequence_parallel BERT? #1365
tianboh asked this question in Community | Q&A (Unanswered)
Hi, I am following the documentation to train in the Docker environment, using the default config.
I am training on 4 V100 GPUs, but after an hour of training I have only completed ~2400 iterations. The config specifies 1,000,000 iterations in total, so at this rate it would take roughly 17 days to finish. Is this normal, or should I reduce the number of training iterations?
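For reference, here is the back-of-the-envelope arithmetic behind the ~17-day estimate. It is just a minimal sketch using the throughput numbers observed above; the actual time will vary with batch size, sequence length, and GPU utilization.

```python
# Rough estimate of total training time from observed throughput.
# Numbers below are the ones reported in this post, not measured values.
observed_iters = 2400        # iterations completed so far
observed_hours = 1.0         # wall-clock hours spent so far
total_iters = 1_000_000      # train iterations in the default config

iters_per_hour = observed_iters / observed_hours
remaining_hours = (total_iters - observed_iters) / iters_per_hour

print(f"Throughput: {iters_per_hour:.0f} iterations/hour")
print(f"Estimated time to finish: {remaining_hours / 24:.1f} days")
# -> roughly 17 days on 4 V100s at this rate
```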