Optimizing Citrinet model fine-tuning in a multi-GPU environment #3078
Unanswered
francescodaq asked this question in Q&A
Replies: 1 comment
-
The rank 0 GPU requiring slightly more memory is expected under DDP, since it has some overhead for communicating with the other ranks over the shared hardware/network connection. However, 1.3 GB is a bit too much; it's usually around 100-200 MB for us. Make sure that other processes, such as the graphics driver, are not using the rank 0 GPU. This has some info, but I don't think there's something like a best-practices guide: https://pytorch-lightning.readthedocs.io/en/stable/advanced/multi_gpu.html
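For example, a minimal sketch (assuming nvidia-smi is available on the PATH) that lists per-GPU memory usage and the compute processes currently holding GPU memory, so you can see whether something other than the training job sits on GPU 0:

import subprocess

def show_gpu_usage():
    # Per-GPU memory summary (index, name, used, total).
    print(subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,name,memory.used,memory.total",
         "--format=csv,noheader"], text=True))
    # Compute processes currently holding GPU memory; another training job
    # would show up here. A display server appears in the full nvidia-smi output.
    print(subprocess.check_output(
        ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader"], text=True))

if __name__ == "__main__":
    show_gpu_usage()

Running this before launching training makes it easy to spot memory that is already allocated on the rank 0 GPU by unrelated processes.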
-
Hello support,
we are attempting to move from a single-GPU to a multi-GPU training environment.
The subject of training is the fine-tuning of a Citrinet-1024 model for speech recognition. We executed a first fine-tuning session on a single-GPU machine (a single V100 with 16 GB of memory); now we are moving to a new machine with 4 GPUs (4 T4 with 16 GB of memory each).
The first training session featured a batch_size of 16 and a learning rate of 0.025.
The script we prepared for multi-GPU fine-tuning performs the following tasks (a rough sketch of such a setup is shown after the list):
- adjusts the learning rate (due to the increment of the GPU number)
- configures the Trainer object
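A minimal sketch of that kind of script is below; the model name, manifest path, and max_epochs are placeholders, the linear learning-rate scaling is one common heuristic rather than a fixed rule, and the Trainer argument names depend on the PyTorch Lightning version (older releases use gpus=4, accelerator="ddp" instead of devices/strategy):

import copy
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr

NUM_GPUS = 4
BASE_LR = 0.025          # learning rate used in the single-GPU run
PER_GPU_BATCH_SIZE = 16  # effective batch size = 16 * 4 = 64

# Load the pretrained Citrinet-1024 checkpoint published on NGC.
model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained("stt_en_citrinet_1024")

# One process per GPU via DDP.
trainer = pl.Trainer(devices=NUM_GPUS, accelerator="gpu", strategy="ddp", max_epochs=50)
model.set_trainer(trainer)

# Point the model at the fine-tuning data; "train_manifest.json" is a placeholder path.
train_cfg = copy.deepcopy(model.cfg.train_ds)
train_cfg.manifest_filepath = "train_manifest.json"
train_cfg.batch_size = PER_GPU_BATCH_SIZE
model.setup_training_data(train_cfg)

# Scale the learning rate linearly with the number of GPUs.
optim_cfg = copy.deepcopy(model.cfg.optim)
optim_cfg.lr = BASE_LR * NUM_GPUS
model.setup_optimization(optim_cfg)

trainer.fit(model)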
In order to benefit from the increased hardware capacity we intended to keep the per-GPU batch size at 16, thus obtaining an effective batch size of 64, but we get an OOM error. We attempted decreasing the per-GPU batch size; the greatest value avoiding the OOM error is 12.
Observing the output of the nvidia-smi command while training is running, we see that GPU0 has more memory allocated than the other 3, so maybe it is bottlenecking the others, causing the OOM.
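To check where the extra memory on GPU0 comes from, one option is to log PyTorch's own allocation counters from each rank inside the training script; a minimal sketch (the call site, e.g. at the end of a validation epoch, is up to you):

import torch
import torch.distributed as dist

def log_rank_memory():
    # If rank 0 reports much more than the other ranks, the imbalance comes
    # from the training process itself; otherwise another process on GPU 0
    # is consuming the memory that nvidia-smi shows.
    if not torch.cuda.is_available():
        return
    rank = dist.get_rank() if dist.is_initialized() else 0
    device = torch.cuda.current_device()
    allocated = torch.cuda.memory_allocated(device) / 2**20   # MiB
    peak = torch.cuda.max_memory_allocated(device) / 2**20    # MiB
    print(f"rank {rank} (cuda:{device}): {allocated:.0f} MiB allocated, "
          f"{peak:.0f} MiB peak")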
Is that correct?
Are we doing something wrong? Is there a way to distribute the load equally across all GPUs in order to maximize the benefit?
Do you have a tutorial/notebook or an article focusing on best practices for multi-GPU training?
Thank you!
Francesco