Hi, I'm using rnnt-loss and pytorch-lightning to train my model, but I found that the 4D tensors used to calculate the transducer loss accumulate on the GPU. When I check the GPU memory in the training step (before a batch starts), there are many 4D tensors left over from previous batches, which eventually leads to CUDA out of memory. I don't know what went wrong.

I use gpu_tracker to check the GPU memory.
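For reference, a tracker like this typically enumerates the CUDA tensors still referenced somewhere in the process via the garbage collector. A minimal sketch (not the actual gpu_tracker implementation) looks roughly like this:

```python
import gc
import torch

def report_cuda_tensors():
    """Print shape, dtype and size of every CUDA tensor still referenced somewhere."""
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and obj.is_cuda:
                mb = obj.element_size() * obj.nelement() / 1024 ** 2
                print(f"{tuple(obj.shape)}  {obj.dtype}  {mb:.1f} MB")
        except Exception:
            # some tracked objects raise on attribute access; skip them
            pass
```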

The loss in the training step comes from this.
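For context, a minimal sketch of the kind of training step I mean is below. The module and argument names here are hypothetical, and the loss is assumed to take the 4D joint-network output of shape (batch, T, U+1, vocab); this is only an illustration of where the 4D tensor is produced and consumed, not the exact code:

```python
import pytorch_lightning as pl

class RNNTModel(pl.LightningModule):
    def __init__(self, encoder, predictor, joint, rnnt_loss):
        super().__init__()
        self.encoder = encoder      # acoustic encoder
        self.predictor = predictor  # label predictor
        self.joint = joint          # joint network -> (B, T, U+1, V)
        self.rnnt_loss = rnnt_loss  # transducer loss module

    def training_step(self, batch, batch_idx):
        feats, feat_lens, labels, label_lens = batch
        enc = self.encoder(feats)          # (B, T, H)
        pred = self.predictor(labels)      # (B, U+1, H)
        joint_out = self.joint(enc, pred)  # the 4D tensor (B, T, U+1, V)
        loss = self.rnnt_loss(joint_out, labels, feat_lens, label_lens)
        # log only a detached scalar; keeping `loss` (or joint_out) alive
        # outside this step would retain the whole 4D graph on the GPU
        self.log("train_loss", loss.detach())
        return loss
```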

This is the GPU memory usage in the training step.
I tried using 'del', 'gc.collect()' and 'torch.cuda.empty_cache()' everywhere, but none of them helped.
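For context, the cleanup I tried looks roughly like this (illustrative only). As far as I understand, `torch.cuda.empty_cache()` only returns cached, unreferenced blocks to the driver, so it cannot free tensors that are still referenced by a Python object or by the autograd graph of a previous batch:

```python
import gc
import torch

# attempted cleanup at the end of every training step (illustrative)
gc.collect()              # collect unreachable Python objects
torch.cuda.empty_cache()  # return cached, *unreferenced* blocks to the driver
print(torch.cuda.memory_allocated() / 1024 ** 2, "MB still allocated")
# none of this frees a 4D tensor that is still referenced somewhere,
# e.g. by a non-detached loss kept for logging or by a retained graph
```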