Hi, thanks for releasing this great work!
I am trying to train the model using multi-GPU (DDP), but I encountered the following error from PyTorch Lightning:
```
RuntimeError: It looks like your LightningModule has parameters that were not used in
producing the loss returned by training_step. If this is intentional, you must enable
the detection of unused parameters in DDP, either by setting the string value
`strategy='ddp_find_unused_parameters_true'` or by setting the flag in the strategy with
`strategy=DDPStrategy(find_unused_parameters=True)`.
```
When I checked the logs with a small debugging hook, I found that the following parameters were reported as unused at step 0:
```
encoder.depth_predictor.cost_head.scratch.refinenet4.resConfUnit1.conv1.weight
encoder.depth_predictor.cost_head.scratch.refinenet4.resConfUnit1.conv1.bias
encoder.depth_predictor.cost_head.scratch.refinenet4.resConfUnit1.conv2.weight
encoder.depth_predictor.cost_head.scratch.refinenet4.resConfUnit1.conv2.bias
encoder.depth_predictor.cost_head.scratch.output_conv1.weight
encoder.depth_predictor.cost_head.scratch.output_conv1.bias
encoder.depth_predictor.cost_head.scratch.output_conv2.0.weight
encoder.depth_predictor.cost_head.scratch.output_conv2.0.bias
encoder.depth_predictor.cost_head.scratch.output_conv2.2.weight
encoder.depth_predictor.cost_head.scratch.output_conv2.2.bias
```
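The debugging hook was essentially the following (a simplified sketch, not the repository's code; `ToyModel` is just an illustration): after `backward()`, any trainable parameter whose `.grad` is still `None` never contributed to the loss.

```python
import torch
import torch.nn as nn

def report_unused_parameters(model: nn.Module) -> list[str]:
    """Return names of trainable parameters whose .grad is still None
    after backward, i.e. parameters that did not contribute to the loss."""
    return [
        name
        for name, param in model.named_parameters()
        if param.requires_grad and param.grad is None
    ]

# Toy example: a model with a submodule that is never used in forward().
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 1)
        self.unused = nn.Linear(4, 1)  # never called in forward()

    def forward(self, x):
        return self.used(x)

model = ToyModel()
loss = model(torch.randn(2, 4)).sum()
loss.backward()
print(report_unused_parameters(model))  # ['unused.weight', 'unused.bias']
```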
As far as I understand, the monocular depth encoder/decoder is meant to be frozen, and only the mono-multi feature adapter and integrated Gaussian prediction modules should be trainable. If that is the case, then it seems natural that some depth predictor parameters are not used in the loss.
To resolve this, I modified the training script so that the `Trainer` uses:

```python
from pytorch_lightning.strategies import DDPStrategy

trainer = Trainer(
    ...,
    strategy=DDPStrategy(find_unused_parameters=True),
    ...
)
```
This makes the training run without crashing. I understand that `find_unused_parameters=True` adds some communication overhead in DDP, but that should be acceptable if the unused parameters are intentional.
Is this the correct and intended way to run the code with multi-GPU?
Or should I explicitly freeze/remove those unused parameters instead of relying on find_unused_parameters=True?
Just want to confirm whether my modification is consistent with the authors’ intended training setup.
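In case it helps, the freezing approach I have in mind is roughly the following sketch (the `model.encoder.depth_predictor.cost_head` path is my guess from the parameter names above, not something I verified against the code):

```python
import torch
import torch.nn as nn

def freeze_module(module: nn.Module) -> None:
    """Disable gradients for every parameter in the module, and put it in
    eval mode so BatchNorm statistics and dropout are frozen as well."""
    for param in module.parameters():
        param.requires_grad = False
    module.eval()

# Hypothetical usage, assuming the LightningModule exposes the frozen head:
# freeze_module(model.encoder.depth_predictor.cost_head)

# The optimizer would then only receive the remaining trainable parameters:
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4
# )
```

As far as I know, DDP only registers parameters that require gradients when the wrapper is constructed, so freezing before wrapping should remove the need for `find_unused_parameters=True` entirely.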
Thanks a lot!