Distributed Data Parallel communication hook #667
-
Hi David, have you checked out our submission API in https://github.com/mlcommons/algorithmic-efficiency/blob/main/submissions/template/submission.py and an example implementation (https://github.com/mlcommons/algorithmic-efficiency/blob/main/reference_algorithms/paper_baselines/adamw/pytorch/submission.py#L93)? The idea is that submitters are free to implement each of the submission API functions as they wish. Our workload loss functions return 'unreduced' loss values, so I believe you should be able to compute and perform calculations on the gradients per shard. @runame, can you confirm?
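For reference, a minimal sketch of the pattern this reply describes, using only standard PyTorch distributed calls. `unreduced_loss_fn` is a hypothetical stand-in for a workload loss function that returns per-example losses, the `clamp_` call is just a placeholder for whatever per-shard calculation a submission might do, and the actual submission API plumbing is omitted:

```python
import torch.distributed as dist


def shard_local_update(ddp_model, batch, targets, unreduced_loss_fn, optimizer):
    """Keep gradients shard-local, transform them per GPU, then average them
    explicitly across GPUs."""
    optimizer.zero_grad()
    # no_sync() disables DDP's automatic gradient all-reduce for this
    # forward/backward, so each GPU ends up with gradients of its own shard.
    with ddp_model.no_sync():
        logits = ddp_model(batch)
        per_example_loss = unreduced_loss_fn(logits, targets)
        per_example_loss.mean().backward()  # local (per-shard) reduction
    world_size = dist.get_world_size()
    for p in ddp_model.parameters():
        if p.grad is None:
            continue
        p.grad.clamp_(-1.0, 1.0)  # placeholder for the per-shard calculation
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad.div_(world_size)   # average the transformed gradients
    optimizer.step()
```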
-
Hello,
I am working on a simple idea for a submission to this contest. My idea requires registering a communication hook on PyTorch's DistributedDataParallel model. Essentially, I want to calculate the gradient, perform some calculation on the gradient separately on each GPU, and then all_reduce the results. I do not think this violates the spirit of the rules, but please let me know whether you agree. Thank you for your time.
-David Tweedle
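A minimal sketch of the kind of hook described in this post, using PyTorch's DDP `register_comm_hook` API. `per_gpu_transform_` is a hypothetical placeholder for the per-GPU calculation; the divide/all_reduce structure mirrors the stock allreduce hook:

```python
import torch.distributed as dist


def per_gpu_transform_(grad):
    """Hypothetical placeholder for the per-GPU calculation on the local gradient."""
    grad.clamp_(-1.0, 1.0)


def custom_allreduce_hook(state, bucket):
    # bucket.buffer() is this GPU's flattened gradient for the current bucket.
    grad = bucket.buffer()
    per_gpu_transform_(grad)          # shard-local calculation before communication
    grad.div_(dist.get_world_size())
    fut = dist.all_reduce(grad, async_op=True).get_future()
    # DDP expects a Future that resolves to the reduced gradient tensor.
    return fut.then(lambda f: f.value()[0])


# Registration on an already-constructed DistributedDataParallel model:
# ddp_model.register_comm_hook(state=None, hook=custom_allreduce_hook)
```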