Distributed Data Parallel communication hook #667
-
Hi David, have you checked out our submission API in https://github.com/mlcommons/algorithmic-efficiency/blob/main/submissions/template/submission.py and an example implementation (https://github.com/mlcommons/algorithmic-efficiency/blob/main/reference_algorithms/paper_baselines/adamw/pytorch/submission.py#L93)? The idea is that submitters are free to implement each of the submission API functions as they wish. Our workload loss functions return 'unreduced' loss values, so I believe you should be able to compute and perform calculations on the gradients per shard. @runame, can you confirm?
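For reference, a minimal sketch of the pattern this reply describes, using only standard PyTorch distributed calls. `unreduced_loss_fn` is a hypothetical stand-in for a workload loss function that returns per-example losses, the `clamp_` call is just a placeholder for whatever per-shard calculation a submission might do, and the actual submission API plumbing is omitted:

```python
import torch.distributed as dist


def shard_local_update(ddp_model, batch, targets, unreduced_loss_fn, optimizer):
    """Keep gradients shard-local, transform them per GPU, then average them
    explicitly across GPUs."""
    optimizer.zero_grad()
    # no_sync() disables DDP's automatic gradient all-reduce for this
    # forward/backward, so each GPU ends up with gradients of its own shard.
    with ddp_model.no_sync():
        logits = ddp_model(batch)
        per_example_loss = unreduced_loss_fn(logits, targets)
        per_example_loss.mean().backward()  # local (per-shard) reduction
    world_size = dist.get_world_size()
    for p in ddp_model.parameters():
        if p.grad is None:
            continue
        p.grad.clamp_(-1.0, 1.0)  # placeholder for the per-shard calculation
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad.div_(world_size)   # average the transformed gradients
    optimizer.step()
```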
-
Hello,
I am working on a simple idea for a submission to this contest. My idea requires registering a communication hook on PyTorch's DistributedDataParallel model. Essentially, I want to calculate the gradient, perform some calculation on the gradient separately on each GPU, and then all_reduce the results. I do not think this violates the spirit of the rules, but please let me know whether you agree. Thank you for your time.
-David Tweedle
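A minimal sketch of the kind of hook described in this post, using PyTorch's DDP `register_comm_hook` API. `per_gpu_transform_` is a hypothetical placeholder for the per-GPU calculation; the divide/all_reduce structure mirrors the stock allreduce hook:

```python
import torch.distributed as dist


def per_gpu_transform_(grad):
    """Hypothetical placeholder for the per-GPU calculation on the local gradient."""
    grad.clamp_(-1.0, 1.0)


def custom_allreduce_hook(state, bucket):
    # bucket.buffer() is this GPU's flattened gradient for the current bucket.
    grad = bucket.buffer()
    per_gpu_transform_(grad)          # shard-local calculation before communication
    grad.div_(dist.get_world_size())
    fut = dist.all_reduce(grad, async_op=True).get_future()
    # DDP expects a Future that resolves to the reduced gradient tensor.
    return fut.then(lambda f: f.value()[0])


# Registration on an already-constructed DistributedDataParallel model:
# ddp_model.register_comm_hook(state=None, hook=custom_allreduce_hook)
```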