
Migrating training scripts to torchrun #1933


Open: wants to merge 12 commits into base: main


@lkosh (Contributor) commented May 8, 2025

No description provided.

codecov bot commented May 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.74%. Comparing base (db6d0db) to head (06984f4).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1933      +/-   ##
==========================================
- Coverage   96.80%   96.74%   -0.06%     
==========================================
  Files         172      172              
  Lines        8442     8442              
==========================================
- Hits         8172     8167       -5     
- Misses        270      275       +5     
Flag Coverage Δ
unittests 96.74% <ø> (-0.06%) ⬇️


@felixdittrich92 (Contributor) left a comment

Thanks 👍

Hm, normally in this case I think we can merge both scripts (DDP and the normal train script) into one, since the logic is the same? Either way, what we should test is that logging still works with torchrun (W&B, for example).

if args.backend:
    torch.cuda.set_device(rank)
    dist.init_process_group(backend=args.backend)
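
To make the suggestion above a bit more concrete, here is a rough sketch of how a single script could cover both the plain and the torchrun launch. It is not the code from this PR. It assumes the script reads the `RANK`, `LOCAL_RANK` and `WORLD_SIZE` variables that torchrun exports, and that `args.backend` is left empty for a plain single-process run. `DistEnv`, `read_dist_env` and `setup_distributed` are hypothetical names made up for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DistEnv:
    """Process layout as exported by torchrun (hypothetical helper)."""

    rank: int
    local_rank: int
    world_size: int

    @property
    def is_main_process(self) -> bool:
        # Only global rank 0 should talk to loggers such as W&B.
        return self.rank == 0


def read_dist_env(environ: Optional[Mapping[str, str]] = None) -> DistEnv:
    """Read RANK / LOCAL_RANK / WORLD_SIZE, defaulting to a single process."""
    env = os.environ if environ is None else environ
    return DistEnv(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


def setup_distributed(dist_env: DistEnv, backend: Optional[str]) -> bool:
    """Initialise the process group only when a backend was requested.

    Returns True if the process group was initialised, False otherwise.
    Assumption: with a backend set, the script was started by torchrun,
    so the default env:// rendezvous can find the variables it needs.
    """
    if not backend:
        return False
    # Imported lazily so the single-process path does not touch torch.distributed.
    import torch
    import torch.distributed as dist

    if backend == "nccl":
        # One GPU per worker, selected by the node-local rank.
        torch.cuda.set_device(dist_env.local_rank)
    dist.init_process_group(backend=backend)
    return True
```

In the training loop the logging calls would then sit behind `if dist_env.is_main_process:`, which is one way to check that W&B gets a single run per job instead of one per worker.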

@felixdittrich92 added this to the 0.12.0 milestone on May 9, 2025
@felixdittrich92 added the labels topic: documentation, type: enhancement, ext: references, framework: pytorch, topic: text detection, topic: text recognition on May 9, 2025
@felixdittrich92 self-assigned this on May 9, 2025

2 participants