Migrating training scripts to torchrun #1933

lkosh · 2025-05-08T13:02:56Z

No description provided.

codecov · 2025-05-08T13:27:45Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.74%. Comparing base (db6d0db) to head (06984f4).
Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1933      +/-   ##
==========================================
- Coverage   96.80%   96.74%   -0.06%     
==========================================
  Files         172      172              
  Lines        8442     8442              
==========================================
- Hits         8172     8167       -5     
- Misses        270      275       +5

Flag	Coverage Δ
unittests	`96.74% <ø> (-0.06%)`	⬇️

Flags with carried forward coverage won't be shown.
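The percentages in the report follow directly from the line counts: coverage is hits divided by tracked lines. Here is a quick check. It assumes there are no partially covered lines, which is consistent with hits + misses = lines in both columns. It also assumes plain rounding to two decimals. Codecov's rounding mode is configurable, so this only approximates how it displays figures. `coverage_pct` is a hypothetical helper.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage (hits / tracked lines), rounded to two decimals."""
    return round(100 * hits / lines, 2)


base = coverage_pct(8172, 8442)  # main at db6d0db
head = coverage_pct(8167, 8442)  # PR head at 06984f4
print(base, head, round(head - base, 2))  # prints: 96.8 96.74 -0.06
```

The five extra misses on an unchanged line count (8442) account for the whole -0.06% drop.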


felixdittrich92

Thanks 👍

Mh, normally in this case I think we can merge both scripts (the DDP script and the normal training script) into one, because the logic is the same. In any case, what we should test is that logging still works with torchrun (W&B, for example).

if args.backend:
    torch.cuda.set_device(rank)  # bind this process to its GPU before creating the process group
    dist.init_process_group(backend=args.backend)
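A minimal sketch of how one script could serve both a plain `python` launch and a `torchrun` launch, in line with the suggestion to merge the two scripts. torchrun exports `RANK`, `LOCAL_RANK` and `WORLD_SIZE` to every worker. The sketch reads those variables and falls back to single-process defaults when they are missing. The names `DistInfo`, `get_dist_info` and `setup_distributed` are hypothetical, and so is the rule that distributed mode needs both a backend and `WORLD_SIZE > 1`. This is not the PR's actual code. It also assumes `torch.cuda.set_device` is only needed for `nccl`, with `gloo` covering CPU runs. Note that it pins the GPU by `LOCAL_RANK` rather than the global rank, which matters once training spans more than one node.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class DistInfo:
    rank: int
    local_rank: int
    world_size: int

    @property
    def is_distributed(self) -> bool:
        return self.world_size > 1

    @property
    def is_main_process(self) -> bool:
        # Only rank 0 should log to W&B and save checkpoints
        return self.rank == 0


def get_dist_info(env: Optional[Mapping[str, str]] = None) -> DistInfo:
    """Read the variables torchrun exports; default to a single process."""
    env = os.environ if env is None else env
    return DistInfo(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


def setup_distributed(info: DistInfo, backend: Optional[str]) -> None:
    """Create the process group only for a multi-process torchrun launch."""
    if not (backend and info.is_distributed):
        return  # plain single-process run: nothing to initialise
    import torch  # imported lazily so the single-process path never touches it here
    import torch.distributed as dist

    if backend == "nccl":
        torch.cuda.set_device(info.local_rank)  # one GPU per process on each node
    # Default env:// init reads MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE set by torchrun
    dist.init_process_group(backend=backend)
```

In the training loop, W&B calls would then be guarded with `if info.is_main_process:` so that each metric is logged once rather than once per worker. A launch might look like `torchrun --nproc_per_node=2 references/recognition/train_pytorch.py ... --backend nccl`. The `--backend` option name is inferred from `args.backend` above.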

references/detection/train_pytorch_ddp.py

references/recognition/README.md

references/recognition/train_pytorch_ddp.py

references/recognition/README.md

references/recognition/train_pytorch.py

lkosh added 5 commits May 6, 2025 11:15

recognition training script

4b64472

detection training script

c8b0a62

training scripts

c046f14

docs

60ea0af

docs

cbc4de9

felixdittrich92 requested changes May 8, 2025

references/detection/train_pytorch_ddp.py (outdated, resolved)

references/recognition/README.md (outdated, resolved)

references/recognition/train_pytorch_ddp.py (outdated, resolved)

bugfix

c1501c9

felixdittrich92 added this to the 0.12.0 milestone May 9, 2025

felixdittrich92 self-assigned this May 9, 2025

unified training script

4bd2281

felixdittrich92 requested changes May 10, 2025


lkosh added 5 commits May 12, 2025 09:27

pr fixes

01f055f

pr fixes

807bbe6

pr fixes

6ad8083

detection training script

302ba4b

cpu fix

06984f4
