pass correct arg #2127

felipemello1 · 2024-12-06T22:36:48Z

Context

What is the purpose of this PR? Is it to

add a new feature
fix a bug
update tests and/or documentation
other (please add here)

In #2074, we allowed users to set resume_from_checkpoint=True without passing the recipe_state and adapter paths. If these paths are None, we search for them in the latest epoch.

This PR is a hotfix that passes the correct argument so that the adapter paths can actually be None.
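The fallback described above (use an explicitly passed path if given, otherwise look in the latest epoch) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not torchtune's implementation. The `epoch_<N>` directory layout, the file names `adapter_model.pt` and `recipe_state.pt`, and the helpers `find_latest_epoch_dir` and `resolve_resume_paths` are all hypothetical.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical layout (an assumption, not torchtune's documented layout):
#   <output_dir>/epoch_0/, <output_dir>/epoch_1/, ...
_EPOCH_DIR = re.compile(r"^epoch_(\d+)$")


def find_latest_epoch_dir(output_dir: Path) -> Optional[Path]:
    """Return the epoch_<N> subdirectory with the largest N, or None if none exist."""
    epochs = []
    for child in output_dir.iterdir():
        match = _EPOCH_DIR.match(child.name)
        if match and child.is_dir():
            epochs.append((int(match.group(1)), child))
    if not epochs:
        return None
    # Compare numerically so epoch_10 beats epoch_9 (a string sort would not).
    return max(epochs, key=lambda pair: pair[0])[1]


def resolve_resume_paths(
    output_dir: Path,
    adapter_checkpoint: Optional[Path] = None,
    recipe_checkpoint: Optional[Path] = None,
) -> Tuple[Path, Path]:
    """Fill in any path left as None from the latest epoch directory.

    Paths passed explicitly are returned unchanged.
    """
    if adapter_checkpoint is not None and recipe_checkpoint is not None:
        return adapter_checkpoint, recipe_checkpoint
    latest = find_latest_epoch_dir(output_dir)
    if latest is None:
        raise FileNotFoundError(f"no epoch_<N> directory found in {output_dir}")
    # The file names below are hypothetical placeholders.
    if adapter_checkpoint is None:
        adapter_checkpoint = latest / "adapter_model.pt"
    if recipe_checkpoint is None:
        recipe_checkpoint = latest / "recipe_state.pt"
    return adapter_checkpoint, recipe_checkpoint


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        for n in (0, 1, 9, 10):
            (out / f"epoch_{n}").mkdir()
        # Neither path given: both are resolved from epoch_10.
        print(resolve_resume_paths(out))
```

The key point the sketch illustrates is that the resolved path, not the original None from the config, must be what gets passed on to the loading code.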

Test plan

Train, then resume without passing the recipe state or adapter path:

tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device epochs=2 max_steps_per_epoch=5

tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device epochs=2 max_steps_per_epoch=5 resume_from_checkpoint=True

pytorch-bot · 2024-12-06T22:36:57Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2127

✅ No Failures

As of commit a7f88ad with merge base 424ffc3:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Co-authored-by: Felipe Mello <felipemello@fb.com>

pass correct arg

a7f88ad

facebook-github-bot added the CLA Signed label Dec 6, 2024

felipemello1 requested a review from ebsmothers December 6, 2024 22:37

ebsmothers approved these changes Dec 6, 2024

felipemello1 merged commit fef2c80 into meta-pytorch:main Dec 6, 2024
17 checks passed

felipemello1 deleted the ckpt_hotfix branch December 6, 2024 23:29

rahul-sarvam pushed a commit to sarvamai/torchtune that referenced this pull request Dec 23, 2024

pass correct arg (meta-pytorch#2127)

03c9adf

Co-authored-by: Felipe Mello <felipemello@fb.com>

rahul-sarvam pushed a commit to sarvamai/torchtune that referenced this pull request Dec 23, 2024

pass correct arg (meta-pytorch#2127)

b05b4d6

Co-authored-by: Felipe Mello <felipemello@fb.com>
