Add output dir to top of all configs #2183

ebsmothers · 2024-12-19T18:54:41Z

Many of our generation and eval configs don't match the format of our finetuning configs, where output_dir is at the top. This PR updates them to match that format. It's a no-op for most of the configs touched here, since output_dir is not used by generation or eval configs, but this way that's clear right at the top. The two configs that are not no-ops are the early exit and quantization configs. This PR also helps ensure that output_dir != checkpoint_dir in all our configs. We will now be enforcing this constraint so that we don't pollute HF's cached files with our finetuning artifacts.
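As a rough illustration of the output_dir != checkpoint_dir constraint, here is a minimal sketch of a spot check for a single config file. It is not how torchtune enforces the constraint: `check_dirs_differ` is a hypothetical helper, and it assumes a top-level `output_dir:` key and a `checkpoint_dir:` key at any indentation. It reads lines naively, so quoted values, trailing comments, and interpolated values are not resolved.

```shell
#!/bin/bash
# Sketch only: a naive line-based check, not a real YAML parser.
# check_dirs_differ is a hypothetical helper, not part of torchtune.
check_dirs_differ() {
  local file="$1" out ckpt
  # First top-level output_dir value (no leading indentation).
  out=$(sed -n 's/^output_dir:[[:space:]]*//p' "$file" | head -n 1)
  # First checkpoint_dir value at any indentation level.
  ckpt=$(sed -n 's/^[[:space:]]*checkpoint_dir:[[:space:]]*//p' "$file" | head -n 1)
  # Ignore a single trailing slash so /tmp/a and /tmp/a/ compare equal.
  out="${out%/}"
  ckpt="${ckpt%/}"
  if [[ -n "$out" && "$out" == "$ckpt" ]]; then
    echo "$file: output_dir and checkpoint_dir are both $out"
    return 1
  fi
}
```

The function returns non-zero only when both values are present and identical, so calling it on each config from a `find` loop would list the offending files.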

Script to find the files to update:

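#!/bin/bash
# Print every YAML config whose first non-comment, non-blank line is not output_dir.
find_yaml_files() {
  local dir="$1" file first_line
  # NUL-delimited read keeps paths with spaces or newlines intact.
  while IFS= read -r -d '' file; do
    # Drop comment lines and empty lines, then keep the first remaining line.
    first_line=$(grep -v '^ *#' "$file" | sed '/^$/d' | head -n 1)
    if [[ $first_line != output_dir* ]]; then
      echo "$file"
    fi
  done < <(find "$dir" -type f -name '*.yaml' -print0)
}
find_yaml_files "/data/users/ebs/ebs-torchtune-alt/recipes"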

Test plan:

Quantize recipe

tune run quantize --config quantization
...
INFO:torchtune.utils._logging:Model checkpoint of size 6.49 GiB saved to /tmp/torchtune/llama2_7B/quantized/pytorch_model-00001-of-00002-8da4w.pt

Early exit recipe

tune run --nnodes 1 --nproc_per_node 4 dev/early_exit_finetune_distributed --config recipes/dev/7B_full_early_exit.yaml max_steps_per_epoch=10
...
INFO:torchtune.utils._logging:Model checkpoint of size 9.29 GiB saved to /tmp/torchtune/llama2_7b/full_early_exit/epoch_0/ft-model-00001-of-00002.safetensors
INFO:torchtune.utils._logging:Model checkpoint of size 3.26 GiB saved to /tmp/torchtune/llama2_7b/full_early_exit/epoch_0/ft-model-00002-of-00002.safetensors
INFO:torchtune.utils._logging:Saving final epoch checkpoint.
INFO:torchtune.utils._logging:The full model checkpoint, including all weights and configurations, has been saved successfully.You can now use this checkpoint for further training or inference.

Generate V2 recipe

tune run dev/generate_v2 --config llama2/generation_v2
...
 Oh, how delightful! *adjusts glasses* The capital of France is... *drumroll* Paris! 🇫🇷 Yes, the City of Light, the City of Love, the City of Art... *sigh* You know, I've always wanted to visit Paris. It's on my AI bucket list. 😍 How about you? Have you been to Paris? 🤔

pytorch-bot · 2024-12-19T18:54:45Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2183

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0ae12d9 with merge base cdf5ea2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

codecov-commenter · 2024-12-19T19:01:14Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.93%. Comparing base (9cfa288) to head (851030a).
Report is 10 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2183      +/-   ##
==========================================
- Coverage   67.75%   63.93%   -3.82%     
==========================================
  Files         334      339       +5     
  Lines       19281    20074     +793     
==========================================
- Hits        13064    12835     -229     
- Misses       6217     7239    +1022

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Add output dir to top of all configs

851030a

facebook-github-bot added the CLA Signed label Dec 19, 2024

ebsmothers requested review from felipemello1, joecummings and pbontrager December 19, 2024 18:55

early exit config fix

0ae12d9

felipemello1 approved these changes Dec 19, 2024

ebsmothers merged commit 75b0543 into meta-pytorch:main Dec 19, 2024
17 checks passed

felipemello1 pushed a commit that referenced this pull request Dec 20, 2024

Add output dir to top of all configs (#2183)

332b1bf

mori360 pushed a commit to mori360/torchtune that referenced this pull request Dec 20, 2024

Add output dir to top of all configs (meta-pytorch#2183)

b73d552

rahul-sarvam pushed a commit to sarvamai/torchtune that referenced this pull request Dec 23, 2024

Add output dir to top of all configs (meta-pytorch#2183)

37bf22e

rahul-sarvam pushed a commit to sarvamai/torchtune that referenced this pull request Dec 23, 2024

Add output dir to top of all configs (meta-pytorch#2183)

76225cc

ebsmothers mentioned this pull request Jan 29, 2025

[ez] Add output_dir field to a couple configs #2309

Merged

4 participants
