ebsmothers
Contributor

@ebsmothers ebsmothers commented Dec 19, 2024

Many of our generation and eval configs don't follow the format of our finetuning configs, where output_dir is at the top. This PR updates them to match that format. For most of the configs touched here this is a no-op, since generation and eval configs do not use output_dir, but this way that is clear right at the top of the file. The two exceptions, where the change is not a no-op, are the early exit and quantization configs. This PR also helps ensure that output_dir != checkpoint_dir in all our configs, a constraint we will now enforce so that we don't pollute HF's cached files with our finetuning artifacts.

Script to find the files to update:

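#!/bin/bash
# Print every YAML config whose first non-comment, non-blank line
# does not start with output_dir.
find_yaml_files() {
  local dir="$1" file first_line
  while IFS= read -r -d '' file; do
    # Drop comment lines and blank lines, then keep the first remaining line.
    first_line=$(grep -v '^ *#' "$file" | sed '/^$/d' | head -n 1)
    if [[ $first_line != output_dir* ]]; then
      echo "$file"
    fi
  done < <(find "$dir" -type f -name '*.yaml' -print0)
}
find_yaml_files "/data/users/ebs/ebs-torchtune-alt/recipes"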
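
The second goal, output_dir != checkpoint_dir, could be spot-checked in a similar way. The following is only a rough sketch and not part of this PR. The function name find_dir_collisions is hypothetical. It assumes both keys appear as plain "key: value" lines (checkpoint_dir possibly indented under another block) with no trailing comments. It does not resolve ${...} interpolation, and it is not the enforcement logic itself.

```shell
# Hypothetical helper (not part of this PR): print each YAML config whose
# top-level output_dir is identical to its first checkpoint_dir value.
# Assumes plain "key: value" lines; ${...} interpolation is NOT resolved.
find_dir_collisions() {
  find "$1" -type f -name '*.yaml' | while IFS= read -r file; do
    # Top-level output_dir: the key starts in the first column.
    out=$(sed -n 's/^output_dir:[[:space:]]*//p' "$file" | head -n 1)
    # checkpoint_dir may be indented under another block.
    ckpt=$(sed -n 's/^[[:space:]]*checkpoint_dir:[[:space:]]*//p' "$file" | head -n 1)
    # Ignore one trailing slash so /a/b and /a/b/ compare equal.
    out=${out%/}
    ckpt=${ckpt%/}
    if [ -n "$out" ] && [ "$out" = "$ckpt" ]; then
      printf '%s\n' "$file"
    fi
  done
}
```

Run from the repo root as find_dir_collisions recipes. Empty output would mean no config matched under these assumptions. Unlike the script above, this sketch reads newline-delimited paths, so it would mishandle file names that contain newlines.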

Test plan:

Quantize recipe

tune run quantize --config quantization
...
INFO:torchtune.utils._logging:Model checkpoint of size 6.49 GiB saved to /tmp/torchtune/llama2_7B/quantized/pytorch_model-00001-of-00002-8da4w.pt

Early exit recipe

tune run --nnodes 1 --nproc_per_node 4 dev/early_exit_finetune_distributed --config recipes/dev/7B_full_early_exit.yaml max_steps_per_epoch=10
...
INFO:torchtune.utils._logging:Model checkpoint of size 9.29 GiB saved to /tmp/torchtune/llama2_7b/full_early_exit/epoch_0/ft-model-00001-of-00002.safetensors
INFO:torchtune.utils._logging:Model checkpoint of size 3.26 GiB saved to /tmp/torchtune/llama2_7b/full_early_exit/epoch_0/ft-model-00002-of-00002.safetensors
INFO:torchtune.utils._logging:Saving final epoch checkpoint.
INFO:torchtune.utils._logging:The full model checkpoint, including all weights and configurations, has been saved successfully.You can now use this checkpoint for further training or inference.

Generate V2 recipe

tune run dev/generate_v2 --config llama2/generation_v2
...
 Oh, how delightful! *adjusts glasses* The capital of France is... *drumroll* Paris! 🇫🇷 Yes, the City of Light, the City of Love, the City of Art... *sigh* You know, I've always wanted to visit Paris. It's on my AI bucket list. 😍 How about you? Have you been to Paris? 🤔

pytorch-bot bot commented Dec 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2183

✅ No Failures

As of commit 0ae12d9 with merge base cdf5ea2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 19, 2024
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.93%. Comparing base (9cfa288) to head (851030a).
Report is 10 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2183      +/-   ##
==========================================
- Coverage   67.75%   63.93%   -3.82%     
==========================================
  Files         334      339       +5     
  Lines       19281    20074     +793     
==========================================
- Hits        13064    12835     -229     
- Misses       6217     7239    +1022     

@ebsmothers ebsmothers merged commit 75b0543 into meta-pytorch:main Dec 19, 2024
17 checks passed
felipemello1 pushed a commit that referenced this pull request Dec 20, 2024
mori360 pushed a commit to mori360/torchtune that referenced this pull request Dec 20, 2024
rahul-sarvam pushed a commit to sarvamai/torchtune that referenced this pull request Dec 23, 2024
rahul-sarvam pushed a commit to sarvamai/torchtune that referenced this pull request Dec 23, 2024