Conversation

insop (Contributor) commented Nov 30, 2024

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

Please link to any issues this PR addresses.

Changelog

What are the changes made in this PR?

Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any of these, just ask and we will happily help. We also have a contributing page with guidance on contributing.

  • run pre-commit hooks and linters (make sure you've first installed them via pre-commit install)
  • add unit tests for any new functionality
  • update docstrings for any new or updated methods or classes
  • run unit tests via pytest tests
  • run recipe tests via pytest tests -m integration_test
  • manually run any new or modified recipes with sufficient proof of correctness
  • include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example and a tutorial example.

  • I did not change any public API
  • I have added an example to docs or docstrings

pytorch-bot bot commented Nov 30, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2095

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 76724e0 with merge base 32e265d:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label Nov 30, 2024
insop (Contributor Author) commented Nov 30, 2024

@ebsmothers, PTAL.
Thank you.

ebsmothers (Contributor) left a comment

Hi @insop, thanks for the PR. I am not sure about this change, mainly because these datasets can already be used straight from the command line.

(Note: I know that this kinda contradicts the fact that we have alpaca_cleaned_dataset as a partial right above, but tbh I am not crazy about this either.)

Our builders should be general enough that any of these can be directly plugged in from config or command line without the need for a separate builder. E.g. for the CodeAlpaca-20k dataset just change tune run ... to tune run ... dataset.source=sahil2801/CodeAlpaca-20k.
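A minimal Python sketch of the reviewer's point, under stated assumptions: instruct_dataset_builder is a hypothetical stand-in for a general dataset builder (not torchtune's actual API, whose builders also take a tokenizer and other arguments), and apply_override is a hypothetical, simplified version of a dotted key=value command-line override applied to a plain dict. It shows that pinning source in a dedicated builder via functools.partial and overriding dataset.source on an existing config produce the same builder call.

```python
from functools import partial


# Hypothetical general builder; "placeholder/default-source" is a made-up default.
def instruct_dataset_builder(source: str = "placeholder/default-source", split: str = "train") -> dict:
    # A real builder would load and tokenize the dataset; here we just echo the arguments.
    return {"source": source, "split": split}


# Option 1 (what the PR proposed): a dedicated builder that only pins `source`.
code_alpaca_builder = partial(instruct_dataset_builder, source="sahil2801/CodeAlpaca-20k")


# Option 2 (what the reviewer suggested): reuse the general builder and override
# the nested config key. This helper is a simplified, hypothetical sketch.
def apply_override(config: dict, override: str) -> dict:
    """Apply one dotted 'key.subkey=value' override to a nested dict, in place."""
    dotted_key, value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})  # create intermediate dicts if missing
    node[leaf] = value
    return config


config = {"dataset": {"source": "placeholder/default-source"}}
apply_override(config, "dataset.source=sahil2801/CodeAlpaca-20k")
from_override = instruct_dataset_builder(**config["dataset"])

print(from_override)  # {'source': 'sahil2801/CodeAlpaca-20k', 'split': 'train'}
print(code_alpaca_builder() == from_override)  # True
```

Because the dedicated builder adds nothing beyond a fixed argument, every new dataset would otherwise need its own one-line partial, whereas the override path needs no code change at all.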

insop (Contributor Author) commented Dec 1, 2024

Hi @ebsmothers

Thank you for your review and the suggestion.
That makes sense; overriding dataset.source=sahil2801/CodeAlpaca-20k is the better approach.

I am closing this PR.

insop closed this Dec 1, 2024

Labels: CLA Signed


3 participants