-
Notifications
You must be signed in to change notification settings - Fork 680
Removing ao from pyproject.toml #1452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1452
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit d4bd32d with merge base 71be8ad ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
Just sharing my 2 cents after chatting a bit with Evan offline. From what I understand the UX of the different options are: To install stable: # Option 1
pip install torch torchvision torchtune
# Option 2
pip install torch torchvision torchao torchtune To install nightly: # Option 1
pip install torch torchvision torchtune --pre ... # pulls in torchao stable, bad
pip install torchao --pre ... --force-reinstall
# Option 2
pip install torch torchvision torchao torchtune --pre ... This seems like a clear win for Option 2 at first sight, but in practice I suspect installing the stable torchtune will look more like an isolated Overall I think Option 1 provides a better experience when installing the stable packages, and a slightly worse experience for installing the nightlies. So I guess the choice depends on which one of these experiences you want to prioritize? |
From @NicolasHug:
This is a great TL;DR for the whole issue. To add my thoughts, I think this is dynamic. ATM, we implicitly prefer nightly builds b/c we rarely do stable releases and incorporate a lot of new features from AO and Core. Plus, the "LLM community" moves fast and expects us to, as well. However, I see a future in which we have more frequent releases where we would prefer the better UX for stable. |
Also @msaroufim just FYI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment, but I won't block
tests/__init__.py
Outdated
torchao not installed. | ||
If you are using stable PyTorch please install via pip install torchao. | ||
If you are using nightly PyTorch please install via | ||
pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/cu121." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just point to the install page so we only have to maintain one source of truth for how to install torchao.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry - even more annoying. Can you link specifically to the section in install about download torchao, torch, etc.?
You may need to add a tag there in RST.
TLDR: We have to choose between our ability to consistently provide stable, well-tested nightly packages and a clean install experience for all of our users. This PR reluctantly proposes to sacrifice slightly on the latter so that we can support the former in any capacity.
The goal
torchtune should be usable on either the latest stable release of PyTorch or the PyTorch nightlies. There are three main ways that we support people using torchtune:
This sounds simple, so what's the problem?
The problem
We don't only depend on PyTorch. We also depend on torchvision (less of a concern for this particular discussion) and torchao.
Why is this a problem?
torchao also does nightly releases and stable releases. Also, unlike torchvision, torchao's stable releases are not in sync with PyTorch stable releases.
torchao's goals are similar to ours: their stable releases will work with a particular PyTorch stable release, and their nightly releases will work with the corresponding PyTorch nightlies. But outside of that, there is no guaranteed support for either:
a) torchao nightly release using a previous PyTorch stable release, or
b) torchao stable release using a more recent PyTorch nightly release.
It may not immediately seem like it, but (b) is actually pretty important here: it's the reason that our nightly builds have been broken for 3 weeks, for one.
What can we do?
Option 1: --force-reinstall for nightly support
We could just say "most of our users will be happy on stable ao". That's a pretty reasonable statement and would lend itself to keeping ao in our pyproject.toml file. The problem is that forces them to be on stable PyTorch as well (or risk breakages; see our currently-broken nightly builds). We can consider reminding users who want to be on nightly PyTorch that they will also need to
--force-reinstall
ao after they install torchtune (since remember, torchao stable will be installed as part of our package in this world), but imo that is extremely confusing and unintuitive.Option 2: this PR
Remove torchao from our pyproject.toml file. This is in line with what we do with torch and torchvision already (which we do for the same reason) -- we will require users to first manually install those libraries either via the stable channel or the nightly channel.
The main drawback here is that while
pip install torch torchvision
is relatively idiomatic as a prereq (it's a subset of the command right on the main PyTorch install page),pip install torch torchvision torchao
is not. A lot of people will likely forget to do this and things will break.However, as of today this is the best way for us to support both nightly and stable PyTorch installs properly. The behavior in option 1 is unintuitive and has caused numerous breakages in our CI over the past few weeks that even people on our team did not know how to fix.
Option 3: ???
We have bandied about ideas of "dynamically populating index URLs in pyproject.toml" to delineate between stable and nightly installs. I am no packaging expert but I have not found a feasible way to do this yet. We could also consider moving off a pure-pyproject.toml approach for our packaging. But given the implications for uploading wheels discussed by @joecummings in #656 this also does not seem like a great option.
Like I said, I am no expert, so maybe there is a better way here. But in lieu of that, this PR proposes going with Option 2.
Implementation and testing
By going with option 2, we need to raise errors if users do not have ao installed. Rather than check in a bunch of different places, I've opted to do this more globally: raising errors in
torchtune/__init__.py
andtests/__init__.py
if ao import is unsuccessful. Open to a discussion on whether this is too blunt of an approach, but the alternative is to add try/excepts to every import (and there are quite a lot).Tested by creating a fresh env with (a) PyTorch nightlies and (b) PyTorch stable without ao to confirm I get the expected ImportError. After installing ao, things work as expected.