@ebsmothers (Contributor) commented Aug 29, 2024

TLDR: We have to choose between our ability to consistently provide stable, well-tested nightly packages and a clean install experience for all of our users. This PR reluctantly proposes to sacrifice slightly on the latter so that we can support the former in any capacity.

The goal

torchtune should be usable on either the latest stable release of PyTorch or the PyTorch nightlies. There are three main ways that we support people using torchtune:

  1. Our stable releases. A given stable release of torchtune should be runnable on a particular stable release of PyTorch, as well as on PyTorch nightlies up to the one current at the time we released our package. The same torchtune stable release should also keep working on more recent PyTorch versions, but because we use some APIs without BC guarantees, in practice this is definitely not guaranteed.
  2. Our nightly releases. These should also be runnable on the latest PyTorch stable release, but in practice the point of installing our nightlies is to use integrations with the latest PyTorch features, which may only be available on PyTorch nightlies. So here we should recommend that people also install the PyTorch nightlies.
  3. Installs from source. When running on main, it should be possible to install either the PyTorch stable release or a PyTorch nightly and have torchtune run out of the box at all times.

This sounds simple, so what's the problem?

The problem

We don't only depend on PyTorch. We also depend on torchvision (less of a concern for this particular discussion) and torchao.

Why is this a problem?

torchao also does nightly releases and stable releases. Also, unlike torchvision, torchao's stable releases are not in sync with PyTorch stable releases.

torchao's goals are similar to ours: their stable releases will work with a particular PyTorch stable release, and their nightly releases will work with the corresponding PyTorch nightlies. But outside of that, there is no guaranteed support for either:

a) torchao nightly release using a previous PyTorch stable release, or
b) torchao stable release using a more recent PyTorch nightly release.

It may not immediately seem like it, but (b) is actually pretty important here: for one, it's the reason our nightly builds have been broken for three weeks.

What can we do?

Option 1: --force-reinstall for nightly support

We could just say "most of our users will be happy on stable ao". That's a pretty reasonable statement and would lend itself to keeping ao in our pyproject.toml file. The problem is that this forces them to be on stable PyTorch as well (or risk breakages; see our currently-broken nightly builds). We could remind users who want to be on nightly PyTorch that they will also need to --force-reinstall ao after they install torchtune (since torchao stable would be installed as part of our package in this world), but imo that is extremely confusing and unintuitive.

Option 2: this PR

Remove torchao from our pyproject.toml file. This is in line with what we do with torch and torchvision already (which we do for the same reason) -- we will require users to first manually install those libraries either via the stable channel or the nightly channel.

The main drawback here is that while pip install torch torchvision is relatively idiomatic as a prereq (it's a subset of the command right on the main PyTorch install page), pip install torch torchvision torchao is not. A lot of people will likely forget to do this and things will break.

However, as of today this is the best way for us to support both nightly and stable PyTorch installs properly. The behavior in option 1 is unintuitive and has caused numerous breakages in our CI over the past few weeks that even people on our team did not know how to fix.

Option 3: ???

We have bandied about ideas of "dynamically populating index URLs in pyproject.toml" to delineate between stable and nightly installs. I am no packaging expert but I have not found a feasible way to do this yet. We could also consider moving off a pure-pyproject.toml approach for our packaging. But given the implications for uploading wheels discussed by @joecummings in #656 this also does not seem like a great option.

Like I said, I am no expert, so maybe there is a better way here. But in lieu of that, this PR proposes going with Option 2.

Implementation and testing

By going with Option 2, we need to raise errors if users do not have ao installed. Rather than check in a bunch of different places, I've opted to do this more globally: raising errors in torchtune/__init__.py and tests/__init__.py if the ao import is unsuccessful. Open to discussion on whether this is too blunt an approach, but the alternative is to add try/excepts around every import (and there are quite a lot).
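The global guard described above can be sketched roughly as follows. This is a minimal illustration, not the exact code in this PR; the helper name, its call site, and the error message are all hypothetical:

```python
import importlib.util


def require_package(name: str, hint: str) -> None:
    """Raise a helpful ImportError if `name` is not importable.

    A guard like this, invoked once from a package __init__.py,
    replaces scattering try/excepts around every individual import.
    (Illustrative sketch; not the actual torchtune implementation.)
    """
    if importlib.util.find_spec(name) is None:
        raise ImportError(f"{name} not installed. {hint}")


# Hypothetical usage in torchtune/__init__.py:
# require_package(
#     "torchao",
#     "Please install torchao before installing torchtune; "
#     "see the torchtune install instructions.",
# )
```

Using importlib.util.find_spec rather than a bare import keeps the check cheap and lets the error message point users at the install instructions instead of a raw ModuleNotFoundError.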

Tested by creating a fresh env with (a) PyTorch nightlies and (b) PyTorch stable without ao to confirm I get the expected ImportError. After installing ao, things work as expected.

@pytorch-bot (bot) commented Aug 29, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1452

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d4bd32d with merge base 71be8ad:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 29, 2024
@ebsmothers ebsmothers marked this pull request as ready for review September 2, 2024 04:26
@ebsmothers ebsmothers changed the title [wip] removing ao from pyproject.toml Removing ao from pyproject.toml Sep 2, 2024
@NicolasHug (Contributor) commented:
Just sharing my 2 cents after chatting a bit with Evan offline. From what I understand the UX of the different options are:

To install stable:

# Option 1
pip install torch torchvision torchtune
# Option 2
pip install torch torchvision torchao torchtune

To install nightly:

# Option 1
pip install torch torchvision torchtune --pre ...  # pulls in torchao stable, bad
pip install torchao --pre ... --force-reinstall

# Option 2
pip install torch torchvision torchao torchtune --pre ...

This seems like a clear win for Option 2 at first sight, but in practice I suspect installing the stable torchtune will look more like an isolated pip install torchtune (where torch and torchvision have already been installed previously), and such a scenario would break with Option 2 because torchao would be missing.

Overall I think Option 1 provides a better experience when installing the stable packages, and a slightly worse experience for installing the nightlies. So I guess the choice depends on which one of these experiences you want to prioritize?

@joecummings (Member) commented:
From @NicolasHug:

Overall I think Option 1 provides a better experience when installing the stable packages, and a slightly worse experience for installing the nightlies.

This is a great TL;DR for the whole issue. To add my thoughts, I think this is dynamic. ATM, we implicitly prefer nightly builds b/c we rarely do stable releases and incorporate a lot of new features from AO and Core. Plus, the "LLM community" moves fast and expects us to, as well.

However, I see a future in which we have more frequent releases where we would prefer the better UX for stable.

@ebsmothers (Contributor, Author) commented:
Also @msaroufim just FYI

@joecummings (Member) left a review comment:

One comment, but I won't block

torchao not installed.
If you are using stable PyTorch please install via pip install torchao.
If you are using nightly PyTorch please install via
pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/cu121
Follow-up comment (Member):

Maybe just point to the install page so we only have to maintain one source of truth for how to install torchao.

Follow-up comment (Member):
Sorry - even more annoying: can you link specifically to the section of the install docs about downloading torchao, torch, etc.?

You may need to add a tag there in RST.

@ebsmothers ebsmothers merged commit ca0a6de into meta-pytorch:main Sep 3, 2024
20 checks passed
@ebsmothers ebsmothers deleted the drop-ao branch September 3, 2024 18:45