[CI] Add bazel TPU presubmit testing #29660

Open
wants to merge 1 commit into
base: main

MichaelHudgins (Collaborator) commented Jun 23, 2025

Add a new TPU Bazel presubmit that uses Python 3.13 free-threading to increase the coverage of our TPU presubmits. It will be introduced as non-blocking and made blocking once its reliability is confirmed.

Additionally, refactor the use of ResultStore and the RBE cache in jobs that do not use RBE for testing or building. This should let those jobs use ResultStore and a cache, which in turn allows testing for cache hits.

MichaelHudgins added the "CI Connection Halt - On Retry" label (flags every job whose workflow file is set up for halting to halt if the workflow is retried) Jun 23, 2025
MichaelHudgins removed the "CI Connection Halt - On Retry" label Jun 25, 2025
MichaelHudgins added the "CI Optional GPU Presubmit" (flags the PR to run additional GPU testing not in the standard presubmits) and "CI Connection Halt - On Retry" labels and removed the "CI Connection Halt - On Retry" label Jul 2, 2025
MichaelHudgins changed the title from "[CI] Add bazel TPU presubmit testing - WIP" to "[CI] Add bazel TPU presubmit testing" Jul 8, 2025
MichaelHudgins added the "CI Connection Halt - On Retry" label and removed the "CI Optional GPU Presubmit" label Jul 8, 2025
MichaelHudgins marked this pull request as ready for review July 9, 2025 13:26
MichaelHudgins removed the "CI Connection Halt - On Retry" label Jul 9, 2025
MichaelHudgins requested a review from hawkinsp July 9, 2025 13:26
kanglant self-requested a review July 10, 2025 16:36
google-ml-butler bot added the "kokoro:force-run" and "pull ready" (ready for copybara import and testing) labels Jul 10, 2025
    halt-dispatch-input: ${{ inputs.halt-for-connection }}
- name: Install nightly libtpu
  run: |
    $JAXCI_PYTHON -m uv pip install --pre libtpu -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
Contributor

I don't think this affects hermetic Python; in this case the Bazel cache will have the libtpu version defined in the lock file.

Collaborator Author

IIRC I was getting a "no libtpu found" error without it; I'll double-check in the morning. I'm unsure, though, whether we would want a locked libtpu for presubmit. Much like XLA, this might need the latest nightly, or results could drift. (Both nightly and continuous reference the libtpu nightly.)
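
One way to double-check whether libtpu is visible to a given interpreter is sketched below. This is a minimal, hypothetical helper, not part of this PR: the function name `installed_version` is made up for illustration, and the distribution name `libtpu` is assumed to match the package named in the install step above. Under hermetic Python the check would have to run inside the Bazel-provided environment to be meaningful.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this interpreter's environment.
        return None


if __name__ == "__main__":
    # "libtpu" is assumed to be the distribution name used by the install step.
    version = installed_version("libtpu")
    print(f"libtpu: {version if version is not None else 'not installed'}")
```

Running it with the same interpreter the job uses (for example `$JAXCI_PYTHON`) shows which libtpu version, if any, that environment resolves.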

Contributor

I'm testing whether downloading the libtpu wheel into the dist folder (the same one that holds the JAX wheels) works.

Contributor

I didn't install any wheels at all, and the tests passed: https://github.com/jax-ml/jax/actions/runs/16511467263/job/46694360132?pr=30492

MichaelHudgins removed the request for review from yashk2810 July 25, 2025 02:03
3 participants