GHA that adds Flash Attention Benchmarking on B200 #49


Merged 1 commit on Jul 18, 2025
44 changes: 44 additions & 0 deletions .github/workflows/flash_attention.yml
@@ -0,0 +1,44 @@
name: Flash Attention Benchmark

# To remotely trigger a FA Benchmarking run, use the following:
# curl -L -X POST -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" -H "Authorization: Bearer $TOKEN" https://api.github.com/repos/pytorch/pytorch-integration-testing/dispatches -d '{"event_type": "benchmark_flash_attention"}'

on:
  schedule:
    - cron: "0 6 * * *" # Run every day at 6 AM UTC
  push:
    paths:
      - .github/workflows/flash_attention.yml
  repository_dispatch:
huydhn (Contributor) commented on Jul 17, 2025:
Oh, this is new to me. What does this repository_dispatch do? :)

Contributor Author replied:

This allows triggering the action by running:
$ curl -L -X POST -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" -H "Authorization: Bearer $TOKEN" https://api.github.com/repos/pytorch/pytorch-integration-testing/dispatches -d '{"event_type": "benchmark_flash_attention"}'

The intent is to trigger this action when there is a commit in the FA repo, but the FA repo needs to be updated first.
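For reference, the same trigger can be sketched in Python using only the standard library and GitHub's repository-dispatch REST endpoint. The `GH_TOKEN` environment-variable name and the helper function below are illustrative, not part of this workflow:

```python
# Sketch: fire the repository_dispatch event from Python instead of curl.
# Assumes a token with repo scope is available in GH_TOKEN (illustrative name).
import json
import os
import urllib.request


def build_dispatch_request(owner: str, repo: str, event_type: str, token: str):
    """Build the POST request for GitHub's repository_dispatch endpoint."""
    url = f"https://api.github.com/repos/{owner}/{repo}/dispatches"
    payload = json.dumps({"event_type": event_type}).encode()
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


# Usage (sends the event; GitHub answers 204 No Content on success):
#   req = build_dispatch_request("pytorch", "pytorch-integration-testing",
#                                "benchmark_flash_attention", os.environ["GH_TOKEN"])
#   urllib.request.urlopen(req)
```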

    types: benchmark_flash_attention
  workflow_dispatch:
jobs:
  benchmark-flash-attn:
    name: Flash Attention CuTe DSL Benchmark
    runs-on: B200
    container:
      # https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch/
      image: nvcr.io/nvidia/pytorch:25.06-py3
      options: --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864
    steps:
      - uses: actions/checkout@v4
        with:
          repository: 'Dao-AILab/flash-attention'
          path: 'fa4'
      - name: Install CuTe DSL
        run: |
          set -x
          echo "Installing nvidia-cutlass-dsl"
          pip install nvidia-cutlass-dsl==4.1.0.dev0
      - name: Build and Run FlashAttention CuTe DSL
        run: |
          set -x
          pushd fa4
          python setup.py install

          echo '<h1>B200 1000W</h1>' >> $GITHUB_STEP_SUMMARY
          nvidia-smi
          export PYTHONPATH=$(pwd)
          python benchmarks/benchmark_attn.py >> $GITHUB_STEP_SUMMARY
Contributor commented:

If you are interested in building a dashboard out of it on https://hud.pytorch.org later, feel free to take a look at https://github.com/pytorch/pytorch/wiki/PyTorch-OSS-benchmark-infra#benchmark-results-format to see if we could save the attention benchmark output in the same format. That's all that needs to be done.


          popd
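The results-format suggestion above could be sketched as a small conversion step. This is a hedged reading of the field layout described on the PyTorch OSS benchmark infra wiki page; the exact field names and the example model/metric names below are assumptions that should be checked against that page before wiring up a dashboard:

```python
# Sketch: wrap one attention benchmark measurement in the JSON record shape
# described on the PyTorch OSS benchmark infra wiki (field names assumed from
# that page, not verified here), so output could feed a hud.pytorch.org dashboard.
import json


def to_benchmark_record(model_name: str, metric_name: str, values: list) -> dict:
    """Build one benchmark-result record for a given model/metric pair."""
    return {
        "benchmark": {
            "name": "FlashAttention CuTe DSL",
            "extra_info": {"device": "B200"},  # assumed free-form metadata slot
        },
        "model": {"name": model_name},
        "metric": {"name": metric_name, "benchmark_values": values},
    }


# Example with illustrative numbers (not real measurements):
record = to_benchmark_record("fa4_fwd_bf16", "tflops", [1234.5])
print(json.dumps([record], indent=2))
```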