Fix Warpspec Matmul to be compatible with OmniFm Shapes #298

Open: wants to merge 1 commit into base: main

Conversation

njriasan
Contributor

Summary:
Fixes a couple of tutorial assumptions, most notably:

  1. It only worked with fp8 and fp16. Now it works with all dtypes for OmniFm.
  2. The shapes were not compatible due to layout mismatches. Since every shape has a layout mismatch, this adds an explicit transpose to allow benchmarking a "best case", although the resulting numbers may not be accurate.
  3. Some shapes will never be compatible with TMA because their strides are not divisible by 16. I added an explicit check in the code so this case is caught up front, and I will be skipping these shapes.
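
For illustration, the stride check in item 3 and the effect of the explicit transpose in item 2 can be sketched in plain Python. This is not the code from this PR: the helper names (`contiguous_strides`, `is_tma_compatible`, `materialized_transpose`) are hypothetical, and the sketch assumes the "divisible by 16" rule applies to strides measured in bytes, with the innermost stride required to be 1.

```python
# Hypothetical sketch, not the PR's implementation.
# Assumption: the "divisible by 16" rule is applied to strides in bytes.
TMA_ALIGNMENT_BYTES = 16


def contiguous_strides(shape):
    """Row-major strides, in elements, for a tensor of the given shape."""
    strides = []
    running = 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def is_tma_compatible(strides, itemsize):
    """Check element strides against the assumed TMA rule.

    The innermost stride must be 1, and every other stride, converted
    to bytes, must be a multiple of 16.
    """
    if strides[-1] != 1:
        return False
    return all((s * itemsize) % TMA_ALIGNMENT_BYTES == 0 for s in strides[:-1])


def materialized_transpose(shape):
    """Shape and strides of a 2D tensor after an explicit transpose
    that is copied into a fresh row-major buffer."""
    rows, cols = shape
    new_shape = (cols, rows)
    return new_shape, contiguous_strides(new_shape)


# fp16 (2 bytes): a row stride of 1000 elements is 2000 bytes, a multiple of 16.
fp16_ok = is_tma_compatible(contiguous_strides((512, 1000)), itemsize=2)
# fp16: a row stride of 1001 elements is 2002 bytes, not a multiple of 16.
fp16_bad = is_tma_compatible(contiguous_strides((512, 1001)), itemsize=2)
# fp8 (1 byte): the same 1000-element row stride is only 1000 bytes.
fp8_bad = is_tma_compatible(contiguous_strides((512, 1000)), itemsize=1)
# A column-major operand (innermost stride != 1) is the layout mismatch case.
col_major_bad = is_tma_compatible((1, 512), itemsize=2)
# After an explicit transpose and copy, the operand is row-major again.
t_shape, t_strides = materialized_transpose((512, 1024))
transposed_ok = is_tma_compatible(t_strides, itemsize=2)
```

Under these assumptions, whether a shape passes depends on the dtype width as well as the shape itself, which is why a shape that works for fp16 can still be skipped for fp8.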

Differential Revision: D77950060

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D77950060

facebook-github-bot pushed a commit that referenced this pull request Jul 22, 2025
njriasan added a commit that referenced this pull request Jul 22, 2025
Reviewed By: PaulZhang12

Differential Revision: D77950060
2 participants