[ET-VK] Move rotary embedding custom op to be handled via graph pass instead of source transform #13465
Conversation
[ET-VK] Move rotary embedding custom op to be handled via graph pass instead of source transform

Pull Request resolved: #13451

## Motivation

Be able to test Vulkan lowering via optimum-executorch.

## Context

Currently, ET-VK implements rotary embeddings via a custom op. This op is inserted into Transformer models by a source transform that replaces Rotary Embedding modules with a custom module which executes the custom op. The source transform approach makes it cumbersome to lower LLMs to Vulkan, since it requires the export logic to apply the source transform before calling `torch.export()`. This in turn makes it difficult to integrate Vulkan lowering into optimum-executorch, which aims to use common export + lowering logic for all lowering paths.

As an alternative, leverage `SubgraphMatcher` to detect fusable patterns and fuse the rotary embedding graph pattern into the custom op as part of the Vulkan delegate's graph passes. This removes the requirement to apply a custom source transform just for Vulkan. A minimal sketch of this approach is shown below.

## Changes

* Introduce the `backends/vulkan/patterns` folder to store fusable graph patterns
* Introduce a fusable graph pattern for rotary positional embeddings
* Update partitioner logic to automatically include nodes that are part of a fusable graph pattern
* Introduce a pass to fuse known patterns into a custom op / custom op sequence
* Remove the Vulkan rotary embedding source transform

ghstack-source-id: 303387281
Differential Revision: [D80293301](https://our.internmc.facebook.com/intern/diff/D80293301/)
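To make the approach concrete, below is a minimal, self-contained sketch of the fusion idea, not the actual ET-VK implementation. The names (`rotary_pattern`, `rotary_replacement`, `fused_rotary_emb`, `ToyAttention`) are hypothetical, `fused_rotary_emb` merely stands in for the backend custom op, and `torch.fx.subgraph_rewriter.replace_pattern` (which is built on `SubgraphMatcher`) is used here in place of the delegate's own pass infrastructure.

```python
import torch
from torch.fx import symbolic_trace
from torch.fx.subgraph_rewriter import replace_pattern


@torch.fx.wrap  # keep this call as a single leaf node when tracing the replacement
def fused_rotary_emb(x1, x2, cos, sin):
    # Stand-in for the backend custom op; in ET-VK this would dispatch to a
    # single Vulkan kernel rather than this eager reference computation.
    return torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)


def rotary_pattern(x1, x2, cos, sin):
    # Eager-mode rotary embedding applied to the two halves of the head dim;
    # this is the subgraph we want to recognize in the traced model.
    return torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)


def rotary_replacement(x1, x2, cos, sin):
    # Every matched subgraph is rewritten into one call to the fused op.
    return fused_rotary_emb(x1, x2, cos, sin)


class ToyAttention(torch.nn.Module):
    # Toy module whose forward contains the rotary embedding computation.
    def forward(self, x1, x2, cos, sin):
        rotated = torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)
        return rotated.sum()


gm = symbolic_trace(ToyAttention())
matches = replace_pattern(gm, rotary_pattern, rotary_replacement)
print(f"fused {len(matches)} rotary embedding pattern(s)")
print(gm.graph)  # the mul/sub/add/cat chain is now a single fused_rotary_emb call
```

In the real pass, the registered pattern would be matched against the exported program's graph and rewritten into the existing ET-VK rotary embedding custom op rather than this Python stand-in.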
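The partitioner-side change can be sketched the same way: match each registered pattern with `SubgraphMatcher` and collect the matched nodes, so the partitioner can claim them for the Vulkan delegate even if the individual ops are not otherwise supported. This is an illustration under assumed names (`nodes_in_fusable_patterns`, `pattern_fns`), not the actual partitioner code.

```python
from typing import Callable, Iterable, Set

from torch.fx import GraphModule, Node, symbolic_trace
from torch.fx.passes.utils.matcher_utils import SubgraphMatcher


def nodes_in_fusable_patterns(
    graph_module: GraphModule, pattern_fns: Iterable[Callable]
) -> Set[Node]:
    """Return all nodes of `graph_module` that belong to a known fusable pattern."""
    fusable_nodes: Set[Node] = set()
    for pattern_fn in pattern_fns:
        pattern_graph = symbolic_trace(pattern_fn).graph
        matcher = SubgraphMatcher(pattern_graph, ignore_literals=True)
        for match in matcher.match(graph_module.graph):
            # nodes_map maps pattern nodes to the corresponding nodes in the
            # target graph; collect the target-graph side of each match.
            fusable_nodes.update(match.nodes_map.values())
    return fusable_nodes
```

A partitioner's `is_node_supported`-style check could then accept any node found in this set, which is how matched-but-otherwise-unsupported ops would end up inside the delegated partition.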
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13465
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 14 Pending, 1 Unrelated Failure
As of commit fea6c7a with merge base 7fbca4d:
NEW FAILURE - The following job has failed:
BROKEN TRUNK - The following job failed but was present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #13451 by @SS-JIA
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/287/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/287/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/287/orig
@diff-train-skip-merge