[ET-VK] Refine partitioner to account for storage type and memory layout #6635

SS-JIA · 2024-11-04T17:30:15Z

Stack from ghstack (oldest at bottom):

Context

Tensors can be represented in Vulkan in a variety of ways. The two main descriptors for how a tensor is laid out in memory are:

Storage Type (buffer or texture)
Memory Layout (which dim is packed along a texel, which dim has a stride of 1, etc.)

Due to the differences between buffers and textures, and the differences between different memory layouts, an implementation for an operator may only support a specific set of (storage type, memory layout) combinations.

Furthermore, if an operator implementation supports multiple (storage type, memory layout) combinations, there may be a "preferred" setting which results in optimal performance.
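To make the idea of supported and preferred settings concrete, here is a minimal sketch of how an operator's (storage type, memory layout) support could be modeled. All names here (`StorageType`, `MemoryLayout`, `OpSupport`, `pick`) are hypothetical and chosen for illustration; they are not taken from the actual operator registry.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import FrozenSet, Optional, Tuple


class StorageType(Enum):
    BUFFER = auto()
    TEXTURE_3D = auto()


class MemoryLayout(Enum):
    WIDTH_PACKED = auto()
    HEIGHT_PACKED = auto()
    CHANNELS_PACKED = auto()


# One (storage type, memory layout) combination.
Setting = Tuple[StorageType, MemoryLayout]


@dataclass(frozen=True)
class OpSupport:
    """Hypothetical record of what one operator implementation can handle."""

    supported: FrozenSet[Setting]
    preferred: Optional[Setting] = None  # optimal setting, if one is known

    def is_supported(self, setting: Setting) -> bool:
        return setting in self.supported

    def pick(self, current: Setting) -> Setting:
        """Choose the setting a tensor should have for this operator.

        Order of preference: the preferred setting, then the tensor's
        current setting if it is valid, then a deterministic fallback.
        """
        if not self.supported:
            raise ValueError("operator supports no settings")
        if self.preferred is not None:
            return self.preferred
        if current in self.supported:
            return current
        return min(self.supported, key=lambda s: (s[0].value, s[1].value))
```

When `pick` returns a setting different from the tensor's current one, that is the point where a transition operator would be needed.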

These changes lay the foundation for a memory metadata tagging graph transform, which will make sure that every tensor participating in an operator call has a valid/optimal (storage type, memory layout) setting, and will insert transition operators to transfer input tensors to the correct memory settings when necessary.

An additional required change arises from the fact that Vulkan limits texture and buffer sizes. The partitioner therefore needs to account for the storage types and memory layouts supported by the operator implementation, and check whether all tensors participating in a computation can be represented with some (storage type, memory layout) combination supported by the implementation.
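The representability check can be sketched roughly as follows. This is a simplified illustration, not the partitioner's actual code: the limit constants, the assumption of 4 elements per texel, the folding of the batch dim into texture depth, and all function names are assumptions. Real limits would be queried from the device.

```python
from math import prod
from typing import Sequence, Tuple

# Assumed values for illustration only; real limits are device-dependent.
TEXEL_SIZE = 4              # elements packed into one texel
MAX_TEXTURE_EXTENT = 2048   # max size of any texture axis
MAX_BUFFER_NUMEL = 2**27    # max number of elements in a buffer


def texture_extents(sizes: Sequence[int], packed_dim: int) -> Tuple[int, int, int]:
    """Return (width, height, depth) of a texture holding the tensor.

    sizes is in (N, C, H, W) order with leading dims optional.
    packed_dim counts from the innermost dim: 0 = W, 1 = H, 2 = C.
    """
    if not 1 <= len(sizes) <= 4:
        raise ValueError("expected a tensor with 1 to 4 dims")
    if packed_dim not in (0, 1, 2):
        raise ValueError("packed_dim must be 0, 1, or 2")
    n, c, h, w = [1] * (4 - len(sizes)) + list(sizes)
    whc = [w, h, c]
    # The packed dim shrinks by the texel size, rounded up.
    whc[packed_dim] = (whc[packed_dim] + TEXEL_SIZE - 1) // TEXEL_SIZE
    # Assumption: batches are stacked along the depth axis.
    return (whc[0], whc[1], whc[2] * n)


def fits_in_texture(sizes: Sequence[int], packed_dim: int) -> bool:
    return all(e <= MAX_TEXTURE_EXTENT for e in texture_extents(sizes, packed_dim))


def fits_in_buffer(sizes: Sequence[int]) -> bool:
    return prod(sizes) <= MAX_BUFFER_NUMEL


def can_be_represented(sizes: Sequence[int], settings) -> bool:
    """settings: iterable of ("buffer" | "texture", packed_dim) pairs."""
    for storage, packed_dim in settings:
        if storage == "buffer" and fits_in_buffer(sizes):
            return True
        if storage == "texture" and fits_in_texture(sizes, packed_dim):
            return True
    return False
```

Under these assumed limits, a `(3, 5000, 16)` tensor does not fit in a width-packed texture because its height exceeds the maximum extent, so an operator that only supports textures would not be partitioned for it, while one that also supports buffers would.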

Changes

Improvements to the operator registry:

Introduce utility functions to check the optimal and enabled storage types and memory layouts for an operator

Improvements to the Partitioner:

Account for the storage types and memory layouts supported by an operator when deciding if a node should be partitioned
Improve the logic for fusable ops (e.g. the permute/transpose before a mm, which can be fused into linear) to check whether the final target op is supported in Vulkan, and only partition those nodes if so. Otherwise, don't partition them, so that they can be fused by another backend.
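The fusable-op decision described above can be sketched as follows. The `Node` class, the `FUSION_TARGETS` table, and the function names are hypothetical stand-ins for illustration and do not reflect the partitioner's real data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    """Minimal stand-in for a graph node: an op name plus its users."""

    op: str
    users: List["Node"] = field(default_factory=list)


# Hypothetical table: (producer op, consumer op) -> op produced by fusing them.
FUSION_TARGETS: Dict[Tuple[str, str], str] = {
    ("permute", "mm"): "linear",
    ("transpose", "mm"): "linear",
}


def fused_target(node: Node) -> Optional[str]:
    """Return the op this node would be fused into, or None if not fusable."""
    if len(node.users) != 1:
        return None
    return FUSION_TARGETS.get((node.op, node.users[0].op))


def should_partition(node: Node, supported_ops: Set[str]) -> bool:
    """Decide whether a node should be claimed by this backend."""
    target = fused_target(node)
    if target is not None:
        # Only claim the node if the op it will be fused into is supported;
        # otherwise leave it so another backend can perform the fusion.
        return target in supported_ops
    return node.op in supported_ops
```

For example, a `permute` feeding an `mm` is claimed only when `linear` is in the supported set, even if `permute` itself is supported.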

Differential Revision: D65428843

pytorch-bot · 2024-11-04T17:30:19Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6635

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit a61cab1 with merge base cd565b5:

NEW FAILURES - The following jobs have failed:

pull / test-binary-size-linux-gcc / linux-job (gh)
RuntimeError: Command docker exec -t c8e7216489161c2b943f1df6b762a0c8af41dd876e030c8d517fea78f8fe339b /exec failed with exit code 1
pull / unittest-arm / linux-job (gh)
RuntimeError: Command docker exec -t 8044f377f0118811fdf157c91c3a9112f66fb68a0d0c495c1a0aae7bbb953827 /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2024-11-04T17:30:29Z

This pull request was exported from Phabricator. Differential Revision: D65428843

facebook-github-bot · 2024-11-04T21:11:57Z

This pull request was exported from Phabricator. Differential Revision: D65428843

facebook-github-bot · 2024-11-04T22:34:00Z

This pull request was exported from Phabricator. Differential Revision: D65428843

facebook-github-bot · 2024-11-05T15:34:55Z

This pull request was exported from Phabricator. Differential Revision: D65428843

…ut (#6668)

Pull Request resolved: #6635

ghstack-source-id: 251883705
@exported-using-ghexport

Differential Revision: [D65428843](https://our.internmc.facebook.com/intern/diff/D65428843/)

Co-authored-by: Stephen Jia <ssjia@meta.com>

* [ET-VK] Refine partitioner to account for storage type and memory layout

  Pull Request resolved: #6635

  ghstack-source-id: 251883705
  @exported-using-ghexport

  Differential Revision: [D65428843](https://our.internmc.facebook.com/intern/diff/D65428843/)

* [ET-VK] Introduce memory metadata tagging pass

  Pull Request resolved: #6636

  ## Context

  As title; implements the memory metadata tagging graph transform described in the dependent diff. See the comments for more details.

  ghstack-source-id: 251884020
  @exported-using-ghexport

  Differential Revision: [D65428842](https://our.internmc.facebook.com/intern/diff/D65428842/)

---------

Co-authored-by: Stephen Jia <ssjia@meta.com>

facebook-github-bot added the CLA Signed label Nov 4, 2024

facebook-github-bot added the fb-exported label Nov 4, 2024

SS-JIA mentioned this pull request Nov 4, 2024

[ET-VK] Introduce memory metadata tagging pass #6636

Merged

junpi3 approved these changes Nov 5, 2024

facebook-github-bot merged commit af124d1 into gh/SS-JIA/136/base Nov 5, 2024
37 of 41 checks passed

facebook-github-bot deleted the gh/SS-JIA/136/head branch November 5, 2024 20:16

facebook-github-bot temporarily deployed to cherry-pick-bot November 5, 2024 20:16 — with GitHub Actions Inactive

pytorchbot mentioned this pull request Nov 5, 2024

[ET-VK] Refine partitioner to account for storage type and memory layout #6668

Merged
