[ET-VK] Used hashed layout instead of axis map UBO #6534
## Context

#6358 showed that passing the axis map of a tensor via a specialization constant allows shaders to use the axis map in indexing calculations with minimal impact on latency.

This diff extends that idea and introduces the concept of a hashed layout. The hashed layout is a 32-bit integer where:

1. Bits 28-31: `axis_map[0]`
2. Bits 24-27: `axis_map[1]`
3. Bits 20-23: `axis_map[2]`
4. Bits 16-19: `axis_map[3]`
5. Bits 12-15: `packed_dim`
6. Bits 0-11: unused

Essentially, the integer is divided into chunks of 4 bits, and each chunk represents one value from the `axis_map` or the `packed_dim`. This way, the entire description of how the tensor is represented as a texture can be passed into a compute shader with a single specialization constant.

Within the compute shader, the axis map and packed dim can be extracted like so:

```
${layout_declare_spec_const(C, "int", "in_layout", "DEFAULT_LAYOUT")}
const lowp ivec4 in_axis_map = unhash_axis_map(in_layout);
const lowp int in_packed_dim = unhash_packed_dim(in_layout);
```

Note that `lowp` can be used because the expected values are bounded by the dimensionality of the tensor, so only small values are expected.

## Changes

1. Introduce `hashed_layout`
2. Replace all uses of `axis_map_ubo` with `hashed_layout`
3. Remove `axis_map_ubo` from `vTensor`. This also reduces the size of the class.

Differential Revision: [D65085141](https://our.internmc.facebook.com/intern/diff/D65085141/)
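To make the bit layout above concrete, here is a minimal host-side sketch of packing and unpacking it in C++. The names `pack_layout`, `unpack_axis_map`, and `unpack_packed_dim` are hypothetical and are not taken from this diff; only the bit positions come from the description. The sketch uses `uint32_t` for simplicity, whereas the shader declares the specialization constant as `int`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the function added by this diff.
// Packs four axis map entries and the packed dim into 4-bit fields:
// bits 28-31, 24-27, 20-23, 16-19 hold axis_map[0..3],
// bits 12-15 hold packed_dim, and bits 0-11 stay zero.
inline uint32_t pack_layout(
    const std::array<uint32_t, 4>& axis_map,
    uint32_t packed_dim) {
  uint32_t layout = 0;
  for (uint32_t i = 0; i < 4; ++i) {
    assert(axis_map[i] < 16u); // each value must fit in 4 bits
    layout |= axis_map[i] << (28u - 4u * i);
  }
  assert(packed_dim < 16u);
  layout |= packed_dim << 12u;
  return layout;
}

// Hypothetical inverse helpers, analogous in spirit to the shader-side
// unhash_axis_map and unhash_packed_dim calls shown above.
inline std::array<uint32_t, 4> unpack_axis_map(uint32_t layout) {
  return {
      (layout >> 28u) & 0xFu,
      (layout >> 24u) & 0xFu,
      (layout >> 20u) & 0xFu,
      (layout >> 16u) & 0xFu};
}

inline uint32_t unpack_packed_dim(uint32_t layout) {
  return (layout >> 12u) & 0xFu;
}
```

With the example values `axis_map = {0, 1, 2, 2}` and `packed_dim = 2` (chosen here for illustration), `pack_layout` returns `0x01222000`, and the two unpack helpers recover the original values.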
🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6534

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit a70b35e with merge base db38bcc. This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D65085141
Pull Request resolved: #6534

ghstack-source-id: 250928240

Co-authored-by: Stephen Jia <ssjia@meta.com>