[ET-VK] Enable IntxWeightOnlyConfig #13452
Conversation
## Motivation

Be able to test Vulkan lowering via optimum-executorch.

## Context

Very similar to the below PR, int4 weight-only quantization is currently enabled in Vulkan via a custom source transform quantizer that replaces linear layers with a custom linear layer that calls a custom weight-only quantized linear op.

This diff aims to make it so that no Vulkan-specific source transforms need to be applied, by adding a fusion pattern for weight-only quantized linear.

## Changes

* Introduce a fusable graph pattern for weight only quantized linear
* Add fusion logic for weight only quantized linear in the fuse patterns pass
* Add `4w` qmode to the export llama script

Differential Revision: [D80293302](https://our.internmc.facebook.com/intern/diff/D80293302/)

[ghstack-poisoned]
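As a point of reference, here is a minimal sketch of how a 4-bit weight-only config might be applied with torchao before export and Vulkan lowering. The import paths, the `IntxWeightOnlyConfig` arguments, and the group size shown here are assumptions about the torchao API rather than anything added by this PR.

```python
import torch
import torch.nn as nn

# Assumed torchao APIs; adjust to the installed torchao version.
from torchao.quantization import IntxWeightOnlyConfig, quantize_
from torchao.quantization.granularity import PerGroup

# Toy model standing in for an LLM's linear layers.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 64)).eval()

# Weight-only int4 quantization, grouped along the input channel dimension.
quantize_(model, IntxWeightOnlyConfig(weight_dtype=torch.int4, granularity=PerGroup(32)))

# The model is then exported as usual; the Vulkan fuse-patterns pass is expected
# to recognize the weight-only quantized linear pattern in the exported graph,
# with no Vulkan-specific source transform required.
example_inputs = (torch.randn(1, 512),)
exported = torch.export.export(model, example_inputs)
```

In the exported graph, the quantized linear typically appears as a dequantize on the weight feeding a linear call; that is the kind of pattern the new fusable graph pattern and fusion logic are intended to match and collapse into a single Vulkan weight-only quantized linear op.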
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13452
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit fb29875 with merge base 1b4968f. The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D80293302
@SS-JIA has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Merged commit 6bef39c into gh/SS-JIA/288/base