Add Float8Tensor #2463
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2463
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 1 Unrelated Failure
As of commit 032425d with merge base 378e179:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following job failed but was already present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Summary: Splits out the float8 rowwise quantized path (both activation and weight) of AQT to Float8RowwiseTensor.

Next: could potentially incorporate the per-tensor activation path there as well.
Next: we can split the per-tensor weight path into another Tensor as well, so we can deprecate the AQT path for float8.

Test Plan:
python test/dtypes/test_affine_quantized_float.py
python test/quantization/quantize_/test_float8_rowwise_tensor.py

Reviewers:
Subscribers:
Tasks:
Tags:

stack-info: PR: #2463, branch: jerryzh168/stack/9
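For context, a minimal sketch of how this path is exercised through the existing public API (`quantize_` with `Float8DynamicActivationFloat8WeightConfig`, which per the summary keeps working and routes to the new tensor subclass internally); the CUDA/bfloat16 setup is an assumption about typical usage, not code from this PR:

```python
import torch
from torch import nn
from torchao.quantization import (
    Float8DynamicActivationFloat8WeightConfig,
    PerRow,
    quantize_,
)

# bfloat16 model on CUDA; float8 kernels need recent hardware (sm89+).
model = nn.Sequential(nn.Linear(128, 256)).to(torch.bfloat16).cuda()
quantize_(model, Float8DynamicActivationFloat8WeightConfig(granularity=PerRow()))

x = torch.randn(4, 128, dtype=torch.bfloat16, device="cuda")
out = model(x)  # runs through the rowwise float8 path
```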
```python
    )
    new_weight = new_layer.weight.tensor_impl.float8_data.to(torch.float32)
else:
    if mode == "static":
```
What does that practically mean? Is static_quant now broken after this PR?
Static quant is not migrated yet, so it won't break.
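Relatedly, the diff above reads the raw float8 payload through AQT's `tensor_impl` indirection; with a flat Float8Tensor subclass the payload would presumably hang directly off the tensor. A hedged sketch of that difference, reusing `new_layer` from the diff above (the direct attribute name is an assumption, not this PR's code):

```python
import torch

weight = new_layer.weight
if hasattr(weight, "tensor_impl"):
    # Old AQT layout: payload nested under the tensor_impl wrapper.
    new_weight = weight.tensor_impl.float8_data.to(torch.float32)
else:
    # Assumed new Float8Tensor layout: payload exposed directly.
    new_weight = weight.float8_data.to(torch.float32)
```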
```python
from torchao.utils import _is_fbgemm_genai_gpu_available, is_sm_at_least_90

_MODEL_NAMES = [
    "torchao-testing/opt-125m-float8dq-row-fbgemm",
```
I think the model name here should specify the relevant versions.

Also, IMO this should be a toy model with a single layer, with matching done on the layer output, to make it 100x easier to debug when things do go wrong. It's fine to also have a real model and match tokens, but I think it's more important to have a toy model.
A single linear for debuggability makes sense, although I'm not sure how we can actually get a toy model with a single linear in huggingface transformers. I can add the version, but we can revisit getting a single-layer model; see the sketch below.
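For illustration, a hedged sketch of the suggested toy-model test: quantize a single `nn.Linear` and match its output against the high-precision reference, so a regression points at one layer rather than at diverging tokens. Tolerances are placeholders, not values from this PR:

```python
import torch
from torch import nn
from torchao.quantization import (
    Float8DynamicActivationFloat8WeightConfig,
    PerRow,
    quantize_,
)

torch.manual_seed(0)
linear = nn.Linear(256, 512).to(torch.bfloat16).cuda()
x = torch.randn(8, 256, dtype=torch.bfloat16, device="cuda")
ref = linear(x)  # bfloat16 reference output

quantize_(linear, Float8DynamicActivationFloat8WeightConfig(granularity=PerRow()))
quant = linear(x)

# Loose tolerances: rowwise float8 quantization error dominates here.
torch.testing.assert_close(quant, ref, atol=0.2, rtol=0.05)
```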
Summary:

Added Float8Tensor that works for:
* fbgemm: per-row activation + per-row weight, calling the torch.ops.fbgemm.f8f8bf16_rowwise kernel
* aten: per-row/tensor activation + per-row/tensor weight, calling torch._scaled_mm, or weight-only quantization (fallback path)

Reusing Float8DynamicActivationFloat8WeightConfig for the above, and using `kernel` to control which kernel users will use.

Test Plan:
python test/dtypes/test_affine_quantized_float.py
python test/quantization/quantize_/test_float8_rowwise_tensor.py

Reviewers:
Subscribers:
Tasks:
Tags:

stack-info: PR: #2463, branch: jerryzh168/stack/9
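A minimal sketch of the aten per-row path named above, assuming a recent PyTorch (rowwise scales for `torch._scaled_mm` need 2.5+) and an sm89+ GPU; the quantization helper here is illustrative, not the PR's code:

```python
import torch

def fp8_rowwise(t: torch.Tensor):
    # One float32 scale per row, mapping the row absmax to the e4m3 max (448).
    scale = t.abs().amax(dim=-1, keepdim=True).float()
    scale = scale.clamp(min=1e-12) / torch.finfo(torch.float8_e4m3fn).max
    return (t.float() / scale).to(torch.float8_e4m3fn), scale

x = torch.randn(16, 32, device="cuda", dtype=torch.bfloat16)  # activation
w = torch.randn(64, 32, device="cuda", dtype=torch.bfloat16)  # weight

xq, x_scale = fp8_rowwise(x)  # x_scale: (16, 1)
wq, w_scale = fp8_rowwise(w)  # w_scale: (64, 1)

# torch._scaled_mm wants the second operand column-major; transposing the
# row-major weight gives exactly that, and its scale transposes to (1, 64).
out = torch._scaled_mm(
    xq, wq.t(), scale_a=x_scale, scale_b=w_scale.t(), out_dtype=torch.bfloat16
)  # (16, 64) bfloat16
```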
Stacked PRs:
* Add Float8Tensor

Summary:

Added Float8Tensor that works for:
* fbgemm: per-row activation + per-row weight, calling the torch.ops.fbgemm.f8f8bf16_rowwise kernel
* aten: per-row/tensor activation + per-row/tensor weight, calling torch._scaled_mm, or weight-only quantization (fallback path)

Reusing Float8DynamicActivationFloat8WeightConfig for the above, and using `kernel` to control which kernel users will use.

Test Plan:
python test/dtypes/test_affine_quantized_float.py
python test/quantization/quantize_/test_float8_rowwise_tensor.py

Reviewers:
Subscribers:
Tasks:
Tags:
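And a companion sketch for the fbgemm path, assuming the fbgemm-gpu-genai package is installed (hence the `_is_fbgemm_genai_gpu_available` check in the tests); the import path and argument layout of `f8f8bf16_rowwise` are my reading of the op and may differ across fbgemm versions:

```python
import torch
import fbgemm_gpu.experimental.gen_ai  # noqa: F401  (assumed import path; registers the fbgemm ops)

M, K, N = 16, 128, 64
# Pre-quantized inputs: activations (M, K) and weights (N, K) in e4m3,
# with one float32 scale per row of each.
xq = torch.randn(M, K, device="cuda").to(torch.float8_e4m3fn)
wq = torch.randn(N, K, device="cuda").to(torch.float8_e4m3fn)
x_scale = torch.ones(M, device="cuda", dtype=torch.float32)
w_scale = torch.ones(N, device="cuda", dtype=torch.float32)

out = torch.ops.fbgemm.f8f8bf16_rowwise(xq, wq, x_scale, w_scale)  # (M, N) bfloat16
```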