Add Float8Tensor #2463


Open
jerryzh168 wants to merge 1 commit into main from jerryzh168/stack/9

Conversation

jerryzh168 (Contributor) commented Jun 30, 2025

Stacked PRs:


Add Float8Tensor

Summary:
Added Float8Tensor that works for:

  • fbgemm: per-row activation + per-row weight, calling the torch.ops.fbgemm.f8f8bf16_rowwise kernel
  • aten: per-row/per-tensor activation + per-row/per-tensor weight, calling torch._scaled_mm (see the sketch below), or weight-only quantization as a fallback path
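
For the aten path, here is a minimal sketch of a per-tensor-scaled torch._scaled_mm call (assuming a recent PyTorch with float8 support and FP8-capable hardware; torch._scaled_mm is a private op whose signature has changed across releases):

```python
import torch

device = "cuda"  # requires FP8-capable hardware (e.g. SM 8.9+)
x = torch.randn(16, 32, device=device)
w = torch.randn(64, 32, device=device)

# per-tensor scales that map each tensor's max magnitude to the fp8 range
fp8_max = torch.finfo(torch.float8_e4m3fn).max
x_scale = x.abs().max().float() / fp8_max
w_scale = w.abs().max().float() / fp8_max
x_fp8 = (x / x_scale).to(torch.float8_e4m3fn)
w_fp8 = (w / w_scale).to(torch.float8_e4m3fn)

# the second operand must be column-major, hence the transpose
out = torch._scaled_mm(
    x_fp8,
    w_fp8.t(),
    scale_a=x_scale,
    scale_b=w_scale,
    out_dtype=torch.bfloat16,
)
```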

This reuses Float8DynamicActivationFloat8WeightConfig for both of the above, with a kernel argument to control which kernel users get.
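
As a usage sketch (quantize_ and Float8DynamicActivationFloat8WeightConfig are existing torchao APIs; the exact name and values of the kernel-selection argument added here are assumptions, so it is only referenced in a comment):

```python
import torch
from torchao.quantization import (
    Float8DynamicActivationFloat8WeightConfig,
    PerRow,
    quantize_,
)

model = torch.nn.Sequential(torch.nn.Linear(128, 256)).to(torch.bfloat16).cuda()

# Rowwise dynamic float8 quantization of activations and weights.
# Which kernel backs the matmul (fbgemm rowwise vs. aten torch._scaled_mm)
# is what the new kernel option in this PR controls.
quantize_(model, Float8DynamicActivationFloat8WeightConfig(granularity=PerRow()))
```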

Test Plan:
python test/dtypes/test_affine_quantized_float.py
python test/quantization/quantize_/test_float8_rowwise_tensor.py

Reviewers:

Subscribers:

Tasks:

Tags:

pytorch-bot bot commented Jun 30, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2463

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Unrelated Failure

As of commit 032425d with merge base 378e179:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jerryzh168 added a commit that referenced this pull request Jun 30, 2025
Summary:
Splits the float8 rowwise quantized path (both activation and weight) out of AQT into Float8RowwiseTensor

Next: could potentially incorporate the per-tensor activation path there as well
Next: we can split the per-tensor weight path into another Tensor as well, so we can deprecate the AQT path for float8

Test Plan:
python test/dtypes/test_affine_quantized_float.py
python test/quantization/quantize_/test_float8_rowwise_tensor.py

Reviewers:

Subscribers:

Tasks:

Tags:

stack-info: PR: #2463, branch: jerryzh168/stack/9
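
To make the rowwise scheme in this commit concrete, here is a minimal sketch (not the PR's implementation) of per-row float8 quantization and dequantization; torch.float8_e4m3fn requires a reasonably recent PyTorch:

```python
import torch

def quantize_rowwise_fp8(x: torch.Tensor):
    # one scale per row, sized so the row max maps to the fp8 max (448 for e4m3fn)
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    amax = x.abs().amax(dim=-1, keepdim=True)
    scale = amax.clamp(min=1e-12) / fp8_max
    data = (x / scale).to(torch.float8_e4m3fn)
    return data, scale

def dequantize_rowwise_fp8(data: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return data.to(torch.float32) * scale

x = torch.randn(4, 8)
data, scale = quantize_rowwise_fp8(x)
print((dequantize_rowwise_fp8(data, scale) - x).abs().max())  # small quantization error
```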
@facebook-github-bot facebook-github-bot added the CLA Signed label Jun 30, 2025
@jerryzh168 jerryzh168 added the topic: new feature label Jun 30, 2025
@jerryzh168 jerryzh168 changed the title from Add Float8RowwiseTensor to Add Float8Tensor Jul 2, 2025
)
new_weight = new_layer.weight.tensor_impl.float8_data.to(torch.float32)
else:
if mode == "static":
Contributor commented:

what does that practically mean, is static_quant now broken after this PR?

jerryzh168 (Contributor, Author) replied Jul 8, 2025:

static quant is not migrated yet, so it won't break

from torchao.utils import _is_fbgemm_genai_gpu_available, is_sm_at_least_90

_MODEL_NAMES = [
"torchao-testing/opt-125m-float8dq-row-fbgemm",
Contributor commented:

I think the model name here should specify the relevant versions

also, IMO this should be a toy model with a single layer with matching done on the layer output, to make it 100x easier to debug when things do go wrong. It's fine to also have a real model and match tokens, but I think it's more important to have a toy model.

jerryzh168 (Contributor, Author) replied Jul 12, 2025:

single linear for debuggability makes sense, although I'm not sure how we can get a toy model with a single linear in huggingface transformers. I can add the version, and we can revisit getting a single layer
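
A toy check along the lines suggested above could look like this (a sketch: quantize_ and the config are existing torchao APIs, while the shapes and the SQNR threshold are illustrative):

```python
import copy
import torch
from torchao.quantization import (
    Float8DynamicActivationFloat8WeightConfig,
    PerRow,
    quantize_,
)

torch.manual_seed(0)
linear = torch.nn.Linear(256, 512, dtype=torch.bfloat16, device="cuda")
x = torch.randn(8, 256, dtype=torch.bfloat16, device="cuda")

ref = copy.deepcopy(linear)
quantize_(linear, Float8DynamicActivationFloat8WeightConfig(granularity=PerRow()))

with torch.no_grad():
    y_ref, y_q = ref(x), linear(x)

# signal-to-quantization-noise ratio of the single layer's output
sqnr = 10 * torch.log10(
    y_ref.float().pow(2).mean() / (y_ref - y_q).float().pow(2).mean()
)
assert sqnr > 20, f"SQNR too low: {sqnr:.1f} dB"
```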
