❓ [Question] Manually Annotate Quantization Parameters in FX Graph #3522
❓ Question

Is there a way to manually annotate quantization parameters so that they are respected throughout torch_tensorrt conversion (e.g., by manually adding Q/DQ nodes, or by specifying some tensor metadata) via Dynamo? Thank you!

Comments
cc @narendasan @peri044 maybe? 🙏
This should be possible, as this is effectively what the TensorRT Model Optimizer toolkit does. @peri044 or @lanluo-nvidia could maybe give more specific guidance.
We currently use the NVIDIA Model Optimizer toolkit, which inserts quantization nodes into the torch model using its quantize API.
You can also manually insert a quantization custom op by implementing a lowering pass which adds these nodes to the FX graph.
Please let me know if you have any further questions.
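A minimal sketch of the graph surgery such a pass would perform is below. It uses PyTorch's quantized_decomposed Q/DQ ops purely as stand-in targets; the matched op, the quantization parameters, and whether Torch-TensorRT ships converters for these exact ops are all assumptions rather than confirmed behavior.

import torch
from torch.fx import GraphModule

# Sketch only: wrap the output of every matched convolution in quantize/dequantize nodes.
# The quantized_decomposed ops ship with PyTorch's pt2e quantization flow; matching
# aten.convolution is just an example and depends on the export/decomposition stage.
def insert_qdq_after_convs(gm: GraphModule, scale: float = 0.1, zero_point: int = 0) -> GraphModule:
    for node in list(gm.graph.nodes):
        if node.op == "call_function" and node.target == torch.ops.aten.convolution.default:
            with gm.graph.inserting_after(node):
                q = gm.graph.call_function(
                    torch.ops.quantized_decomposed.quantize_per_tensor.default,
                    args=(node, scale, zero_point, -128, 127, torch.int8),
                )
            with gm.graph.inserting_after(q):
                dq = gm.graph.call_function(
                    torch.ops.quantized_decomposed.dequantize_per_tensor.default,
                    args=(q, scale, zero_point, -128, 127, torch.int8),
                )
            # Route every consumer of the conv output (except the new Q node) through DQ.
            node.replace_all_uses_with(dq, delete_user_cb=lambda user: user is not q)
    gm.graph.lint()
    gm.recompile()
    return gm

Applied to exported_program.graph_module, this reroutes each convolution output through a Q/DQ pair; real scale/zero-point values would come from calibration rather than the placeholder defaults used here.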
Hey @peri044, thanks for the response. I tried modelopt -> export on a simple model below. Am I using this wrong, or missing something obvious? I'm using non-strict export (strict export runs into errors):

import modelopt.torch.quantization as mtq
import torch
from modelopt.torch.quantization.utils import export_torch_mode

class JustAConv(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 3, 3)

    def forward(self, inputs):
        return self.conv(inputs)

if __name__ == "__main__":
    model = JustAConv().to("cuda").eval()
    sample_input = torch.ones(1, 3, 224, 224).to("cuda")

    quant_cfg = mtq.INT8_DEFAULT_CFG
    mtq.quantize(
        model,
        quant_cfg,
        forward_loop=lambda model: model(sample_input),
    )

    with torch.no_grad():
        with export_torch_mode():
            exported_program = torch.export.export(model, (sample_input,), strict=False)
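The step that would typically follow the export is compiling the exported program with Torch-TensorRT. A minimal sketch, continuing from the snippet above and assuming the torch_tensorrt.dynamo.compile entry point with these arguments (the precision and block-size choices are assumptions, not verified here):

import torch_tensorrt

# Compile the quantized exported program; INT8 is requested explicitly.
trt_model = torch_tensorrt.dynamo.compile(
    exported_program,
    inputs=[sample_input],
    enabled_precisions={torch.int8},
    min_block_size=1,
)
print(trt_model(sample_input).shape)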
@patrick-botco I have tried your example with our latest main; with strict=False it works as expected.
Hey @lanluo-nvidia, thanks for checking! Here are my pytorch and modelopt versions:
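A quick way to capture those versions, assuming each package exposes a __version__ attribute (the usual convention, but not verified for every release):

# Environment report; package attribute names are assumptions.
import torch
import modelopt
import torch_tensorrt

print("torch:", torch.__version__)
print("modelopt:", modelopt.__version__)
print("torch_tensorrt:", torch_tensorrt.__version__)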