❓ [Question] Manually Annotate Quantization Parameters in FX Graph #3522
❓ Question

Is there a way to manually annotate quantization parameters so that they are respected throughout torch_tensorrt conversion (e.g., by manually adding Q/DQ nodes, or by specifying some tensor metadata) via Dynamo? Thank you!

Comments
cc @narendasan @peri044 maybe? 🙏
This should be possible, as this is effectively what the TensorRT Model Optimizer toolkit does. @peri044 or @lanluo-nvidia could maybe give more specific guidance.
We currently use the NVIDIA Model Optimizer toolkit, which inserts quantization nodes into the torch model using its quantize API.
You can also manually insert a quantization custom op by implementing a lowering pass which adds these nodes to the FX graph.
Please let me know if you have any further questions.
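A minimal sketch of the graph surgery such a pass would perform is below. It uses PyTorch's quantized_decomposed Q/DQ ops purely as stand-in targets; the matched op, the quantization parameters, and whether Torch-TensorRT ships converters for these exact ops are all assumptions rather than confirmed behavior.

import torch
from torch.fx import GraphModule

# Sketch only: wrap the output of every matched convolution in quantize/dequantize nodes.
# The quantized_decomposed ops ship with PyTorch's pt2e quantization flow; matching
# aten.convolution is just an example and depends on the export/decomposition stage.
def insert_qdq_after_convs(gm: GraphModule, scale: float = 0.1, zero_point: int = 0) -> GraphModule:
    for node in list(gm.graph.nodes):
        if node.op == "call_function" and node.target == torch.ops.aten.convolution.default:
            with gm.graph.inserting_after(node):
                q = gm.graph.call_function(
                    torch.ops.quantized_decomposed.quantize_per_tensor.default,
                    args=(node, scale, zero_point, -128, 127, torch.int8),
                )
            with gm.graph.inserting_after(q):
                dq = gm.graph.call_function(
                    torch.ops.quantized_decomposed.dequantize_per_tensor.default,
                    args=(q, scale, zero_point, -128, 127, torch.int8),
                )
            # Route every consumer of the conv output (except the new Q node) through DQ.
            node.replace_all_uses_with(dq, delete_user_cb=lambda user: user is not q)
    gm.graph.lint()
    gm.recompile()
    return gm

Applied to exported_program.graph_module, this reroutes each convolution output through a Q/DQ pair; real scale/zero-point values would come from calibration rather than the placeholder defaults used here.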
Hey @peri044, thanks for the response. I tried modelopt -> export on a simple model below. Am I using this wrong, or missing something obvious? I'm using non-strict export (strict export runs into errors):

import modelopt.torch.quantization as mtq
import torch
from modelopt.torch.quantization.utils import export_torch_mode

class JustAConv(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 3, 3)

    def forward(self, inputs):
        return self.conv(inputs)

if __name__ == "__main__":
    model = JustAConv().to("cuda").eval()
    sample_input = torch.ones(1, 3, 224, 224).to("cuda")

    quant_cfg = mtq.INT8_DEFAULT_CFG
    mtq.quantize(
        model,
        quant_cfg,
        forward_loop=lambda model: model(sample_input),
    )

    with torch.no_grad():
        with export_torch_mode():
            exported_program = torch.export.export(model, (sample_input,), strict=False)
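The step that would typically follow the export is compiling the exported program with Torch-TensorRT. A minimal sketch, continuing from the snippet above and assuming the torch_tensorrt.dynamo.compile entry point with these arguments (the precision and block-size choices are assumptions, not verified here):

import torch_tensorrt

# Compile the quantized exported program; INT8 is requested explicitly.
trt_model = torch_tensorrt.dynamo.compile(
    exported_program,
    inputs=[sample_input],
    enabled_precisions={torch.int8},
    min_block_size=1,
)
print(trt_model(sample_input).shape)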
@patrick-botco I have tried your example with our latest main; with strict=False it works as expected.
Hey @lanluo-nvidia, thanks for checking! Here are my pytorch and modelopt versions:
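A quick way to capture those versions, assuming each package exposes a __version__ attribute (the usual convention, but not verified for every release):

# Environment report; package attribute names are assumptions.
import torch
import modelopt
import torch_tensorrt

print("torch:", torch.__version__)
print("modelopt:", modelopt.__version__)
print("torch_tensorrt:", torch_tensorrt.__version__)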