Triton Model Navigator v0.14.0

Latest

kacper-kleczewski released this 22 Apr 16:54

2d1b909

Updates:
- new: TensorRT INT8 and FP8 quantization through ModelOpt (ONNX path)
- new: TensorRT NVFP4 quantization through ModelOpt (Torch path)
- new: Improved TorchCompile performance for repeated compilations using TORCHINDUCTOR_CACHE_DIR environment variable
- new: Global context with scoped (temporary) context variables
- new: Added new context variables INPLACE_OPTIMIZE_WORKSPACE_CONTEXT_KEY and INPLACE_OPTIMIZE_MODULE_GRAPH_ID_CONTEXT_KEY
- new: nav.bundle.save now has include and exclude patterns for fine-grained file selection
- new: GPU and Host memory usage logging
- change: Install the TensorRT package for architectures other than x86_64
- change: Disable conversion fallback for TensorRT paths and expose a control option in the custom config
- change: Use torch.export.save for Torch-TRT model serialization
- change: Added export_engine to OnnxConfig for improved export control
- fix: Correctness command relative tolerance formula
- fix: Memory management during export and conversion process for Torch
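
The TORCHINDUCTOR_CACHE_DIR item above refers to a standard PyTorch Inductor environment variable. A minimal sketch of pointing it at a persistent directory before any compilation runs is shown below; the directory name `torchinductor_cache` is a hypothetical choice, and how Model Navigator reads the variable internally is not described in these notes.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical cache location (an assumption for illustration); any writable
# directory that persists between runs would serve the same purpose.
cache_dir = Path(tempfile.gettempdir()) / "torchinductor_cache"
cache_dir.mkdir(parents=True, exist_ok=True)

# Set the variable before the first compilation so that compiled artifacts
# can be reused by later runs instead of being rebuilt.
os.environ["TORCHINDUCTOR_CACHE_DIR"] = str(cache_dir)

print(os.environ["TORCHINDUCTOR_CACHE_DIR"])
```

Setting the variable in the shell or container environment instead of in Python should have the same effect, provided it is in place before the process starts compiling.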

Version of external components used during testing:
- PyTorch 2.7.0a0+7c8ec84dab
- TensorFlow 2.17.0
- TensorRT 10.9.0.34
- TensorRT ModelOptimizer 0.27.0
- Torch-TensorRT 2.7.0a0
- ONNX Runtime 1.20.2
- Polygraphy 0.49.20
- GraphSurgeon 0.5.8
- tf2onnx v1.16.1
- Other component versions depend on the versions of the framework containers used.
  See the containers' support matrix for a detailed summary.
