Release v0.20.0rc1 · NVIDIA/TensorRT-LLM

Highlights

Features
- PyTorch workflow
  - Added LoRA support. (#3648) (#3788)
  - Added return logits support. (#3221)
- Part 1 of large-scale EP: Added MNNVL MoE A2A support. (#3504)
- Added smart router for the MoE module. (#3641)
- Added head size 72 support for QKV preprocessing kernel. (#3743)

What's Changed

move pytorch tests of LLM API into separate test files by @QiJune in #3745
Fix double link to fp8_blockscale_gemm_src by @WilliamTambellini in #3707
feat: add QMMA-based MLA kernels by @PerkzZheng in #3752
chore: add pull request template by @byshiue in #3760
Add running E2E LoRA flow by @shaharmor98 in #3648
[infra] Waive L0 tests by @yiqingy0 in #3784
feat: Add smart router for moe module by @zongfeijing in #3641
test: add rcca tests 4753548 by @xinhe-nv in #3716
fix: nvbugs/5234029 fix Qwen2.5-VL image test by @yechank-nvidia in #3726
test: [CI] Add failed cases into waives.txt by @xinhe-nv in #3696
fix: Intercept the error of multi-ranks bound to a single device by @Shixiaowei02 in #3525
fix: remove the unnecessary metadata changes in mtp by @lfr-0531 in #3787
test: Add DeepSeek-V3-Lite GSM8K tests by @syuoni in #3771
infra: [TRTLLM-4417]Support auto trigger special test stage for special file change by @ZhanruiSunCh in #3478
[TRTLLM-4763][test] Accuracy test improvement (Part 3.6): Deprecate mmlu_llmapi.py by @syuoni in #3802
add passing E2E LoRA flow by @shaharmor98 in #3788
fix: Limit llama4 context length to 8k by @mikeiovine in #3778
fix: Fix C++ decoder synchronization in PyTorch by @dcampora in #3106
fix: 5197419 and removed unused runtime kernels by @hypdeb in #3631
chore: reorganize some unit tests of PyTorch by @QiJune in #3780
doc: fix path after examples migration by @kaiyux in #3814
chore: fix some invalid paths of contrib models by @QiJune in #3818
chore: Fix KV cache block reuse flag name in quickstart_advanced by @mikeiovine in #3781
Fix create_weights in attention by @hlu1 in #3692
test: [CI] Add failed cases into waives.txt by @xinhe-nv in #3777
[https://nvbugspro.nvidia.com/bug/5238602][fix] Package lm_eval configuration files by @syuoni in #3809
[https://nvbugspro.nvidia.com/bug/5238599][fix] Normalize example path in accuracy tests by @syuoni in #3805
fix: Set default prompts and media for multimodal quickstart example by @qixiang-99 in #3792
Fix: Revert commit 25f9669 by @Shixiaowei02 in #3832
chore: bump version to 0.20.0rc1 by @ZhanruiSunCh in #3834
feat(part 2): Enhance the integrated robustness of scaffolding with __init__.py #3305 by @WeiHaocheng in #3731
fix: fix lora case failure by @HuiGao-NV in #3838
Added NemotronH to PyTorch supported models by @vegaluisjose in #3663
Adding local paths to the datasets to make them loadable in offline mode by @rakib-hasan in #3750
fix: [Deepseek] Pass hidden_states_fp4 to shared_experts by @hlu1 in #3819
chore: increase A30 for cpp test by @QiJune in #3811
feat: Return logits in PyTorch flow by @tongyuantongyu in #3221
feat: large-scale EP(part 1: Add MNNVL MoE A2A support) by @dongxuy04 in #3504
[infra] Waive L0 tests by @yiqingy0 in #3853
[chore] Add Llama 4 Maverick to quickstart README by @mikeiovine in #3848
fix:[AutoDeploy] Patch for torch load_state_dict() by @sugunav14 in #3847
feat: Add head size 72 support for QKV Preprocessing kernel by @qixiang-99 in #3743
chore: update pytorch only change file list by @QiJune in #3873
Test: Split C++ unit tests for CI granularity by @DomBrown in #3868
TRTLLM-4875 feat: Add version switcher to doc by @kaiyux in #3846

Full Changelog: v0.20.0rc0...v0.20.0rc1
