v0.20.0rc1
Pre-release
Highlights
- Features
What's Changed
- move pytorch tests of LLM API into separate test files by @QiJune in #3745
- Fix double link to fp8_blockscale_gemm_src by @WilliamTambellini in #3707
- feat: add QMMA-based MLA kernels by @PerkzZheng in #3752
- chore: add pull request template by @byshiue in #3760
- Add running E2E LoRA flow by @shaharmor98 in #3648
- [infra] Waive L0 tests by @yiqingy0 in #3784
- feat: Add smart router for moe module by @zongfeijing in #3641
- test: add rcca tests 4753548 by @xinhe-nv in #3716
- fix: nvbugs/5234029 fix Qwen2.5-VL image test by @yechank-nvidia in #3726
- test: [CI] Add failed cases into waives.txt by @xinhe-nv in #3696
- fix: Intercept the error of multi-ranks bound to a single device by @Shixiaowei02 in #3525
- fix: remove the unnecessary metadata changes in mtp by @lfr-0531 in #3787
- test: Add DeepSeek-V3-Lite GSM8K tests by @syuoni in #3771
- infra: [TRTLLM-4417]Support auto trigger special test stage for special file change by @ZhanruiSunCh in #3478
- [TRTLLM-4763][test] Accuracy test improvement (Part 3.6): Deprecate mmlu_llmapi.py by @syuoni in #3802
- add passing E2E LoRA flow by @shaharmor98 in #3788
- fix: Limit llama4 context length to 8k by @mikeiovine in #3778
- fix: Fix C++ decoder synchronization in PyTorch by @dcampora in #3106
- fix: 5197419 and removed unused runtime kernels by @hypdeb in #3631
- chore: reorganize some unit tests of PyTorch by @QiJune in #3780
- doc: fix path after examples migration by @kaiyux in #3814
- chore: fix some invalid paths of contrib models by @QiJune in #3818
- chore: Fix KV cache block reuse flag name in quickstart_advanced by @mikeiovine in #3781
- Fix create_weights in attention by @hlu1 in #3692
- test: [CI] Add failed cases into waives.txt by @xinhe-nv in #3777
- [https://nvbugspro.nvidia.com/bug/5238602][fix] Package lm_eval configuration files by @syuoni in #3809
- [https://nvbugspro.nvidia.com/bug/5238599][fix] Normalize example path in accuracy tests by @syuoni in #3805
- fix: Set default prompts and media for multimodal quickstart example by @qixiang-99 in #3792
- Fix: Revert commit 25f9669 by @Shixiaowei02 in #3832
- chore: bump version to 0.20.0rc1 by @ZhanruiSunCh in #3834
- feat(part 2): Enhance the integrated robustness of scaffolding with __init__.py #3305 by @WeiHaocheng in #3731
- fix: fix lora case failure by @HuiGao-NV in #3838
- Added NemotronH to PyTorch supported models by @vegaluisjose in #3663
- Adding local paths to the datasets to make them loadable in offline mode by @rakib-hasan in #3750
- fix: [Deepseek] Pass hidden_states_fp4 to shared_experts by @hlu1 in #3819
- chore: increase A30 for cpp test by @QiJune in #3811
- feat: Return logits in PyTorch flow by @tongyuantongyu in #3221
- feat: large-scale EP(part 1: Add MNNVL MoE A2A support) by @dongxuy04 in #3504
- [infra] Waive L0 tests by @yiqingy0 in #3853
- [chore] Add Llama 4 Maverick to quickstart README by @mikeiovine in #3848
- fix:[AutoDeploy] Patch for torch load_state_dict() by @sugunav14 in #3847
- feat: Add head size 72 support for QKV Preprocessing kernel by @qixiang-99 in #3743
- chore: update pytorch only change file list by @QiJune in #3873
- Test: Split C++ unit tests for CI granularity by @DomBrown in #3868
- TRTLLM-4875 feat: Add version switcher to doc by @kaiyux in #3846
Full Changelog: v0.20.0rc0...v0.20.0rc1