[main] Use AddRmsNormQuant ops in the custom model to optimize Qwen3's performance #1806


Merged: 1 commit merged into vllm-project:main on Jul 22, 2025

Conversation

@rjg-lyh (Contributor) commented on Jul 15, 2025

What this PR does / why we need it?

Optimizes the performance of the quantized Qwen3 model by registering a custom model and adding the AddRmsNormQuant operation. Subsequent PRs will build further performance optimizations on top of this custom model.
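In rough terms, the new op fuses the residual add, RMSNorm, and W8A8 activation quantization that previously ran as separate steps. Below is a minimal functional sketch of that computation; it is a plain PyTorch reference only, and the function and argument names are illustrative rather than the actual vllm-ascend kernel or code.

import torch

def add_rms_norm_quant_ref(x: torch.Tensor,
                           residual: torch.Tensor,
                           weight: torch.Tensor,
                           scale: torch.Tensor,
                           eps: float = 1e-6):
    # Residual add; the updated residual is carried forward to the next layer.
    new_residual = x + residual
    # RMSNorm over the hidden dimension.
    variance = new_residual.float().pow(2).mean(dim=-1, keepdim=True)
    normed = new_residual.float() * torch.rsqrt(variance + eps) * weight.float()
    # Per-tensor int8 quantization of the activations for the following W8A8 matmul.
    quantized = torch.clamp(torch.round(normed / scale), -128, 127).to(torch.int8)
    return quantized, new_residual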

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI passed with existing tests.

@rjg-lyh force-pushed the pr-addrmsnorm-main branch 2 times, most recently from 6ee87ad to 3ee0a48, on July 15, 2025 10:17

codecov bot commented Jul 15, 2025

Codecov Report

Attention: Patch coverage is 49.38272% with 41 lines in your changes missing coverage. Please review.

Project coverage is 60.11%. Comparing base (bf25498) to head (bcbc024).
Report is 35 commits behind head on main.

Files with missing lines        Patch %   Lines
vllm_ascend/models/qwen3.py     48.43%    33 Missing ⚠️
vllm_ascend/ops/layernorm.py    27.27%    8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1806      +/-   ##
==========================================
+ Coverage   54.93%   60.11%   +5.18%     
==========================================
  Files          80       74       -6     
  Lines        9712     8081    -1631     
==========================================
- Hits         5335     4858     -477     
+ Misses       4377     3223    -1154     
Flag        Coverage Δ
unittests   60.11% <49.38%> (+5.18%) ⬆️



Review comment on the new custom model file (diff @@ -0,0 +1,156 @@):

@wangxiyuan (Collaborator) commented on Jul 16, 2025


Do not rewrite the model arch if the change is only AddRMSNormW8A8Quant.

Since 0.9.2, vLLM supports custom op overrides; we can register our ops when setting up vllm-ascend. Take #1647 for example.
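For reference, the override route suggested above replaces vLLM's RMSNorm implementation at plugin setup time rather than duplicating the model architecture. A rough sketch of what such a registration could look like follows, assuming vLLM's out-of-tree CustomOp override hook referenced via #1647; the class name, the fallback path, and the exact register_oot signature are assumptions, not the merged patch.

from typing import Optional

import torch
from vllm.model_executor.custom_op import CustomOp
from vllm.model_executor.layers.layernorm import RMSNorm

class AscendQuantRMSNorm(RMSNorm):
    # Hypothetical override; on Ascend it would dispatch to a fused
    # AddRMSNormQuant kernel instead of separate add/norm/quant ops.
    def forward_oot(self, x: torch.Tensor,
                    residual: Optional[torch.Tensor] = None):
        # Kernel call omitted in this sketch; fall back to the reference path.
        return self.forward_native(x, residual)

def register_ascend_ops() -> None:
    # Assumed registration hook, invoked once during vllm-ascend setup so the
    # override takes effect without rewriting the Qwen3 model definition.
    CustomOp.register_oot(_decorated_op_cls=AscendQuantRMSNorm, name="RMSNorm")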

@rjg-lyh force-pushed the pr-addrmsnorm-main branch from 3ee0a48 to b50de2c on July 18, 2025 09:56
@rjg-lyh force-pushed the pr-addrmsnorm-main branch 2 times, most recently from 1b0e244 to e7c57ad, on July 21, 2025 12:58
[main] Use AddRmsNormQuant ops in the custom model to optimize Qwen3's performance

Signed-off-by: rjg-lyh <1318825571@qq.com>
@rjg-lyh force-pushed the pr-addrmsnorm-main branch from e7c57ad to bcbc024 on July 22, 2025 03:37
@wangxiyuan (Collaborator) commented:

As we discussed offline, in the future we should contribute the arch change to vLLM instead of maintaining it inside vLLM Ascend. It should be a common ability.

@wangxiyuan wangxiyuan merged commit 9a3bdf2 into vllm-project:main Jul 22, 2025
25 checks passed