[main] Use AddRmsNormQuant ops in the custom model to optimize Qwen3's performance #1806


Merged: 1 commit merged into vllm-project:main on Jul 22, 2025

Conversation

@rjg-lyh (Contributor) commented on Jul 15, 2025

What this PR does / why we need it?

Optimizes the performance of the quantized Qwen3 model by registering a custom model and adding the AddRmsNormQuant operation. Subsequent PRs will build further performance optimizations on top of this custom model.
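In rough terms, the new op fuses the residual add, RMSNorm, and W8A8 activation quantization that previously ran as separate steps. Below is a minimal functional sketch of that computation; it is a plain PyTorch reference only, and the function and argument names are illustrative rather than the actual vllm-ascend kernel or code.

import torch

def add_rms_norm_quant_ref(x: torch.Tensor,
                           residual: torch.Tensor,
                           weight: torch.Tensor,
                           scale: torch.Tensor,
                           eps: float = 1e-6):
    # Residual add; the updated residual is carried forward to the next layer.
    new_residual = x + residual
    # RMSNorm over the hidden dimension.
    variance = new_residual.float().pow(2).mean(dim=-1, keepdim=True)
    normed = new_residual.float() * torch.rsqrt(variance + eps) * weight.float()
    # Per-tensor int8 quantization of the activations for the following W8A8 matmul.
    quantized = torch.clamp(torch.round(normed / scale), -128, 127).to(torch.int8)
    return quantized, new_residual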

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI passed with existing tests.

@rjg-lyh force-pushed the pr-addrmsnorm-main branch 2 times, most recently from 6ee87ad to 3ee0a48, on July 15, 2025 10:17

codecov bot commented Jul 15, 2025

Codecov Report

Attention: Patch coverage is 49.38272% with 41 lines in your changes missing coverage. Please review.

Project coverage is 60.11%. Comparing base (bf25498) to head (bcbc024).
Report is 35 commits behind head on main.

Files with missing lines        Patch %   Lines
vllm_ascend/models/qwen3.py     48.43%    33 Missing ⚠️
vllm_ascend/ops/layernorm.py    27.27%    8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1806      +/-   ##
==========================================
+ Coverage   54.93%   60.11%   +5.18%     
==========================================
  Files          80       74       -6     
  Lines        9712     8081    -1631     
==========================================
- Hits         5335     4858     -477     
+ Misses       4377     3223    -1154     
Flag        Coverage Δ
unittests   60.11% <49.38%> (+5.18%) ⬆️



Review comment on the new custom model file (diff @@ -0,0 +1,156 @@):

@wangxiyuan (Collaborator) commented on Jul 16, 2025


Do not rewrite the model arch if the change is only AddRMSNormW8A8Quant.

Since 0.9.2, vLLM supports custom op overrides; we can register our ops when setting up vllm-ascend. Take #1647 for example.
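For reference, the override route suggested above replaces vLLM's RMSNorm implementation at plugin setup time rather than duplicating the model architecture. A rough sketch of what such a registration could look like follows, assuming vLLM's out-of-tree CustomOp override hook referenced via #1647; the class name, the fallback path, and the exact register_oot signature are assumptions, not the merged patch.

from typing import Optional

import torch
from vllm.model_executor.custom_op import CustomOp
from vllm.model_executor.layers.layernorm import RMSNorm

class AscendQuantRMSNorm(RMSNorm):
    # Hypothetical override; on Ascend it would dispatch to a fused
    # AddRMSNormQuant kernel instead of separate add/norm/quant ops.
    def forward_oot(self, x: torch.Tensor,
                    residual: Optional[torch.Tensor] = None):
        # Kernel call omitted in this sketch; fall back to the reference path.
        return self.forward_native(x, residual)

def register_ascend_ops() -> None:
    # Assumed registration hook, invoked once during vllm-ascend setup so the
    # override takes effect without rewriting the Qwen3 model definition.
    CustomOp.register_oot(_decorated_op_cls=AscendQuantRMSNorm, name="RMSNorm")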

@rjg-lyh force-pushed the pr-addrmsnorm-main branch from 3ee0a48 to b50de2c on July 18, 2025 09:56
@rjg-lyh force-pushed the pr-addrmsnorm-main branch 2 times, most recently from 1b0e244 to e7c57ad, on July 21, 2025 12:58
[main] Use AddRmsNormQuant ops in the custom model to optimize Qwen3's performance

Signed-off-by: rjg-lyh <1318825571@qq.com>
@rjg-lyh force-pushed the pr-addrmsnorm-main branch from e7c57ad to bcbc024 on July 22, 2025 03:37
@wangxiyuan (Collaborator) commented:

As we discussed offline, in the future we should contribute the arch change to vLLM instead of maintaining it inside vLLM Ascend. It should be a common ability.

@wangxiyuan wangxiyuan merged commit 9a3bdf2 into vllm-project:main Jul 22, 2025
25 checks passed