[V0.9.1] Use AddRmsNormQuant ops in the custom model to optimize Qwen3's performance #1545
Conversation
Force-pushed 68937f8 to a73ed88
vllm_ascend/quantization/w8a8.py
Outdated
@@ -59,6 +60,7 @@ def get_pertensor_param(params_dtype: torch.dtype) -> Dict[str, Any]:
    params_dict = {}
    params_dict["input_scale"] = torch.empty(1, dtype=params_dtype)
    params_dict["input_offset"] = torch.empty(1, dtype=torch.int8)
    AscendW8A8LinearMethod.params_dtype = params_dtype
What if there is an fp16 fallback? Then how does that fallback linear layer do the calculation?
I don't think it's a good solution. Can we write this dtype back into params_dict and inject it into the layer eventually?
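For illustration, a minimal sketch of the suggested direction, assuming a hypothetical `params_dtype` key: the dtype travels with the per-layer params_dict instead of being stored as class-level state on AscendW8A8LinearMethod, so an fp16 fallback layer keeps its own dtype.

```python
from typing import Any, Dict

import torch


def get_pertensor_param(params_dtype: torch.dtype) -> Dict[str, Any]:
    params_dict: Dict[str, Any] = {}
    params_dict["input_scale"] = torch.empty(1, dtype=params_dtype)
    params_dict["input_offset"] = torch.empty(1, dtype=torch.int8)
    # Sketch only: record the dtype per layer (key name is an assumption)
    # rather than mutating AscendW8A8LinearMethod, so it can later be
    # injected into the owning layer.
    params_dict["params_dtype"] = params_dtype
    return params_dict
```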
I have fixed it.
vllm_ascend/models/qwen3.py
Outdated
if quant_config is not None:
    from vllm_ascend.quantization.quant_config import AscendQuantConfig
    assert isinstance(quant_config, AscendQuantConfig)
    self.input_layernorm = AddRMSNormQuant(config.hidden_size,
Just discussed with @realliujiaxu: this behaviour is not a general way to apply our optimization in modeling. Can we try to leverage the compilation path in vLLM to fuse ops in the FX graph instead? cc @jianzs @Yikun @wangxiyuan
I'm fine with the changes in this PR, but I believe we also need an ultimate solution to handle this kind of problem for good.
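For reference, a minimal sketch of the custom-model pattern under discussion; the import path of AddRMSNormQuant and its constructor arguments beyond hidden_size are assumptions, not taken from this PR's full diff.

```python
import torch.nn as nn
from vllm.model_executor.layers.layernorm import RMSNorm


class CustomQwen3DecoderLayerSketch(nn.Module):
    """Illustrative only: shows the conditional layernorm swap, not the full layer."""

    def __init__(self, config, quant_config=None):
        super().__init__()
        if quant_config is not None:
            from vllm_ascend.quantization.quant_config import AscendQuantConfig
            assert isinstance(quant_config, AscendQuantConfig)
            # Fused add + RMSNorm + quantize, so the following quantized
            # linear consumes int8 activations directly. Import path and
            # any extra constructor arguments are assumed here.
            from vllm_ascend.models.qwen3 import AddRMSNormQuant
            self.input_layernorm = AddRMSNormQuant(config.hidden_size,
                                                   eps=config.rms_norm_eps)
        else:
            # fp16/bf16 fallback keeps the plain RMSNorm.
            self.input_layernorm = RMSNorm(config.hidden_size,
                                           eps=config.rms_norm_eps)
```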
Force-pushed 3384ed1 to 07736ef
import torch_npu

if residual is not None:
    x, _, residual = torch_npu.npu_add_rms_norm_quant(
Does torch_npu.npu_add_rms_norm_quant require a newer version of torch_npu?
The current version of PTA (torch_npu) already supports it.
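For context, a plain-PyTorch reference sketch of the computation the fused NPU op replaces: residual add, RMSNorm, then static per-tensor int8 quantization. The exact argument order and return values of torch_npu.npu_add_rms_norm_quant are not reproduced here; the function and parameter names below are illustrative.

```python
import torch


def add_rms_norm_quant_reference(x: torch.Tensor,
                                 residual: torch.Tensor,
                                 weight: torch.Tensor,
                                 input_scale: torch.Tensor,
                                 input_offset: torch.Tensor,
                                 eps: float = 1e-6):
    # Residual add feeds both the norm and the next residual connection.
    new_residual = x + residual
    # RMSNorm over the hidden dimension, computed in fp32 for stability.
    variance = new_residual.float().pow(2).mean(dim=-1, keepdim=True)
    normed = new_residual.float() * torch.rsqrt(variance + eps)
    normed = (normed * weight.float()).to(x.dtype)
    # Static per-tensor int8 quantization with the layer's scale/offset.
    quantized = torch.clamp(torch.round(normed / input_scale + input_offset),
                            -128, 127).to(torch.int8)
    return quantized, new_residual
```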
Force-pushed 0137a7d to 1b898e7
…3's performance Signed-off-by: rjg-lyh <1318825571@qq.com>
Force-pushed 1b898e7 to 4def25f
What this PR does / why we need it?
Optimizes the performance of the quantized Qwen3 model by registering a custom model and adding the AddRmsNormQuant operation. Subsequent PRs will build further performance optimizations on top of this custom model.
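For illustration, a hedged sketch of how such a custom model is typically plugged in through vLLM's ModelRegistry; the module path and class name below are assumptions, not verbatim from this PR.

```python
from vllm import ModelRegistry


def register_custom_qwen3():
    # Map the architecture name from the HF config to the custom Ascend
    # implementation so vLLM instantiates it instead of the built-in one.
    # The "module:ClassName" target is hypothetical here.
    ModelRegistry.register_model(
        "Qwen3ForCausalLM",
        "vllm_ascend.models.qwen3:CustomQwen3ForCausalLM")
```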
Does this PR introduce any user-facing change?
No.
How was this patch tested?
CI passed with existing tests.