[Feat]Unquantized Linear to nz and control all nz-cast #3356
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
This pull request has conflicts; please resolve them before we can evaluate the pull request.
Code Review
This pull request introduces a new environment variable, VLLM_ASCEND_ENABLE_NZ, to control the conversion of weights to the FRACTAL_NZ format, which is a valuable addition for performance tuning on Ascend hardware. The changes are applied consistently across various quantization methods and models. However, I've identified a few critical issues in the test files that would prevent the test suite from running, and a potential logic bug in vllm_ascend/attention/mla_v1.py involving dead code and an incorrect format constant. These issues need to be addressed to ensure the correctness and stability of the codebase.
tests/ut/ops/test_linear.py
Outdated
```python
linear = AscendReplicatedLinear(
    input_size=16,
    output_size=8,
)
self.assertTrue(isinstance(linear.quant_method,
                           AscendUnquantizedLinearMethod))
```
This code is at the class level, which will cause a NameError because self is not defined in this context. It should be moved inside a test method, for example test_init.
Suggested change:
```python
def test_init(self):
    linear = AscendReplicatedLinear(
        input_size=16,
        output_size=8,
    )
    self.assertTrue(isinstance(linear.quant_method,
                               AscendUnquantizedLinearMethod))
```
vllm_ascend/attention/mla_v1.py
Outdated
```python
elif isinstance(layer.quant_method, AscendUnquantizedLinearMethod):
    if getattr(layer.quant_method, "unquant_to_nz", False):
        layer.weight.data = torch_npu.npu_format_cast(
            layer.weight.data, ACL_FORMAT_FRACTAL_ND)
```
This block of code appears to be dead code: the condition getattr(layer.quant_method, "unquant_to_nz", False) will likely never be true, because AscendUnquantizedLinearMethod.process_weights_after_loading sets self.unquant_to_nz = False. Furthermore, even if this code were executed, it casts the weight to ACL_FORMAT_FRACTAL_ND, which is inconsistent with the pull request's goal of converting to FRACTAL_NZ. If this logic is intended to be used, please correct both the condition and the format; otherwise, remove it.
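If the flag were maintained correctly, a minimal sketch of the corrected branch might look like this (assuming FRACTAL_NZ is indeed the intended target format):

```python
# Sketch only: assumes unquant_to_nz is kept accurate upstream and that
# ACL_FORMAT_FRACTAL_NZ (not _ND) is the intended target format.
elif isinstance(layer.quant_method, AscendUnquantizedLinearMethod):
    if getattr(layer.quant_method, "unquant_to_nz", False):
        layer.weight.data = torch_npu.npu_format_cast(
            layer.weight.data, ACL_FORMAT_FRACTAL_NZ)
```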
vllm_ascend/quantization/w8a8.py
Outdated
```python
layer.weight.data = layer.weight.data.transpose(0, 1).contiguous()
if envs_ascend.VLLM_ASCEND_ENABLE_NZ:
    layer.weight.data = torch_npu.npu_format_cast(layer.weight.data,
                                                  ACL_FORMAT_FRACTAL_NZ)
```
This check can be extracted into a common function.
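For instance, a small helper along these lines could centralize it (a sketch; the function name and import paths are assumptions, not code from this PR):

```python
import torch
import torch_npu
import vllm_ascend.envs as envs_ascend
from vllm_ascend.utils import ACL_FORMAT_FRACTAL_NZ


def maybe_cast_to_nz(weight: torch.Tensor) -> torch.Tensor:
    # Centralize the feature-flag check: cast to FRACTAL_NZ only when
    # VLLM_ASCEND_ENABLE_NZ is enabled; otherwise return the weight as-is.
    if envs_ascend.VLLM_ASCEND_ENABLE_NZ:
        return torch_npu.npu_format_cast(weight, ACL_FORMAT_FRACTAL_NZ)
    return weight
```

Each call site would then reduce to layer.weight.data = maybe_cast_to_nz(layer.weight.data).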
vllm_ascend/ops/linear.py
Outdated
"8,3"): | ||
layer.weight.data = torch_npu.npu_format_cast( | ||
layer.weight.data, ACL_FORMAT_FRACTAL_NZ) | ||
self.unquant_to_nz = False |
This parameter can probably be removed.
Force-pushed from 4b07f81 to 6a3575f
Force-pushed from 24113ff to eb634f2
Why not set NZ by default, instead of adding an environment variable for control?
```python
class CustomRowParallelOp(CustomTensorParallelOp):
```
This class doesn't need to inherit from this base class.
Force-pushed from eb634f2 to af880aa
This pull request has conflicts; please resolve them before we can evaluate the pull request.
Signed-off-by: anon189Ty <Stari_Falcon@outlook.com>
Force-pushed from af880aa to eebe4da
What this PR does / why we need it?
Currently, when execution reaches the Linear layer of a model in vLLM-Ascend, the weight format is ND in both the unquantized case and the skipped-ascend case.
This PR supplements the execution logic for the Linear layer. We introduce a new environment variable, VLLM_ASCEND_ENABLE_NZ. When VLLM_ASCEND_ENABLE_NZ=1 and the CANN version is 8.3, the weights of the Linear layer are converted to FRACTAL_NZ, in both the unquantized case and the skipped-ascend case. We also use VLLM_ASCEND_ENABLE_NZ to gate the existing NZ conversions, such as the w8a8-quantized case.
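In outline, the gating described above amounts to something like this (a sketch; the helper name and the version-flag plumbing are assumptions, not the PR's literal code):

```python
import torch_npu
import vllm_ascend.envs as envs_ascend
from vllm_ascend.utils import ACL_FORMAT_FRACTAL_NZ


def cast_weight_to_nz_if_enabled(layer, cann_is_8_3: bool) -> None:
    # Convert the Linear weight to FRACTAL_NZ only when the feature flag
    # is set AND the CANN version is 8.3; otherwise keep the ND format.
    if envs_ascend.VLLM_ASCEND_ENABLE_NZ and cann_is_8_3:
        layer.weight.data = torch_npu.npu_format_cast(
            layer.weight.data, ACL_FORMAT_FRACTAL_NZ)
```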
Does this PR introduce any user-facing change?
Adds a new environment variable, VLLM_ASCEND_ENABLE_NZ. If you want to use the NZ format, set VLLM_ASCEND_ENABLE_NZ=1.
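For example (illustrative; the variable must be set before vLLM-Ascend reads its environment config):

```python
# From the launching shell:
#   export VLLM_ASCEND_ENABLE_NZ=1
# or programmatically, before vLLM-Ascend is imported:
import os
os.environ["VLLM_ASCEND_ENABLE_NZ"] = "1"
```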
How was this patch tested?