[Bugfix]: Correct handling of cos_sin_cache length #1900

jianzs · 2025-07-21T03:44:32Z

What this PR does / why we need it?

This PR addresses the performance issue related to cos/sin cache handling:

The cos/sin cache is already built for the maximum context length at initialization. However, because _set_cos_sin_cache stored seq_len in max_seq_len_cached rather than max_seq_len, the length check was incorrect, which led to unnecessary cache recreation.
Because the cache already covers the maximum context length, nothing should trigger its recreation at runtime.
Fixed the variable name: max_seq_len_cached was never used and should be max_seq_len, which is also the correct variable to check against the maximum context length.
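
For readers who want to see the failure mode in isolation, here is a minimal sketch. It is not the vllm-ascend code: `ToyRopeCache`, `lookup`, `count_rebuilds`, and the initial value of 0 for `max_seq_len` are all hypothetical and chosen only for illustration. It shows how storing the cached length under a name the length check never reads can make every call rebuild the cache.

```python
import math


class ToyRopeCache:
    """Hypothetical, simplified stand-in for a rotary embedding cache."""

    def __init__(self, dim, max_len, buggy):
        self.dim = dim
        self.buggy = buggy
        self.rebuilds = 0
        # Assumption for this sketch: the attribute read by the check starts at 0.
        self.max_seq_len = 0
        self._set_cos_sin_cache(max_len)

    def _set_cos_sin_cache(self, seq_len):
        self.rebuilds += 1
        if self.buggy:
            # The length is stored under a name that the check never reads.
            self.max_seq_len_cached = seq_len
        else:
            self.max_seq_len = seq_len
        inv_freq = [10000.0 ** (-2.0 * i / self.dim) for i in range(self.dim // 2)]
        self.cos = [[math.cos(p * f) for f in inv_freq] for p in range(seq_len)]
        self.sin = [[math.sin(p * f) for f in inv_freq] for p in range(seq_len)]

    def lookup(self, seq_len):
        # The check reads max_seq_len; in the buggy variant it is never updated.
        if seq_len > self.max_seq_len:
            self._set_cos_sin_cache(seq_len)
        return self.cos[:seq_len], self.sin[:seq_len]


def count_rebuilds(buggy, calls=5):
    """Build a 64-position cache, then request 16 positions `calls` times."""
    cache = ToyRopeCache(dim=8, max_len=64, buggy=buggy)
    for _ in range(calls):
        cache.lookup(16)
    return cache.rebuilds
```

In this toy model the buggy variant rebuilds on every call even though the requested length is well within the initial cache, while the consistent variant builds the cache exactly once.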

Does this PR introduce any user-facing change?

No

How was this patch tested?

CI pass.

vLLM version: v0.9.2
vLLM main: vllm-project/vllm@92615d7

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>

jianzs · 2025-07-21T03:44:53Z

@whx-sjtu PTAL

Copilot

Pull Request Overview

This PR fixes performance issues in rotary embedding cos/sin cache handling by correcting variable usage and preventing unnecessary cache recreation. The fix ensures that the cache, which is already initialized with maximum context length, is not unnecessarily recreated during processing.

Replaces cache recreation logic with an error when max_seq_len exceeds the initialized maximum
Corrects variable assignment in _set_cos_sin_cache from max_seq_len_cached to max_seq_len
Removes redundant max_seq_len assignment during initialization since the cache setup handles this
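
As a rough illustration of the first point, the guard might look something like the sketch below. The helper name `check_seq_len`, its parameters, and the error message are assumptions made for this example and are not taken from the patch.

```python
def check_seq_len(requested, cached_max):
    """Hypothetical helper: reject lengths beyond the prebuilt cache.

    Returns the effective sequence length, falling back to the cached
    maximum when no explicit length is requested.
    """
    if requested is not None and requested > cached_max:
        # Raise instead of silently rebuilding the cache.
        raise ValueError(
            f"max_seq_len {requested} exceeds the initialized "
            f"cache length {cached_max}"
        )
    return cached_max if requested is None else requested
```

Raising here makes an out-of-range request visible to the caller instead of hiding a cache rebuild on the hot path.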

vllm_ascend/ops/rotary_embedding.py

whx-sjtu

I reviewed the code on the v0.9.1-dev branch and found that this problem has already been fixed there but has not been ported to main. Thanks for finding and fixing this. LGTM.


codecov · 2025-07-21T05:29:24Z

Codecov Report

Attention: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 60.21%. Comparing base (8cfd257) to head (ca576d5).
Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
vllm_ascend/ops/rotary_embedding.py	50.00%	1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (75.00%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1900      +/-   ##
==========================================
+ Coverage   60.17%   60.21%   +0.04%     
==========================================
  Files          71       71              
  Lines        7989     7995       +6     
==========================================
+ Hits         4807     4814       +7     
+ Misses       3182     3181       -1

Flag	Coverage Δ
unittests	`60.21% <75.00%> (+0.04%)`	⬆️


ApsarasX · 2025-07-21T06:11:19Z

> I reviewed the code on the v0.9.1-dev branch and found that this problem has already been fixed there but has not been ported to main. Thanks for finding and fixing this. LGTM.

@whx-sjtu Which PR fixed this issue in the 0.9.1-dev branch?

weijinqian0 · 2025-07-21T08:00:32Z

vllm_ascend/ops/rotary_embedding.py

@@ -209,7 +211,7 @@ def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):


 def _set_cos_sin_cache(self, seq_len, device, dtype):
-    self.max_seq_len_cached = seq_len
+    self.max_seq_len = seq_len * self.scaling_factor


There is no such problem in v0.9.1. What happened?

#1551 fixed this problem in v0.9.1-dev

jianzs · 2025-07-21T09:00:55Z

> > I reviewed the code on the v0.9.1-dev branch and found that this problem has already been fixed there but has not been ported to main. Thanks for finding and fixing this. LGTM.
>
> @whx-sjtu Which PR fixed this issue in the 0.9.1-dev branch?

@ApsarasX #1551

fix: correct handling of cos_sin_cache length

b27d6f1

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>

github-actions bot added the module:ops label Jul 21, 2025

jianzs requested review from ganyi1996ppo and Copilot July 21, 2025 03:45

Copilot AI reviewed Jul 21, 2025


whx-sjtu approved these changes Jul 21, 2025


test: update native_rope_deepseek_forward to raise ValueError for invalid inputs

5308ef9

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>

jianzs requested review from Yikun, wangxiyuan and ApsarasX July 21, 2025 04:11

jianzs added the performance-test, accuracy-test, and ready-for-test labels Jul 21, 2025

github-actions bot added the module:tests label Jul 21, 2025

chore: lint

ca576d5

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>

jianzs added the ready label Jul 21, 2025

weijinqian0 reviewed Jul 21, 2025


ApsarasX approved these changes Jul 21, 2025
