
[Bugfix]: Correct handling of cos_sin_cache length #1900


Open
wants to merge 3 commits into
base: main
from

Conversation

jianzs
Collaborator

@jianzs jianzs commented Jul 21, 2025

What this PR does / why we need it?

This PR fixes a performance issue in cos/sin cache handling:

  1. The cos/sin cache is already built for the maximum context length at initialization. However, because _set_cos_sin_cache stored seq_len in max_seq_len_cached, the length check compared against the wrong value and triggered unnecessary cache recreation.

  2. Since the cos/sin cache already covers the maximum context length, it should never be recreated at runtime.

  3. Fixed variable naming: max_seq_len_cached was never read and should be max_seq_len, which is also the correct variable to check against the maximum context length.
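The three points above can be illustrated with a small, self-contained sketch. It is not the vllm-ascend implementation: `ToyRotaryCache`, its `fixed` flag, `forward`, and `count_builds` are hypothetical names, and the pre-fix behaviour (a limit recorded without the scaling factor, plus an attribute that is written but never read) is an assumption reconstructed from the description. Plain Python lists stand in for tensors.

```python
import math


class ToyRotaryCache:
    """Hypothetical stand-in for a rotary embedding layer (not the vllm-ascend code)."""

    def __init__(self, max_position_embeddings, scaling_factor, fixed):
        self.scaling_factor = scaling_factor
        self.fixed = fixed
        self.builds = 0
        # Assumed pre-fix behaviour: the limit is recorded without the scaling factor.
        self.max_seq_len = max_position_embeddings
        self._set_cos_sin_cache(max_position_embeddings)

    def _set_cos_sin_cache(self, seq_len):
        self.builds += 1
        if self.fixed:
            # Record the limit in the attribute that forward() actually reads.
            self.max_seq_len = seq_len * self.scaling_factor
        else:
            # Written here but never read anywhere else.
            self.max_seq_len_cached = seq_len
        length = seq_len * self.scaling_factor
        self.cos = [math.cos(i) for i in range(length)]
        self.sin = [math.sin(i) for i in range(length)]

    def forward(self, request_len):
        if request_len > self.max_seq_len:
            # Rebuild the cache; in the pre-fix variant this fires even when
            # the existing cache is already long enough.
            self._set_cos_sin_cache(request_len)
        return self.cos[:request_len], self.sin[:request_len]


def count_builds(fixed):
    """Build the layer once, serve three requests, and report how many builds ran."""
    layer = ToyRotaryCache(max_position_embeddings=8, scaling_factor=4, fixed=fixed)
    for _ in range(3):
        layer.forward(16)
    return layer.builds
```

In this toy setup the initial cache holds 32 entries, so a request of length 16 never needs a rebuild. The pre-fix variant still rebuilds on every call because the check reads a limit of 8, while the fixed variant builds the cache only once.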

Does this PR introduce any user-facing change?

No

How was this patch tested?

CI pass.

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
@jianzs
Collaborator Author

jianzs commented Jul 21, 2025

@whx-sjtu PTAL

@jianzs jianzs requested review from ganyi1996ppo and Copilot July 21, 2025 03:45

@Copilot Copilot AI left a comment


Pull Request Overview

This PR fixes performance issues in rotary embedding cos/sin cache handling by correcting variable usage and preventing unnecessary cache recreation. The fix ensures that the cache, which is already initialized with maximum context length, is not unnecessarily recreated during processing.

  • Replaces cache recreation logic with an error when max_seq_len exceeds the initialized maximum
  • Corrects variable assignment in _set_cos_sin_cache from max_seq_len_cached to max_seq_len
  • Removes redundant max_seq_len assignment during initialization since the cache setup handles this
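The first bullet, failing fast instead of rebuilding, might look roughly like the sketch below. The helper name `check_seq_len`, its parameters, and the choice of `ValueError` are assumptions made for illustration; the review summary does not say which exception type or message the PR uses.

```python
def check_seq_len(requested_len, cached_max_len):
    """Hypothetical guard: reject lengths the prebuilt cos/sin cache cannot serve."""
    if requested_len > cached_max_len:
        # Assumed error type; the cache is never rebuilt on this path.
        raise ValueError(
            f"max_seq_len {requested_len} exceeds the cos/sin cache "
            f"length {cached_max_len} fixed at initialization"
        )
    return requested_len


def try_check(requested_len, cached_max_len):
    """Return True when the request fits the cache, False when the guard raises."""
    try:
        check_seq_len(requested_len, cached_max_len)
    except ValueError:
        return False
    return True
```

Raising here turns an oversized request into a visible configuration error rather than a silent, repeated reallocation.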

Contributor

@whx-sjtu whx-sjtu left a comment


I reviewed the code on branch v0.9.1-dev and found that this problem has already been fixed there but has not been ported to main. Thanks for finding and fixing this. LGTM.

…alid inputs

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
@jianzs jianzs requested review from Yikun, wangxiyuan and ApsarasX July 21, 2025 04:11
@jianzs jianzs added the performance-test, accuracy-test, and ready-for-test labels Jul 21, 2025
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>

codecov bot commented Jul 21, 2025

Codecov Report

Attention: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 60.21%. Comparing base (8cfd257) to head (ca576d5).
Report is 4 commits behind head on main.

Files with missing lines | Patch % | Lines
vllm_ascend/ops/rotary_embedding.py | 50.00% | 1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (75.00%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1900      +/-   ##
==========================================
+ Coverage   60.17%   60.21%   +0.04%     
==========================================
  Files          71       71              
  Lines        7989     7995       +6     
==========================================
+ Hits         4807     4814       +7     
+ Misses       3182     3181       -1     
Flag | Coverage Δ
unittests | 60.21% <75.00%> (+0.04%) ⬆️
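The percentages in the report follow directly from the hit and line counts in the diff table. The sketch below recomputes them; `coverage_percent` is a hypothetical helper, and the patch size of 4 coverable lines is inferred from "75% with 1 line missing" rather than stated in the report.

```python
def coverage_percent(hits, lines):
    """Coverage as a percentage rounded to two decimals."""
    return round(100 * hits / lines, 2)


base = coverage_percent(4807, 7989)   # main at 8cfd257
head = coverage_percent(4814, 7995)   # PR head at ca576d5
delta = round(head - base, 2)

# Inferred: 1 missing line at 75% patch coverage implies 3 of 4 lines hit.
patch = coverage_percent(3, 4)
```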


@ApsarasX
Collaborator

> I reviewed the code on branch v0.9.1-dev and found that this problem has already been fixed there but has not been ported to main. Thanks for finding and fixing this. LGTM.

@whx-sjtu Which PR fixed this issue in the 0.9.1-dev branch?

@jianzs jianzs added the ready label Jul 21, 2025
@@ -209,7 +211,7 @@ def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):


 def _set_cos_sin_cache(self, seq_len, device, dtype):
-    self.max_seq_len_cached = seq_len
+    self.max_seq_len = seq_len * self.scaling_factor
Contributor


There is no such problem in v0.9.1. What happened?

Collaborator Author


#1551 fixed this problem in v0.9.1-dev

@jianzs
Collaborator Author

jianzs commented Jul 21, 2025

> > I reviewed the code on branch v0.9.1-dev and found that this problem has already been fixed there but has not been ported to main. Thanks for finding and fixing this. LGTM.
>
> @whx-sjtu Which PR fixed this issue in the 0.9.1-dev branch?

@ApsarasX #1551

Labels
accuracy-test, module:ops, module:tests, performance-test, ready, ready-for-test

4 participants