
Spec decode support for V1 Engine #874


Open
ponix-j wants to merge 20 commits into main from spec_v0.8.5rc1

Conversation


@ponix-j ponix-j commented May 15, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

@ponix-j ponix-j changed the title from "Spec v0.8.5rc1" to "Spec decode v0.8.5rc1" on May 15, 2025
@@ -0,0 +1,587 @@
# SPDX-License-Identifier: Apache-2.0
from typing import Optional
Collaborator

rejection_sampler and eagle_proposer are fully overwritten here.
Can't this be implemented so that the main body reuses vLLM code and only the problematic parts are patched in vllm_ascend?

Author

Done: AscendRejectionSampler now inherits from RejectionSampler, and only the changed parts are overridden.
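
A minimal sketch of that pattern, assuming the V1 sampler is importable from vllm.v1.sample.rejection_sampler (the override point shown is illustrative, not this PR's actual diff):

```python
# Sketch only: subclass vLLM's sampler so the bulk of the logic stays in
# upstream vLLM code and only device-specific pieces are replaced.
from vllm.v1.sample.rejection_sampler import RejectionSampler


class AscendRejectionSampler(RejectionSampler):
    """Reuse vLLM rejection sampling; override only the NPU-specific parts."""

    def forward(self, *args, **kwargs):
        # Hypothetical override point: swap CUDA-only kernel calls for
        # torch_npu equivalents, then defer the rest to the parent class.
        return super().forward(*args, **kwargs)
```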

@mengwei805
Collaborator

Spec decode is a key feature. Can you add ngram and EAGLE e2e UTs in vllm-ascend?

@ponix-j
Author

ponix-j commented May 16, 2025

Spec decode is a key feature. Can you add ngram and EAGLE e2e UTs in vllm-ascend?

Yes, already added in test_spec_decode.py.
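
For reference, an ngram spec-decode e2e check along these lines might look like the sketch below (illustrative only: the model name and speculative_config values are assumptions, not necessarily what test_spec_decode.py contains):

```python
# Illustrative e2e sketch, not the PR's actual test.
from vllm import LLM, SamplingParams


def test_ngram_spec_decode():
    llm = LLM(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        speculative_config={
            "method": "ngram",
            "num_speculative_tokens": 3,
            "prompt_lookup_max": 3,
        },
        max_model_len=1024,
    )
    params = SamplingParams(temperature=0.0, max_tokens=64)
    outputs = llm.generate(["The future of AI is"], params)
    # Spec decode should change latency, not correctness, so text
    # must still be produced.
    assert outputs[0].outputs[0].text
```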

@mengwei805
Collaborator

Please squash all your commits into a single commit.

@ponix-j ponix-j force-pushed the spec_v0.8.5rc1 branch 2 times, most recently from 9f5ae8a to ed32fcf on May 19, 2025
@github-actions github-actions bot added the documentation, module:ops, and module:quantization labels on May 19, 2025
@github-actions github-actions bot removed the documentation, module:ops, and module:quantization labels on May 19, 2025
@ponix-j ponix-j force-pushed the spec_v0.8.5rc1 branch 2 times, most recently from 9e7bc8b to aa2bb74 on May 20, 2025

@pytest.fixture
def model_name():
return "meta-llama/Meta-Llama-3-8B-Instruct"
Collaborator

Let's use ModelScope instead.

Suggested change:
-    return "meta-llama/Meta-Llama-3-8B-Instruct"
+    return "LLM-Research/Meta-Llama-3.1-8B-Instruct"


@pytest.fixture
def eagle_model_name():
return "yuhuili/EAGLE-LLaMA3-Instruct-8B"
Collaborator

Ditto: use a ModelScope-hosted model here as well.
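
Applying the suggestion, the base-model fixture might end up as below (a sketch; it assumes the VLLM_USE_MODELSCOPE environment variable is the toggle that makes vLLM pull weights from ModelScope instead of Hugging Face):

```python
# Sketch of the fixture after switching to a ModelScope-hosted model.
import os

import pytest

# Assumed toggle for downloading weights from ModelScope.
os.environ["VLLM_USE_MODELSCOPE"] = "true"


@pytest.fixture
def model_name():
    return "LLM-Research/Meta-Llama-3.1-8B-Instruct"
```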

@@ -0,0 +1,70 @@
# SPDX-License-Identifier: Apache-2.0
Collaborator

It's not recommended to make a separate patch dir for spec decode. Let's move these patches to vllm_ascend/patch/platform or vllm_ascend/patch/worker, depending on which phase is appropriate.

And most importantly, please add comments in vllm_ascend/patch/__init__.py describing why we make each patch.
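
For illustration, the kind of entry being asked for in vllm_ascend/patch/__init__.py might read as follows (the file name and wording are hypothetical, not the repo's actual text):

```python
# vllm_ascend/patch/__init__.py -- illustrative entry, hypothetical names.
#
# ** File: worker/patch_rejection_sampler.py **
#    Why:    vLLM's V1 rejection sampler relies on CUDA-only kernels that
#            have no direct equivalent on Ascend NPUs.
#    How:    Replace the device-specific sampling step with torch_npu ops;
#            everything else keeps using upstream vLLM code.
#    Future plan: drop this patch once vLLM exposes a device-agnostic hook.
```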

Author

Already fixed.

# a, a + 1, ..., a + b - n2 - 1,
# a + b, a + b + 1, ..., a + b + c - n3 - 1]

# [0, a, a + b, a + b + c] -> [a, b, c]
Collaborator

Please remove the useless comments.

Author

This is copied from vllm/eagle.py; why is it useless?

Collaborator

I mistakenly thought it was a CUDA-specific comment; please ignore it.
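
(For context, the comments in question document recovering per-request lengths from cumulative offsets, which a small example makes concrete:)

```python
# Cumulative query-start offsets [0, a, a + b, a + b + c] differenced
# into per-request lengths [a, b, c].
import torch

cu_offsets = torch.tensor([0, 3, 7, 12])  # [0, a, a + b, a + b + c]
lengths = torch.diff(cu_offsets)          # tensor([3, 4, 5]) == [a, b, c]
```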

@@ -100,6 +100,7 @@ def adapt_patch(is_global_patch: bool = False):
     if is_global_patch:
         from vllm_ascend.patch import platform  # noqa: F401
     else:
+        from vllm_ascend.patch import spec_decode  # noqa: F401
Collaborator

Let's remove this once the patch suggestions above are resolved.

Author

Already fixed.

# [0, 1, 2, 5, 6, 9]
target_logits_indices += arange

# TODO: Optimize the CPU -> GPU copy.
Collaborator

Suggested change:
-# TODO: Optimize the CPU -> GPU copy.
+# TODO: Optimize the CPU -> NPU copy.

Author

Already fixed.
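
For context, the transfer that TODO refers to is a plain host-to-device copy; on Ascend it would look roughly like this (a sketch, assuming torch_npu is installed and registers the "npu" device):

```python
# Sketch of the CPU -> NPU copy flagged for optimization.
import torch
import torch_npu  # noqa: F401  # registers the "npu" device with PyTorch

indices_cpu = torch.tensor([0, 1, 2, 5, 6, 9], dtype=torch.int32)
# non_blocking=True lets the transfer overlap with compute when the
# source tensor lives in pinned host memory.
indices_npu = indices_cpu.to("npu", non_blocking=True)
```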

@@ -737,12 +888,92 @@ def execute_model(
         if max_gen_len == 1:
             # No spec decode tokens.
             valid_sampled_token_ids = sampled_token_ids.tolist()
         else:
             # Includes spec decode tokens.
Collaborator

Could we extract the model-execution code into a separate function to make the code clearer?
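
A sketch of the kind of extraction being requested (names are illustrative, and it assumes vLLM's convention of marking rejected draft positions with a -1 placeholder id):

```python
# Illustrative refactor, not the PR's actual code: pull the post-sampling
# branch out of execute_model so the main loop stays readable.
from typing import List

import torch


def postprocess_sampled_tokens(
    sampled_token_ids: torch.Tensor,
    max_gen_len: int,
) -> List[List[int]]:
    """Turn raw sampler output into per-request lists of valid token ids."""
    if max_gen_len == 1:
        # No spec decode tokens: every sampled token is valid.
        return sampled_token_ids.tolist()
    # Includes spec decode tokens: drop the -1 placeholders that mark
    # rejected draft positions in each row.
    return [[t for t in row if t != -1] for row in sampled_token_ids.tolist()]
```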

Author

Already fixed.

Signed-off-by: ponix-j <657511300@qq.com>
@ponix-j
Author

ponix-j commented May 21, 2025

Please squash all your commits into a single commit.

Will squash everything into a single commit before merging.

@wangxiyuan wangxiyuan changed the title from "Spec decode v0.8.5rc1" to "Spec decode support for V1 Engine" on May 21, 2025