[V1][Spec-decode] First Stage support of Eagle 1 #1128

Open

umeiko wants to merge 1 commit into main from the eagle branch

Conversation


@umeiko umeiko commented Jun 9, 2025

What this PR does / why we need it?

Adds first-stage support for EAGLE-1 speculative decoding on the V1 engine in vllm-ascend; EAGLE-1 was not previously functional in V1.
Does this PR introduce any user-facing change?

  • No user-facing changes yet (experimental implementation)
  • Configuration options matching vLLM's will be added (see the sketch below)
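
For reference, a minimal offline sketch of how the EAGLE-1 configuration is expected to look (illustrative only, not part of this PR; model paths are placeholders and the config keys mirror the serving script later in this thread):

import os

os.environ["VLLM_USE_V1"] = "1"  # use the V1 engine, as in the serving script below

from vllm import LLM, SamplingParams

# Target model plus EAGLE-1 drafter; paths are assumed local checkpoints.
llm = LLM(
    model="/mnt/models/Meta-Llama-3.1-8B-Instruct",
    speculative_config={
        "method": "eagle",
        "model": "/mnt/models/EAGLE-LLaMA3.1-Instruct-8B",
        "num_speculative_tokens": 2,
    },
    max_model_len=1024,
    enforce_eager=True,
    gpu_memory_utilization=0.8,
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)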

How was this patch tested?

  • Basic functionality verification with synthetic data
  • More comprehensive benchmarks are planned in follow-up work

@umeiko umeiko force-pushed the eagle branch 2 times, most recently from bf14ebd to 2765391 Compare June 9, 2025 07:20

This pull request has conflicts, please resolve those before we can evaluate the pull request.


JSYRD commented Jun 23, 2025

Hello, I also found that EAGLE is still not working with v0.9.0rc2 (it appears to load correctly but then crashes). However, there is a patch at /vllm_ascend/patch/worker/patch_common/patch_eagle.py, and it seems that patch was meant to support EAGLE in vllm-ascend (and it obviously is not working well). My question: is the ideal way to implement EAGLE in vllm-ascend the one in your PR (implementing a new worker), or the patch? I am not very experienced with vllm-ascend development and am still wondering about the differences between the files in worker and patch/worker 😂


umeiko commented Jun 23, 2025

  • LLM: Meta-Llama-3.1-8B-Instruct
  • Drafter: EAGLE-LLaMA3.1-Instruct-8B
  • Dataset: mt-bench (downloaded from the EAGLE repo)
  • num_spec_tokens: 2
  • vllm: master branch
  • Device: single 910B3, 65536 MB
========test_matrix==========
total_counts: 8994
position_0: 8994 times (1.00)
position_1: 5680 times (0.63)
position_2: 2854 times (0.32)
mean acceptance length: 1.2661774516344229
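
For reference, the rates in parentheses are simply each position's count divided by total_counts; a tiny sketch reproducing them (illustrative only, not from this PR):

# Per-position acceptance rates, reproduced from the raw counts above.
counts = {"position_0": 8994, "position_1": 5680, "position_2": 2854}
total = 8994  # total_counts reported above
for name, n in counts.items():
    print(f"{name}: {n} times ({n / total:.2f})")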


umeiko commented Jun 23, 2025

> Hello, I also found that EAGLE is still not working with v0.9.0rc2 (it appears to load correctly but then crashes). […] Is the ideal way to implement EAGLE in vllm-ascend the one in your PR (implementing a new worker), or the patch?

I updated this commit after the EAGLE-3 PR was ready. This branch has been tested on v0.9.1 locally.


JSYRD commented Jun 23, 2025

> I updated this commit after the EAGLE-3 PR was ready. This branch has been tested on v0.9.1 locally.

Got it. But regarding the question above: it seems there is already an implementation of EAGLE-1. Is it waiting to be fixed, or does your version implement it in a different way? Thanks for your reply~

@umeiko umeiko force-pushed the eagle branch 2 times, most recently from f8fd4eb to 739728c Compare June 23, 2025 07:29

umeiko commented Jun 23, 2025

> Got it. But regarding the question above: it seems there is already an implementation of EAGLE-1. Is it waiting to be fixed, or does your version implement it in a different way?

Yes. EAGLE-1 support had not been finished in the V1 engine before; this PR fixes that.


codecov bot commented Jun 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 49.92%. Comparing base (c30ddb8) to head (1296606).
Report is 75 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1128       +/-   ##
===========================================
+ Coverage   27.39%   49.92%   +22.53%     
===========================================
  Files          56       76       +20     
  Lines        6191     9231     +3040     
===========================================
+ Hits         1696     4609     +2913     
- Misses       4495     4622      +127     
Flag Coverage Δ
unittests 49.92% <ø> (+22.53%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.


umeiko commented Jun 23, 2025

Serving Scripts:

# Serve Llama-3.1-8B with an EAGLE-1 drafter on the V1 engine (single card)
MODEL=/mnt/models/Meta-Llama-3.1-8B-Instruct
EAGLE1DRAFT=/mnt/models/EAGLE-LLaMA3.1-Instruct-8B

DRAFT=$EAGLE1DRAFT

VLLM_USE_V1=1 python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 --port 8001 --model $MODEL --enforce_eager \
    --seed 42 -tp 1 --gpu_memory_utilization 0.8 --max_num_seqs 1 --max_model_len 1024 \
    --speculative_config '{ "method": "eagle", "model": "'$DRAFT'", "draft_tensor_parallel_size": 1, "num_speculative_tokens": 2, "max_model_len": 1024 }'
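
A quick way to sanity-check the endpoint started by the script above (a hedged sketch; it assumes the requests package and reuses the host, port, and model path from the script):

# Minimal smoke test against the OpenAI-compatible server on port 8001.
import requests

resp = requests.post(
    "http://localhost:8001/v1/completions",
    json={
        "model": "/mnt/models/Meta-Llama-3.1-8B-Instruct",  # same path as $MODEL above
        "prompt": "The capital of France is",
        "max_tokens": 32,
        "temperature": 0.0,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])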

Serving Benchmark On Meta-Llama-3.1-8B-Instruct

Dataset: ShareGPT_V4.3_unfiltered_cleaned_split.json, 46 prompts.

  • without SpecDecode
============ Serving Benchmark Result ============
Successful requests:                     46        
Benchmark duration (s):                  297.60    
Total input tokens:                      8177      
Total generated tokens:                  8608      
Request throughput (req/s):              0.15      
Output token throughput (tok/s):         28.92     
Total Token throughput (tok/s):          56.40     
---------------Time to First Token----------------
Mean TTFT (ms):                          48.65     
Median TTFT (ms):                        45.49     
P99 TTFT (ms):                           74.90     
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          34.37     
Median TPOT (ms):                        34.40     
P99 TPOT (ms):                           35.27     
---------------Inter-token Latency----------------
Mean ITL (ms):                           34.49     
Median ITL (ms):                         34.38     
P99 ITL (ms):                            36.78     
==================================================
  • with eagle1
============ Serving Benchmark Result ============
Successful requests:                     46        
Benchmark duration (s):                  222.79    
Total input tokens:                      8177      
Total generated tokens:                  8608      
Request throughput (req/s):              0.21      
Output token throughput (tok/s):         38.64     
Total Token throughput (tok/s):          75.34     
---------------Time to First Token----------------
Mean TTFT (ms):                          56.12     
Median TTFT (ms):                        52.86     
P99 TTFT (ms):                           83.48     
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          27.46     
Median TPOT (ms):                        25.95     
P99 TPOT (ms):                           42.74     
---------------Inter-token Latency----------------
Mean ITL (ms):                           47.15     
Median ITL (ms):                         47.07     
P99 ITL (ms):                            49.69     
==================================================
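
From these two runs, EAGLE-1 cuts the benchmark duration from 297.60 s to 222.79 s and lifts output token throughput from 28.92 to 38.64 tok/s, roughly a 1.34x end-to-end speedup, with mean TPOT dropping from 34.37 ms to 27.46 ms at the cost of slightly higher TTFT and ITL.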


mengwei805 commented Jun 23, 2025

Issues that must be resolved: Please delete the eagle code that @ponix-j did not complete in #874

@@ -117,8 +117,6 @@ def test_eagle_correctness(
     Compare the outputs of a original LLM and a speculative LLM
     should be the same when using eagle speculative decoding.
     '''
-    if not use_eagle3:
-        pytest.skip("Not current support for the test.")
umeiko (Author) replied:

Tests for EAGLE-1 are enabled here.


umeiko commented Jun 24, 2025

> Issues that must be resolved: Please delete the eagle code that @ponix-j did not complete in #874

We made a separate PR to fix this: #1385

@mengwei805

> We made a separate PR to fix this: #1385

approve

@mengwei805 mengwei805 added long-term-test, accuracy-test, ready-for-test and removed long-term-test, ready-for-test, accuracy-test labels Jun 25, 2025
@umeiko umeiko force-pushed the eagle branch 2 times, most recently from 9e10169 to c3f1605 Compare June 25, 2025 08:52
@mengwei805 mengwei805 added long-term-test, ready-for-test and removed long-term-test, ready-for-test labels Jun 26, 2025
@umeiko umeiko force-pushed the eagle branch 2 times, most recently from fef8884 to e255af5 Compare June 26, 2025 08:14

umeiko commented Jun 26, 2025

Long-term CI is currently not working: #1444

pytest -sv tests/e2e/long_term/spec_decode_v0 --ignore=tests/e2e/long_term/spec_decode_v0/e2e/test_mtp_correctness.py
# TODO: revert when test_v1_mtp_correctness.py is fixed
# VLLM_USE_MODELSCOPE=True pytest -sv tests/e2e/long_term/spec_decode_v0/e2e/test_mtp_correctness.py # it needs a clean process
# pytest -sv tests/e2e/long_term/spec_decode_v0 --ignore=tests/e2e/long_term/spec_decode_v0/e2e/test_mtp_correctness.py
umeiko (Author) replied:

The deepseek_mtp CI needs to be skipped.

A collaborator replied:

Great! You can just remove the comment to enable it.

@Yikun Yikun added ready-for-test and removed ready-for-test labels Jun 30, 2025
@umeiko umeiko force-pushed the eagle branch 3 times, most recently from a165985 to 41d8b17 Compare July 2, 2025 06:19
@MengqingCao MengqingCao added ready-for-test and removed ready-for-test labels Jul 3, 2025

github-actions bot commented Jul 4, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions github-actions bot added merge-conflicts and removed ready labels Jul 4, 2025
Labels
long-term-test · merge-conflicts · module:tests · ready-for-test
5 participants