[V1][Spec-decode] First Stage support of Eagle 1 #1128

Open

umeiko wants to merge 1 commit into main from the eagle branch

Conversation


@umeiko umeiko commented Jun 9, 2025

What this PR does / why we need it?

Adds first-stage support for EAGLE-1 speculative decoding on the V1 engine in vllm-ascend; EAGLE-1 was not previously functional in V1.
Does this PR introduce any user-facing change?

  • No user-facing changes yet (experimental implementation)
  • Configuration options matching vLLM's will be added (see the sketch below)
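
For reference, a minimal offline sketch of how the EAGLE-1 configuration is expected to look (illustrative only, not part of this PR; model paths are placeholders and the config keys mirror the serving script later in this thread):

import os

os.environ["VLLM_USE_V1"] = "1"  # use the V1 engine, as in the serving script below

from vllm import LLM, SamplingParams

# Target model plus EAGLE-1 drafter; paths are assumed local checkpoints.
llm = LLM(
    model="/mnt/models/Meta-Llama-3.1-8B-Instruct",
    speculative_config={
        "method": "eagle",
        "model": "/mnt/models/EAGLE-LLaMA3.1-Instruct-8B",
        "num_speculative_tokens": 2,
    },
    max_model_len=1024,
    enforce_eager=True,
    gpu_memory_utilization=0.8,
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)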

How was this patch tested?

  • Basic functionality verification with synthetic data
  • More comprehensive benchmarks are planned in follow-up work

@umeiko umeiko force-pushed the eagle branch 2 times, most recently from bf14ebd to 2765391 Compare June 9, 2025 07:20

This pull request has conflicts, please resolve those before we can evaluate the pull request.


JSYRD commented Jun 23, 2025

Hello, I also found that EAGLE is still not working with v0.9.0rc2 (it appears to load correctly but then crashes). However, there is a patch at /vllm_ascend/patch/worker/patch_common/patch_eagle.py, and it seems that patch was meant to support EAGLE in vllm-ascend (and it obviously is not working well). My question: is the ideal way to implement EAGLE in vllm-ascend the one in your PR (implementing a new worker), or the patch? I am not very experienced with vllm-ascend development and am still wondering about the differences between the files in worker and patch/worker 😂


umeiko commented Jun 23, 2025

  • LLM: Meta-Llama-3.1-8B-Instruct
  • Drafter: EAGLE-LLaMA3.1-Instruct-8B
  • Dataset: mt-bench (downloaded from the EAGLE repo)
  • num_spec_tokens: 2
  • vllm: master branch
  • Device: single 910B3, 65536 MB
========test_matrix==========
total_counts: 8994
position_0: 8994 times (1.00)
position_1: 5680 times (0.63)
position_2: 2854 times (0.32)
mean acceptance length: 1.2661774516344229
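
For reference, the rates in parentheses are simply each position's count divided by total_counts; a tiny sketch reproducing them (illustrative only, not from this PR):

# Per-position acceptance rates, reproduced from the raw counts above.
counts = {"position_0": 8994, "position_1": 5680, "position_2": 2854}
total = 8994  # total_counts reported above
for name, n in counts.items():
    print(f"{name}: {n} times ({n / total:.2f})")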


umeiko commented Jun 23, 2025

> Hello, I also found that EAGLE is still not working with v0.9.0rc2 (it appears to load correctly but then crashes). […] Is the ideal way to implement EAGLE in vllm-ascend the one in your PR (implementing a new worker), or the patch?

I updated this commit after the EAGLE-3 PR was ready. This branch has been tested on v0.9.1 locally.


JSYRD commented Jun 23, 2025

> I updated this commit after the EAGLE-3 PR was ready. This branch has been tested on v0.9.1 locally.

Got it. But regarding the question above: it seems there is already an implementation of EAGLE-1. Is it waiting to be fixed, or does your version implement it in a different way? Thanks for your reply~

@umeiko umeiko force-pushed the eagle branch 2 times, most recently from f8fd4eb to 739728c Compare June 23, 2025 07:29

umeiko commented Jun 23, 2025

> Got it. But regarding the question above: it seems there is already an implementation of EAGLE-1. Is it waiting to be fixed, or does your version implement it in a different way?

Yes. EAGLE-1 support had not been finished in the V1 engine before; this PR fixes that.


codecov bot commented Jun 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 49.92%. Comparing base (c30ddb8) to head (1296606).
Report is 75 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1128       +/-   ##
===========================================
+ Coverage   27.39%   49.92%   +22.53%     
===========================================
  Files          56       76       +20     
  Lines        6191     9231     +3040     
===========================================
+ Hits         1696     4609     +2913     
- Misses       4495     4622      +127     
Flag Coverage Δ
unittests 49.92% <ø> (+22.53%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.


umeiko commented Jun 23, 2025

Serving Scripts:

# Serve Llama-3.1-8B with an EAGLE-1 drafter on the V1 engine (single card)
MODEL=/mnt/models/Meta-Llama-3.1-8B-Instruct
EAGLE1DRAFT=/mnt/models/EAGLE-LLaMA3.1-Instruct-8B

DRAFT=$EAGLE1DRAFT

VLLM_USE_V1=1 python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 --port 8001 --model $MODEL --enforce_eager \
    --seed 42 -tp 1 --gpu_memory_utilization 0.8 --max_num_seqs 1 --max_model_len 1024 \
    --speculative_config '{ "method": "eagle", "model": "'$DRAFT'", "draft_tensor_parallel_size": 1, "num_speculative_tokens": 2, "max_model_len": 1024 }'
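
A quick way to sanity-check the endpoint started by the script above (a hedged sketch; it assumes the requests package and reuses the host, port, and model path from the script):

# Minimal smoke test against the OpenAI-compatible server on port 8001.
import requests

resp = requests.post(
    "http://localhost:8001/v1/completions",
    json={
        "model": "/mnt/models/Meta-Llama-3.1-8B-Instruct",  # same path as $MODEL above
        "prompt": "The capital of France is",
        "max_tokens": 32,
        "temperature": 0.0,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])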

Serving Benchmark On Meta-Llama-3.1-8B-Instruct

Dataset: ShareGPT_V4.3_unfiltered_cleaned_split.json, 46 prompts.

  • without SpecDecode
============ Serving Benchmark Result ============
Successful requests:                     46        
Benchmark duration (s):                  297.60    
Total input tokens:                      8177      
Total generated tokens:                  8608      
Request throughput (req/s):              0.15      
Output token throughput (tok/s):         28.92     
Total Token throughput (tok/s):          56.40     
---------------Time to First Token----------------
Mean TTFT (ms):                          48.65     
Median TTFT (ms):                        45.49     
P99 TTFT (ms):                           74.90     
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          34.37     
Median TPOT (ms):                        34.40     
P99 TPOT (ms):                           35.27     
---------------Inter-token Latency----------------
Mean ITL (ms):                           34.49     
Median ITL (ms):                         34.38     
P99 ITL (ms):                            36.78     
==================================================
  • with eagle1
============ Serving Benchmark Result ============
Successful requests:                     46        
Benchmark duration (s):                  222.79    
Total input tokens:                      8177      
Total generated tokens:                  8608      
Request throughput (req/s):              0.21      
Output token throughput (tok/s):         38.64     
Total Token throughput (tok/s):          75.34     
---------------Time to First Token----------------
Mean TTFT (ms):                          56.12     
Median TTFT (ms):                        52.86     
P99 TTFT (ms):                           83.48     
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          27.46     
Median TPOT (ms):                        25.95     
P99 TPOT (ms):                           42.74     
---------------Inter-token Latency----------------
Mean ITL (ms):                           47.15     
Median ITL (ms):                         47.07     
P99 ITL (ms):                            49.69     
==================================================
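
From these two runs, EAGLE-1 cuts the benchmark duration from 297.60 s to 222.79 s and lifts output token throughput from 28.92 to 38.64 tok/s, roughly a 1.34x end-to-end speedup, with mean TPOT dropping from 34.37 ms to 27.46 ms at the cost of slightly higher TTFT and ITL.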


mengwei805 commented Jun 23, 2025

Issues that must be resolved: Please delete the eagle code that @ponix-j did not complete in #874

@@ -117,8 +117,6 @@ def test_eagle_correctness(
     Compare the outputs of a original LLM and a speculative LLM
     should be the same when using eagle speculative decoding.
     '''
-    if not use_eagle3:
-        pytest.skip("Not current support for the test.")
umeiko (Author) replied:

Tests for EAGLE-1 are enabled here.


umeiko commented Jun 24, 2025

> Issues that must be resolved: Please delete the eagle code that @ponix-j did not complete in #874

We made a separate PR to fix this: #1385

@mengwei805

> We made a separate PR to fix this: #1385

approve

@mengwei805 mengwei805 added long-term-test, accuracy-test, ready-for-test and removed long-term-test, ready-for-test, accuracy-test labels Jun 25, 2025
@umeiko umeiko force-pushed the eagle branch 2 times, most recently from 9e10169 to c3f1605 Compare June 25, 2025 08:52
@mengwei805 mengwei805 added long-term-test, ready-for-test and removed long-term-test, ready-for-test labels Jun 26, 2025
@umeiko umeiko force-pushed the eagle branch 2 times, most recently from fef8884 to e255af5 Compare June 26, 2025 08:14

umeiko commented Jun 26, 2025

Long-term CI is currently not working: #1444

pytest -sv tests/e2e/long_term/spec_decode_v0 --ignore=tests/e2e/long_term/spec_decode_v0/e2e/test_mtp_correctness.py
# TODO: revert when test_v1_mtp_correctness.py is fixed
# VLLM_USE_MODELSCOPE=True pytest -sv tests/e2e/long_term/spec_decode_v0/e2e/test_mtp_correctness.py # it needs a clean process
# pytest -sv tests/e2e/long_term/spec_decode_v0 --ignore=tests/e2e/long_term/spec_decode_v0/e2e/test_mtp_correctness.py
umeiko (Author) replied:

The deepseek_mtp CI needs to be skipped.

A collaborator replied:

Great! You can just remove the comment to enable it.

@Yikun Yikun added ready-for-test and removed ready-for-test labels Jun 30, 2025
@umeiko umeiko force-pushed the eagle branch 3 times, most recently from a165985 to 41d8b17 Compare July 2, 2025 06:19
@MengqingCao MengqingCao added ready-for-test and removed ready-for-test labels Jul 3, 2025

github-actions bot commented Jul 4, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions github-actions bot added merge-conflicts and removed ready labels Jul 4, 2025
Labels
long-term-test · merge-conflicts · module:tests · ready-for-test
5 participants