[V1][Spec-decode] First Stage support of Eagle 1 #1128
base: main
Conversation
Force-pushed from bf14ebd to 2765391.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Hello, I also found that Eagle is still not working with v0.9.0rc2 (it looks correctly loaded but crashes); however, there's a patch in
Force-pushed from 8560d9d to 93d8b73.
I updated this commit after the Eagle 3 PR was ready. This branch has been tested on v0.9.1 locally.
Got it. But regarding the question mentioned above: it seems there is already an implementation of Eagle 1. Is it waiting to be fixed, or does your version implement it in a different way? Thanks for your reply~
Force-pushed from f8fd4eb to 739728c.
Yes, Eagle 1 support had not been finished in the V1 engine before; this PR fixes that.
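For reference, a minimal offline sketch of what enabling Eagle 1 on the V1 engine looks like; the model paths mirror the serving script later in this thread, and the `speculative_config` keys follow its `--speculative_config` JSON rather than anything defined in this PR itself:

```python
# Hedged sketch, not this PR's own code: paths and config keys are taken
# from the serving script shared in this thread.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/mnt/models/Meta-Llama-3.1-8B-Instruct",
    speculative_config={
        "method": "eagle",
        "model": "/mnt/models/EAGLE-LLaMA3.1-Instruct-8B",
        "num_speculative_tokens": 2,
    },
    max_model_len=1024,
    enforce_eager=True,
)

# Greedy sampling, so the output should match the non-speculative baseline.
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(temperature=0, max_tokens=32))
print(outputs[0].outputs[0].text)
```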
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##             main    #1128       +/-   ##
===========================================
+ Coverage   27.39%   49.92%   +22.53%
===========================================
  Files          56       76       +20
  Lines        6191     9231     +3040
===========================================
+ Hits         1696     4609     +2913
- Misses       4495     4622      +127
```
Serving script:

```bash
MODEL=/mnt/models/Meta-Llama-3.1-8B-Instruct
EAGLE1_DRAFT=/mnt/models/EAGLE-LLaMA3.1-Instruct-8B
DRAFT=$EAGLE1_DRAFT

VLLM_USE_V1=1 python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 --port 8001 --model $MODEL --enforce_eager \
    --seed 42 -tp 1 --gpu_memory_utilization 0.8 \
    --max_num_seqs 1 --max_model_len 1024 \
    --speculative_config '{"method": "eagle", "model": "'$DRAFT'", "draft_tensor_parallel_size": 1, "num_speculative_tokens": 2, "max_model_len": 1024}'
```
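Once the server is up, a quick smoke test against its OpenAI-compatible endpoint; the prompt and `max_tokens` are illustrative, and the base URL matches the `--host`/`--port` flags above:

```python
# Smoke test for the server launched above; api_key is a dummy value since
# no auth is configured, and the model name must match --model.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY")
completion = client.completions.create(
    model="/mnt/models/Meta-Llama-3.1-8B-Instruct",
    prompt="The capital of France is",
    max_tokens=16,
    temperature=0,
)
print(completion.choices[0].text)
```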
Serving Benchmark On Dataset
```diff
@@ -117,8 +117,6 @@ def test_eagle_correctness(
     Compare the outputs of a original LLM and a speculative LLM
     should be the same when using eagle speculative decoding.
     '''
-    if not use_eagle3:
-        pytest.skip("Not current support for the test.")
```
Tests for Eagle 1 were enabled here.
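Roughly, removing the skip means the existing parametrization now exercises both draft methods. A sketch of the shape only; the decorator placement and the method mapping are assumptions, not the test's actual code:

```python
# Sketch only: the real test builds two engines and compares their outputs;
# the body here is a placeholder for that end-to-end check.
import pytest

@pytest.mark.parametrize("use_eagle3", [False, True])  # False now runs Eagle 1
def test_eagle_correctness(use_eagle3: bool):
    method = "eagle3" if use_eagle3 else "eagle"
    assert method in ("eagle", "eagle3")  # placeholder for the output comparison
```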
Force-pushed from 9e10169 to c3f1605.
Force-pushed from fef8884 to e255af5.
The long-term CI is currently not working: #1444
```bash
pytest -sv tests/e2e/long_term/spec_decode_v0 --ignore=tests/e2e/long_term/spec_decode_v0/e2e/test_mtp_correctness.py
# TODO: revert when test_v1_mtp_correctness.py is fixed
# VLLM_USE_MODELSCOPE=True pytest -sv tests/e2e/long_term/spec_decode_v0/e2e/test_mtp_correctness.py  # it needs a clean process
# pytest -sv tests/e2e/long_term/spec_decode_v0 --ignore=tests/e2e/long_term/spec_decode_v0/e2e/test_mtp_correctness.py
```
The deepseek_mtp CI needs to be skipped.
Great! You can just remove the comment to enable it.
Force-pushed from a165985 to 41d8b17.
EAGLE 1 Support
This pull request has conflicts, please resolve those before we can evaluate the pull request.
What this PR does / why we need it?
Adds first-stage support for Eagle 1 speculative decoding in the V1 engine; Eagle 1 support had not been finished there before.

Does this PR introduce any user-facing change?
Eagle 1 can now be enabled on V1 via --speculative_config with "method": "eagle".

How was this patch tested?
The Eagle 1 case of test_eagle_correctness was enabled (the use_eagle3 skip was removed), and the branch was tested locally on v0.9.1 with the serving script above.