speculative decoding and mtp optimization #1435

Draft
wants to merge 30 commits into
base: deepseek_r1

@inkcherry inkcherry commented Jun 16, 2025

from [image] to [image]

cc @YuJiankang @czhu15

benchislett and others added 29 commits April 24, 2025 08:51
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
This PR enables a profiler for each step of the LLMEngine. The following
4 environment variables control the profiler:

- VLLM_ENGINE_PROFILER_ENABLED, set to true to enable the device profiler.
- VLLM_ENGINE_PROFILER_WARMUP_STEPS, number of steps to ignore for
profiling.
- VLLM_ENGINE_PROFILER_STEPS, number of steps to capture for profiling.
- VLLM_ENGINE_PROFILER_REPEAT, number of cycles for (warmup + profile).

> Please refer to
[torch.profiler.schedule](https://pytorch.org/docs/stable/profiler.html#torch.profiler.schedule)
for more details about the profiler schedule arguments.

> A step in profiling means a step of the LLM engine and **excludes**
the profile and warmup runs in HabanaModelRunner.

Please use `export VLLM_PROFILER_ENABLED=True` to enable the high-level
vLLM profiler and use the result to choose the steps for detailed
LLMEngine profiling.
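
As a rough illustration of how the four variables above might combine, the sketch below reads them from an environment mapping and classifies each engine step. It is not the PR's implementation: the helper names `read_profiler_config` and `step_phase`, the default values, and the accepted spellings of "true" are all assumptions made for this example.

```python
import os


def read_profiler_config(env=None):
    """Collect the four profiler settings (hypothetical helper).

    The defaults and the accepted truthy strings are assumptions,
    not values taken from the PR.
    """
    env = os.environ if env is None else env
    enabled = env.get("VLLM_ENGINE_PROFILER_ENABLED", "false").lower() in ("1", "true")
    return {
        "enabled": enabled,
        "warmup": int(env.get("VLLM_ENGINE_PROFILER_WARMUP_STEPS", "0")),
        "active": int(env.get("VLLM_ENGINE_PROFILER_STEPS", "1")),
        "repeat": int(env.get("VLLM_ENGINE_PROFILER_REPEAT", "1")),
    }


def step_phase(step, cfg):
    """Return 'off', 'warmup', or 'record' for a zero-based engine step.

    Each cycle is `warmup` ignored steps followed by `active` captured
    steps, and the cycle runs `repeat` times.
    """
    if not cfg["enabled"]:
        return "off"
    cycle = cfg["warmup"] + cfg["active"]
    if step >= cycle * cfg["repeat"]:
        return "off"
    return "warmup" if step % cycle < cfg["warmup"] else "record"
```

With 2 warmup steps, 3 profiled steps, and 2 repeats, steps 0-1 and 5-6 are ignored, steps 2-4 and 7-9 are captured, and everything from step 10 on is unprofiled. If the same values were handed to `torch.profiler.schedule`, one plausible mapping would be `wait=0, warmup=cfg["warmup"], active=cfg["active"], repeat=cfg["repeat"]`, but that mapping is also an assumption.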
Signed-off-by: jkyu <jiankang.yu@intel.com>
Signed-off-by: jkyu <jiankang.yu@intel.com>
Signed-off-by: jkyu <jiankang.yu@intel.com>
Signed-off-by: jkyu <jiankang.yu@intel.com>
Signed-off-by: jkyu <jiankang.yu@intel.com>
@inkcherry inkcherry changed the title from "speculative decoding and mpt optimization" to "speculative decoding and mtp optimization" Jun 18, 2025
Reduce communication to improve performance.

Signed-off-by: inkcherry <mingzhi.liu@intel.com>

Signed-off-by: inkcherry <mingzhi.liu@intel.com>
4 participants