
Support / guidance on training an LSTM-based speculator for Qwen2.5-VL-32B #255

@lahmuller


Hi team, thanks for this amazing project! I have successfully trained a lightweight LSTM speculator for the text-only Qwen2.5 models and integrated it with Arctic Inference. The speedup it achieved is great!

Now I’d like to achieve similar acceleration for Qwen2.5-VL-32B. After briefly scanning the VLM code, I see that:

  - The vision encoder outputs a sequence of image tokens that are concatenated with the text tokens before being fed to the LLM backbone.
  - The existing speculative-decoding utilities seem to assume pure text input.
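To make my question concrete, here is a minimal, dependency-free sketch of how I imagine the training data could be prepared: since the speculator would consume the backbone's hidden states, image and text positions look the same to it, and the only special handling would be masking the loss at positions whose target is an image-patch token (the draft head should never be asked to predict those). All names here (`IMAGE_TOKEN`, `build_speculator_batch`) are hypothetical, not part of the existing codebase.

```python
# Hypothetical sketch: preparing one uniform next-token training batch for an
# LSTM speculator from an interleaved image + text token stream.

IMAGE_TOKEN = -1  # hypothetical sentinel id marking a vision-patch position


def build_speculator_batch(token_ids):
    """Return (inputs, targets, loss_mask) for next-token draft training.

    The speculator sees the full interleaved sequence as inputs, but the
    loss mask zeroes out every position whose *target* is an image token,
    so only real text tokens contribute to the training objective.
    """
    inputs = token_ids[:-1]
    targets = token_ids[1:]
    loss_mask = [1 if t != IMAGE_TOKEN else 0 for t in targets]
    return inputs, targets, loss_mask


# Example interleaved stream: two image patches followed by three text tokens.
stream = [IMAGE_TOKEN, IMAGE_TOKEN, 101, 2057, 2003]
inp, tgt, mask = build_speculator_batch(stream)
print(mask)  # → [0, 1, 1, 1]
```

Is something along these lines a reasonable starting point, or does the interleaving break assumptions deeper in the speculative-decoding utilities?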

Questions

  1. Is it technically feasible to extend the current LSTM speculative decoder to handle the interleaved “image + text” token stream of Qwen2.5-VL-32B?
  2. If yes, how can I train this LSTM speculator?

If the above is not recommended, would the maintainers suggest:

  1. Using a small Qwen2.5-VL model (e.g., 3B) as the draft model instead, or is there another recommended solution?
