Hi team, thanks for this amazing project! I have successfully trained a lightweight LSTM speculator for the text-only Qwen2.5 models and integrated it with Arctic Inference, and the speedup it achieves is great!
Now I’d like to achieve similar acceleration for Qwen2.5-VL-32B. After briefly scanning the VLM code, I see that:
- The vision encoder outputs a sequence of image tokens that are concatenated with the text tokens before being fed to the LLM backbone (a minimal sketch of my understanding is below).
- The existing speculative-decoding utilities seem to assume pure text input.
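Here is how I currently picture the input assembly; the function and argument names (`merge_vision_and_text`, `embed_tokens`, `image_token_id`, etc.) are my own placeholders, not Qwen2.5-VL's or Arctic Inference's actual API:

```python
import torch

def merge_vision_and_text(input_ids, image_features, embed_tokens, image_token_id):
    """Splice vision-encoder features into the text embedding sequence.

    input_ids:      (seq_len,) token ids containing image placeholder ids
    image_features: (num_image_tokens, hidden_size) output of the vision encoder
    embed_tokens:   the LLM backbone's token embedding layer
    """
    inputs_embeds = embed_tokens(input_ids).clone()        # (seq_len, hidden_size)
    image_mask = input_ids == image_token_id               # positions reserved for image tokens
    assert image_mask.sum().item() == image_features.shape[0]
    inputs_embeds[image_mask] = image_features.to(inputs_embeds.dtype)
    return inputs_embeds                                   # fed to the LLM backbone
```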
Questions
- Is it technically feasible to extend the current LSTM speculative decoder to handle the interleaved “image + text” token stream of Qwen2.5-VL-32B?
- If yes, how should I train such an LSTM speculator? (A rough sketch of what I have in mind is at the end of this post.)
If the above is not recommended, would the maintainers suggest:
- Using a small Qwen2.5-VL model (e.g., 3B) as the draft model instead, or some other approach?
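For reference, here is the rough sketch I mentioned above. The class and the training recipe are my assumptions about how the text-only LSTM speculator could carry over to the VLM, not Arctic Inference's actual implementation:

```python
import torch
import torch.nn as nn

class LSTMSpeculatorSketch(nn.Module):
    """Hypothetical draft head that consumes only the backbone's hidden states,
    so image positions are just more hidden states as far as the LSTM is concerned."""

    def __init__(self, hidden_size: int, vocab_size: int, num_layers: int = 1):
        super().__init__()
        self.lstm = nn.LSTM(hidden_size, hidden_size, num_layers, batch_first=True)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, backbone_hidden_states: torch.Tensor) -> torch.Tensor:
        # backbone_hidden_states: (batch, seq_len, hidden_size) from the multimodal backbone
        out, _ = self.lstm(backbone_hidden_states)
        return self.lm_head(out)  # draft logits over the next tokens

# Training idea (again, just my assumption): run Qwen2.5-VL-32B over image+text
# data, cache its last hidden states and next-token targets, then fit the LSTM
# head with plain cross-entropy, the same way I trained it for the text-only models.
```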