
Add support for vllm model implementation #217


Draft · wants to merge 5 commits into main

Conversation

SilverSoldier

Changing vllm_spyre to use vLLM's own model implementation instead of fms.

Status: Tested on Llama; it currently fails at self.attn() in LlamaAttention.

NOTE: The attention, KV cache, and paging implementation is entirely incomplete.
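
For context, here is a minimal sketch of what loading vLLM's own model implementation (instead of an fms model) can look like on the model-runner side. The get_model entry point and the vllm_config wiring are assumptions about vLLM's loader API, not the exact code in this PR.

```python
# Hypothetical sketch, not the actual diff in this PR.
# Assumption: vllm.model_executor.model_loader.get_model takes the engine's
# VllmConfig and returns an initialized torch.nn.Module built from vLLM's own
# model classes (e.g. LlamaForCausalLM) rather than an fms implementation.
from vllm.model_executor.model_loader import get_model


def load_native_vllm_model(vllm_config):
    """Build the model from vLLM's model zoo instead of foundation-model-stack."""
    model = get_model(vllm_config=vllm_config)
    return model.eval()
```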

@SilverSoldier
Author

@tdoublep FYI, these are the changes we made on the vllm-spyre side, apart from the vllm changes mentioned (a rough sketch follows the list):

  • import fusion only if compilation config is set
  • replace unsupported ops with their supported equivalents
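
A rough sketch of both items, assuming hypothetical module paths and helper names; this is illustrative only, not the actual diff:

```python
import torch


def maybe_import_fusion(vllm_config):
    # Import the fusion pass only when a compilation config is actually set,
    # so the plain eager path on Spyre never touches torch.compile machinery.
    # Assumption: the pass lives under vllm.compilation.fusion.
    if getattr(vllm_config, "compilation_config", None) is not None:
        from vllm.compilation.fusion import FusionPass
        return FusionPass
    return None


def rms_norm_fallback(x: torch.Tensor, weight: torch.Tensor,
                      eps: float = 1e-6) -> torch.Tensor:
    # Example of replacing an unsupported op with a supported equivalent:
    # a plain-PyTorch RMSNorm composition in place of a fused custom kernel.
    variance = x.pow(2).mean(-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * weight
```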


github-actions bot commented Jun 6, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@joerunde
Collaborator

joerunde commented Jun 6, 2025

Tests fail as expected 😉

This looks pretty cool!

@maxdebayser
Collaborator

On Spyre, decoder models are currently supported using the foundation-model-stack, and the continuous batching feature currently being implemented relies on that library. However, encoder (embedding) models are supported in V0 using the transformers library. I'm currently working on encoder model support in V1, and I think it would make a lot of sense to use the approach of this PR instead of continuing to run transformers.
