
Add support for vllm model implementation #217


Draft · wants to merge 5 commits into main

Conversation

SilverSoldier

Changing vllm_spyre to use vLLM's own model implementation instead of fms.

Status: Tested on Llama; it currently fails at self.attn() in LlamaAttention.

NOTE: The attention, KV cache, and paging implementation is entirely incomplete.
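
For context, here is a minimal sketch of what loading vLLM's own model implementation (instead of an fms model) can look like on the model-runner side. The get_model entry point and the vllm_config wiring are assumptions about vLLM's loader API, not the exact code in this PR.

```python
# Hypothetical sketch, not the actual diff in this PR.
# Assumption: vllm.model_executor.model_loader.get_model takes the engine's
# VllmConfig and returns an initialized torch.nn.Module built from vLLM's own
# model classes (e.g. LlamaForCausalLM) rather than an fms implementation.
from vllm.model_executor.model_loader import get_model


def load_native_vllm_model(vllm_config):
    """Build the model from vLLM's model zoo instead of foundation-model-stack."""
    model = get_model(vllm_config=vllm_config)
    return model.eval()
```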

@SilverSoldier
Author

@tdoublep FYI, these are the changes we made on the vllm-spyre side, apart from the vllm changes mentioned (a rough sketch follows the list):

  • import fusion only if compilation config is set
  • replace unsupported ops with their supported equivalents
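
A rough sketch of both items, assuming hypothetical module paths and helper names; this is illustrative only, not the actual diff:

```python
import torch


def maybe_import_fusion(vllm_config):
    # Import the fusion pass only when a compilation config is actually set,
    # so the plain eager path on Spyre never touches torch.compile machinery.
    # Assumption: the pass lives under vllm.compilation.fusion.
    if getattr(vllm_config, "compilation_config", None) is not None:
        from vllm.compilation.fusion import FusionPass
        return FusionPass
    return None


def rms_norm_fallback(x: torch.Tensor, weight: torch.Tensor,
                      eps: float = 1e-6) -> torch.Tensor:
    # Example of replacing an unsupported op with a supported equivalent:
    # a plain-PyTorch RMSNorm composition in place of a fused custom kernel.
    variance = x.pow(2).mean(-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * weight
```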


github-actions bot commented Jun 6, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@joerunde
Collaborator

joerunde commented Jun 6, 2025

Tests fail as expected 😉

This looks pretty cool!

@maxdebayser
Collaborator

On Spyre, decoder models are currently supported using the foundation-model-stack, and the continuous batching feature currently being implemented relies on that library. However, encoder (embedding) models are supported in V0 using the transformers library. I'm currently working on encoder model support in V1, and I think it would make a lot of sense to use the approach of this PR instead of continuing to run transformers.
