
V1 embeddings #277


Open · wants to merge 37 commits into main

Conversation

maxdebayser (Collaborator) commented Jul 2, 2025

Description

This PR enables embedding models on vLLM V1. In contrast with the V1 GPU implementation, it adds a separate model runner, because most embedding models don't need continuous batching. To avoid code repetition, I refactored the input batch and model runner classes into a class hierarchy with common base classes.

@gmarinho2 contributed a test that verifies that the returned embeddings don't change with batch size.
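The property that test checks can be sketched roughly as follows. This is an illustrative toy, not the actual test code: `embed` is a hypothetical deterministic stand-in for the real model call.

```python
import math

# Toy stand-in for the model's embedding call; the real test would query
# the vLLM Spyre runner instead. Deterministic so the check is meaningful.
def embed(texts: list[str]) -> list[list[float]]:
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# The embedding of "hello" computed alone must match the one computed
# as part of a larger batch.
single = embed(["hello"])[0]
batched = embed(["hello", "world", "spyre"])[0]
assert math.isclose(cosine(single, batched), 1.0)
```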

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

github-actions bot commented Jul 2, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

maxdebayser and others added 21 commits July 2, 2025 15:56
The changes introduced by PR
vllm-project/vllm#16728
to the sampler architecture were incompatible with
our spyre model runner.

Initially, as a stopgap solution, I copied the old sampling
classes into our vllm_spyre tree just so that we could keep
working on the latest changes from main.

Now this commit reverts that and makes the same logits
processor logic work for the Spyre input batch and model
runner classes. The difference from the GPU model runner is
that on Spyre we don't condense the batch, but keep a boolean
mask that is used to calculate "dense" request indices. These
indices must be used for the BatchUpdateBuilder because they
are the right ones to slice the `logits` tensor that is passed
to the Sampler.
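The index mapping described in this commit message can be sketched roughly like this (a hypothetical helper, not the PR's actual code, which lives in the Spyre input batch and model runner classes):

```python
# Hypothetical sketch: the Spyre batch is not condensed, so a boolean
# occupancy mask marks which slots hold live requests. Occupied slots
# are numbered consecutively to produce the "dense" indices that slice
# the logits tensor handed to the Sampler.
def dense_indices(occupancy_mask: list[bool]) -> dict[int, int]:
    mapping: dict[int, int] = {}
    dense = 0
    for slot, occupied in enumerate(occupancy_mask):
        if occupied:
            mapping[slot] = dense
            dense += 1
    return mapping

# Slots 1, 2 and 4 are live; they map to logits rows 0, 1 and 2.
print(dense_indices([False, True, True, False, True]))  # {1: 0, 2: 1, 4: 2}
```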

…upstream (#245)"

This reverts commit 962abf1.

Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
@maxdebayser maxdebayser changed the title [WIP] V1 embeddings V1 embeddings Jul 15, 2025
@maxdebayser maxdebayser marked this pull request as ready for review July 15, 2025 22:53
yannicks1 (Collaborator) left a comment


I am through 7/10 files, need to continue later... This is a huge PR:)

)

return model_output


class StaticBatchingSpyreModelRunner(SpyreModelRunner):
class WarmupShapesMixin:
Collaborator:

hmm, might there be another, more elegant way to distribute the method _get_padded_batch_size to both SpyrePoolingModelRunner and StaticBatchingSpyreModelRunner? Also, the name WarmupShapesMixin sounds a bit weird to me 😄. Tagging our most experienced SWEs here :) @joerunde @tjohnson31415

Collaborator (author):

Well, mixin classes are one of the standard ways to solve this problem. But maybe there are better options.

Collaborator:

Oh, it is just that I have never seen them before :) my apologies, learning new things 😄. The naming is fine too, now that I understand it better. I initially read it as "Warmup shapes mixing" rather than "Warmup shapes mix-in"...

Collaborator:

Python doesn't have a standard way to do object composition, so there are quite a few different ways to do it, each with their own drawbacks. This is a classic overview worth a read if you have a couple of minutes: https://python-patterns.guide/gang-of-four/composition-over-inheritance/

My two cents is that mixins are better than filling up a utils package with a bunch of shared methods, and then once you have a bunch of mixins you can usually find a pattern to apply instead that doesn't use multiple inheritance.
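For readers who, like the reviewer above, haven't run into the pattern before, here is a minimal sketch of how a mixin along the lines of WarmupShapesMixin can share one method between two runner classes. The shapes, class names, and method body below are illustrative assumptions, not the PR's actual code.

```python
# Hypothetical mixin: contributes _get_padded_batch_size to any runner
# class that provides a warmup_shapes attribute, without a shared utils
# module or a common concrete base class.
class WarmupShapesMixin:
    # Expected to be supplied by the class that mixes this in.
    warmup_shapes: list[dict]

    def _get_padded_batch_size(self, batch_size: int) -> int:
        """Round batch_size up to the smallest warmup batch size that fits."""
        for padded in sorted(s["batch_size"] for s in self.warmup_shapes):
            if padded >= batch_size:
                return padded
        raise ValueError(f"batch size {batch_size} exceeds all warmup shapes")

class StaticBatchingRunner(WarmupShapesMixin):
    def __init__(self, warmup_shapes: list[dict]):
        self.warmup_shapes = warmup_shapes

runner = StaticBatchingRunner([{"batch_size": 4}, {"batch_size": 8}])
print(runner._get_padded_batch_size(5))  # 8
```

A second runner (say, a pooling one) would mix in the same class and get the method for free, which is the sharing the review thread is discussing.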

Collaborator:

thanks a lot. will read this in a quiet minute:)

@maxdebayser maxdebayser enabled auto-merge (squash) July 22, 2025 02:50
@maxdebayser
Collaborator Author

All tests are passing now after the changes from the first round of reviews.

@github-actions github-actions bot added the ready label Jul 22, 2025
5 participants