V1 embeddings #277
base: main
Conversation
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
👋 Hi! Thank you for contributing to vLLM support on Spyre.
The changes introduced by PR vllm-project/vllm#16728 to the sampler architecture were incompatible with our Spyre model runner. Initially, as a stopgap solution, I copied the old sampling classes into our vllm_spyre tree just so that we could keep working on the latest changes from main. This commit now reverts that and makes the same logits processor logic work for the Spyre input batch and model runner classes. The difference from the GPU model runner is that on Spyre we don't condense the batch; instead we keep a boolean mask that is used to calculate "dense" request indices. These indices must be used for the BatchUpdateBuilder because they are the right ones to slice the `logits` tensor that is passed to the Sampler.
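The dense-index idea above can be illustrated with a small plain-Python sketch (hypothetical helper name; the real input-batch code operates on tensors and is structured differently): each occupied slot's dense index is just the number of occupied slots that precede it.

```python
from itertools import accumulate

def dense_request_indices(occupied_mask):
    """For each batch slot, compute its row in the condensed logits
    tensor: the number of occupied slots that come before it.
    (Illustrative sketch only, not the vllm_spyre implementation.)"""
    # Inclusive prefix sum over the boolean mask, shifted down by one,
    # gives the "dense" position of every occupied slot.
    prefix = list(accumulate(int(m) for m in occupied_mask))
    return [p - 1 for p in prefix]

# Example: slots 0, 2 and 3 hold active requests; slot 1 is free.
mask = [True, False, True, True]
dense = [dense_request_indices(mask)[i] for i, m in enumerate(mask) if m]
print(dense)  # -> [0, 1, 2]
```

Slicing the logits tensor with these dense indices then lines each request up with its row in the condensed tensor handed to the Sampler.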
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
I am through 7/10 files, need to continue later... This is a huge PR:)
```python
        return model_output

class StaticBatchingSpyreModelRunner(SpyreModelRunner):

class WarmupShapesMixin:
```
hmm, might there be another, more elegant way to distribute the method `_get_padded_batch_size` to both `SpyrePoolingModelRunner` and `StaticBatchingSpyreModelRunner`? Also, the name `WarmupShapesMixin` sounds a bit weird to me 😄. Tagging our most experienced SWEs here:) @joerunde @tjohnson31415
Well, mixin classes are one of the standard ways to solve this problem. But maybe there are better options.
Oh, it is just that I have never seen them before:) my apologies, learning new things 😄 Naming is fine too, now that I better understand. I initially read it like "Warmup shapes mixing" rather than "Warmup shapes mix-in"...
Python doesn't have a standard way to do object composition so there are quite a few different ways to do it, each with their own drawbacks. This is a classic overview worth a read if you have a couple minutes: https://python-patterns.guide/gang-of-four/composition-over-inheritance/
My two cents is that mixins are better than filling up a `utils` package with a bunch of shared methods, and once you have a bunch of mixins you can usually find a pattern to apply instead that doesn't use multiple inheritance.
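For reference, the mixin pattern under discussion looks roughly like this. The class and method names come from the PR; the body of `_get_padded_batch_size` and the `warmup_shapes` format are illustrative assumptions, not the actual vllm_spyre code.

```python
class WarmupShapesMixin:
    """Shared warmup-shape helpers, mixed into every runner that needs
    them instead of duplicating the method in each class.
    (Sketch only; the real signature and logic may differ.)"""

    # Hypothetical format: (prompt_len, new_tokens, batch_size) tuples.
    warmup_shapes = ((64, 20, 4),)

    def _get_padded_batch_size(self, n_requests: int) -> int:
        # Pick the smallest warmup batch size that fits the request count.
        sizes = sorted(shape[2] for shape in self.warmup_shapes)
        for size in sizes:
            if n_requests <= size:
                return size
        return sizes[-1]

class SpyreModelRunner:
    """Stand-in for the common base class."""

class StaticBatchingSpyreModelRunner(WarmupShapesMixin, SpyreModelRunner):
    pass

class SpyrePoolingModelRunner(WarmupShapesMixin, SpyreModelRunner):
    pass

print(StaticBatchingSpyreModelRunner()._get_padded_batch_size(3))  # -> 4
```

Both runners pick up `_get_padded_batch_size` from the mixin, so the logic lives in exactly one place without forcing it into the shared base class.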
thanks a lot. will read this in a quiet minute:)
All tests are passing now after the changes from the first round of reviews.
Description
This PR enables embedding models on vLLM V1. In contrast with the V1 GPU implementation, I added a separate model runner here because most embedding models have no need for continuous batching. To avoid code repetition, I refactored the input batch and model runner classes into a class hierarchy with common base classes.
@gmarinho2 contributed a test that verifies that the returned embeddings don't change with batch size.
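The batch-size invariance check can be sketched as follows. The helper names and the toy embedder are hypothetical; the real test calls into the vLLM embedding entrypoint and compares tensors.

```python
import math

def embeddings_invariant_to_batch_size(embed_fn, prompts, batch_sizes=(1, 4)):
    """Verify each prompt's embedding is numerically the same regardless
    of the batch size used to compute it. `embed_fn(prompts, batch_size)`
    returns one embedding vector per prompt (stand-in for the real API)."""
    reference = embed_fn(prompts, batch_sizes[0])
    for bs in batch_sizes[1:]:
        candidate = embed_fn(prompts, bs)
        for ref, cand in zip(reference, candidate):
            # Allow tiny numerical differences from padding/kernels.
            assert all(math.isclose(r, c, rel_tol=1e-5, abs_tol=1e-6)
                       for r, c in zip(ref, cand))
    return True

# Toy embedder whose output is independent of batching, as required.
def toy_embed(prompts, batch_size):
    return [[float(len(p)), float(sum(map(ord, p)))] for p in prompts]

print(embeddings_invariant_to_batch_size(toy_embed, ["a", "bb"]))  # -> True
```

Such a check guards against padding or batching logic leaking into the embedding values, which is exactly the regression the contributed test is meant to catch.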