Replies: 1 comment
Well, I finally found something that seems to allow this (though I haven't tried it yet)! I'm sharing it here in case others are looking for the same thing. https://docs.vllm.ai/en/latest/serving/multimodal_inputs.html#embedding-inputs
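For reference, here is a minimal sketch of that embedding-input path using vLLM's offline `LLM` API, following the Qwen2-VL example on the linked docs page. The checkpoint, the prompt text, and the two `.pt` files are placeholders, not something from this discussion:

```python
import torch
from vllm import LLM

# Qwen2-VL in vLLM can consume precomputed image embeddings instead of raw pixels.
llm = LLM(model="Qwen/Qwen2-VL-2B-Instruct")

# Embeddings computed offline (e.g. by a DocVLM-style module).
# File names are placeholders; shapes/format follow the docs' Qwen2-VL example.
image_embeds = torch.load("image_embeds.pt")      # flattened visual features for the image
image_grid_thw = torch.load("image_grid_thw.pt")  # shape (1, 3); needed for positional encoding

outputs = llm.generate({
    "prompt": (
        "<|im_start|>user\n"
        "<|vision_start|><|image_pad|><|vision_end|>"
        "Describe this document.<|im_end|>\n"
        "<|im_start|>assistant\n"
    ),
    "multi_modal_data": {
        "image": {
            "image_embeds": image_embeds,
            "image_grid_thw": image_grid_thw,
        }
    },
})
print(outputs[0].outputs[0].text)
```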
Hello!
I was reading the DocVLM paper, which describes a method to improve the performance of VLM-type models on document-understanding tasks without modifying the model weights. The trick is to add an extra module that injects additional tokens into the sequence fed to the decoder.
Below is an image showing how it works.
[Figure: DocVLM architecture overview, from the paper.]
Since the approach builds on VLMs already integrated into vLLM (for example, Qwen2-VL), one might intuitively think that only a few adjustments are needed for a "DocVLMQwen2" version to work within vLLM.
I've already read the "Adding a New Model" and "Plugin System" pages in the vLLM docs, but I'm not sure which approach is the better fit. Any feedback or shared experience on the best way to make this work would be greatly appreciated!
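In case it helps frame the discussion: assuming the DocVLM module can be folded into a model class, one plausible route is an out-of-tree model registered through `ModelRegistry`, which is the mechanism both of those docs pages build on. Everything DocVLM-specific below (package, module, class, and architecture names) is hypothetical:

```python
# docvlm_vllm/__init__.py  (hypothetical plugin package)
from vllm import ModelRegistry

def register():
    # Register a new architecture name with vLLM. The string class path is
    # imported lazily; the class itself would subclass vLLM's Qwen2-VL
    # implementation and add the DocVLM token-injection module.
    ModelRegistry.register_model(
        "DocVLMQwen2VLForConditionalGeneration",                   # hypothetical arch name
        "docvlm_vllm.model:DocVLMQwen2VLForConditionalGeneration", # hypothetical class
    )

# To have vLLM call register() automatically at startup (the plugin-system
# route), expose it as an entry point in pyproject.toml:
#
# [project.entry-points."vllm.general_plugins"]
# docvlm = "docvlm_vllm:register"
```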