Replies: 1 comment
Well, I finally found something that seems to allow this (though I haven't tried it yet)! I'm sharing it here in case others are looking for the same thing. https://docs.vllm.ai/en/latest/serving/multimodal_inputs.html#embedding-inputs
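For reference, here is a minimal sketch of that embedding-input path using vLLM's offline `LLM` API, following the Qwen2-VL example on the linked docs page. The checkpoint, the prompt text, and the two `.pt` files are placeholders, not something from this discussion:

```python
import torch
from vllm import LLM

# Qwen2-VL in vLLM can consume precomputed image embeddings instead of raw pixels.
llm = LLM(model="Qwen/Qwen2-VL-2B-Instruct")

# Embeddings computed offline (e.g. by a DocVLM-style module).
# File names are placeholders; shapes/format follow the docs' Qwen2-VL example.
image_embeds = torch.load("image_embeds.pt")      # flattened visual features for the image
image_grid_thw = torch.load("image_grid_thw.pt")  # shape (1, 3); needed for positional encoding

outputs = llm.generate({
    "prompt": (
        "<|im_start|>user\n"
        "<|vision_start|><|image_pad|><|vision_end|>"
        "Describe this document.<|im_end|>\n"
        "<|im_start|>assistant\n"
    ),
    "multi_modal_data": {
        "image": {
            "image_embeds": image_embeds,
            "image_grid_thw": image_grid_thw,
        }
    },
})
print(outputs[0].outputs[0].text)
```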
Hello!
I was reading the DocVLM paper, which describes a method to improve the performance of VLM-type models on document-understanding tasks without modifying the model weights. The trick is to add an extra module that injects additional tokens into the sequence fed to the decoder.
Below is an image showing how it works.
[Figure: DocVLM architecture overview, from the paper.]
Since the approach builds on VLMs already integrated into vLLM (for example, Qwen2-VL), one might intuitively think that only a few adjustments are needed for a "DocVLMQwen2" version to work within vLLM.
I've already read the "Adding a New Model" and "Plugin System" pages in the vLLM docs, but I'm not sure which approach is the better fit. Any feedback or shared experience on the best way to make this work would be greatly appreciated!
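In case it helps frame the discussion: assuming the DocVLM module can be folded into a model class, one plausible route is an out-of-tree model registered through `ModelRegistry`, which is the mechanism both of those docs pages build on. Everything DocVLM-specific below (package, module, class, and architecture names) is hypothetical:

```python
# docvlm_vllm/__init__.py  (hypothetical plugin package)
from vllm import ModelRegistry

def register():
    # Register a new architecture name with vLLM. The string class path is
    # imported lazily; the class itself would subclass vLLM's Qwen2-VL
    # implementation and add the DocVLM token-injection module.
    ModelRegistry.register_model(
        "DocVLMQwen2VLForConditionalGeneration",                   # hypothetical arch name
        "docvlm_vllm.model:DocVLMQwen2VLForConditionalGeneration", # hypothetical class
    )

# To have vLLM call register() automatically at startup (the plugin-system
# route), expose it as an entry point in pyproject.toml:
#
# [project.entry-points."vllm.general_plugins"]
# docvlm = "docvlm_vllm:register"
```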