Full example: <gh-file:examples/offline_inference/vision_language_multi_image.py>

If you use the [LLM.chat](https://docs.vllm.ai/en/stable/models/generative_models.html#llmchat) method, you can pass images directly in the message content in several formats: image URLs, PIL Image objects, or pre-computed embeddings:

```python
# This example assumes `llm` is an LLM instance for a vision-language model that
# accepts several images per prompt, `image_url` is an image URL string,
# `image_pil` is a PIL.Image.Image, and `image_embeds` holds pre-computed
# image embeddings.

conversation = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {
        "role": "user",
        "content": [{
            "type": "image_url",
            "image_url": {
                "url": image_url
            }
        }, {
            "type": "image_pil",
            "image_pil": image_pil
        }, {
            "type": "image_embeds",
            "image_embeds": image_embeds
        }, {
            "type": "text",
            "text": "What's in these images?"
        }],
    },
]

# Perform inference and log output.
outputs = llm.chat(conversation)

for o in outputs:
    generated_text = o.outputs[0].text
    print(generated_text)
```
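If you build conversations programmatically, a small helper can assemble the user message from whichever image inputs you have. The sketch below is hypothetical: `build_image_message` is not part of vLLM. It only reproduces the content-part layout used in the `LLM.chat` example (`image_url`, `image_pil`, `image_embeds`, then `text`) and does no validation of the inputs themselves.

```python
from typing import Any, Optional


def build_image_message(
    text: str,
    image_url: Optional[str] = None,
    image_pil: Any = None,
    image_embeds: Any = None,
) -> dict[str, Any]:
    """Hypothetical helper: build a user message mixing image parts and text.

    Only the image inputs that are provided are included, in the order
    image_url, image_pil, image_embeds, followed by the text part.
    """
    content: list[dict[str, Any]] = []
    if image_url is not None:
        # URLs are nested under an "image_url" object with a "url" key.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    if image_pil is not None:
        # PIL Image objects are passed through unchanged under "image_pil".
        content.append({"type": "image_pil", "image_pil": image_pil})
    if image_embeds is not None:
        # Pre-computed embeddings are passed through unchanged under "image_embeds".
        content.append({"type": "image_embeds", "image_embeds": image_embeds})
    # The text prompt goes last, mirroring the example conversation.
    content.append({"type": "text", "text": text})
    return {"role": "user", "content": content}


# Usage: a single URL-based image plus a question (the URL is a placeholder).
conversation = [
    {"role": "system", "content": "You are a helpful assistant"},
    build_image_message(
        "What's in this image?",
        image_url="https://example.com/image.jpg",
    ),
]
```

The resulting `conversation` list can then be passed to `llm.chat(conversation)` as in the example above.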
Multi-image input can be extended to perform video captioning. We show this with [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) as it supports videos: