
Commit 6ef7b35

mention VLMs in the docs
1 parent 9aec5ac commit 6ef7b35

File tree

1 file changed (+7, -4 lines)


docs/models/supported_models.md

Lines changed: 7 additions & 4 deletions
@@ -21,7 +21,7 @@ These models are what we list in [supported-text-models][supported-text-models]

### Transformers

-vLLM also supports model implementations that are available in Transformers. This does not currently work for all models, but most decoder language models are supported, and vision language model support is planned!
+vLLM also supports model implementations that are available in Transformers. This does not currently work for all models, but most decoder language models and common vision language models are supported!

To check if the modeling backend is Transformers, you can simply do this:

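For context, the full check that the line above introduces looks roughly like the following sketch; the `LLM(...)` call and `apply_model` usage come from the surrounding snippet, while the model name is only a placeholder:

```python
from vllm import LLM

# Name or path of your model; "facebook/opt-125m" is only a placeholder here.
llm = LLM(model="facebook/opt-125m", task="generate")

# Prints the class backing the model. If it is TransformersForCausalLM or
# TransformersForMultimodalLM, the Transformers modeling backend is in use.
llm.apply_model(lambda model: print(type(model)))
```
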
@@ -31,14 +31,17 @@ llm = LLM(model=..., task="generate") # Name or path of your model
llm.apply_model(lambda model: print(type(model)))
```

-If it is `TransformersForCausalLM` then it means it's based on Transformers!
+If it is `TransformersForCausalLM` or `TransformersForMultimodalLM` then it means it's based on Transformers!

!!! tip
-    You can force the use of `TransformersForCausalLM` by setting `model_impl="transformers"` for [offline-inference][offline-inference] or `--model-impl transformers` for the [openai-compatible-server][openai-compatible-server].
+    You can force the use of the `Transformers` model implementation by setting `model_impl="transformers"` for [offline-inference][offline-inference] or `--model-impl transformers` for the [openai-compatible-server][openai-compatible-server].

!!! note
    vLLM may not fully optimise the Transformers implementation so you may see degraded performance if comparing a native model to a Transformers model in vLLM.

+!!! note
+    For vision language models, if you load with `dtype="auto"`, vLLM loads the whole model using the `dtype` from the model config if it exists. In contrast, native Transformers respects the `dtype` attribute of each backbone in the model, which might cause a slight difference in performance.
+
#### Custom models

If a model is neither supported natively by vLLM or Transformers, it can still be used in vLLM!

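As a hedged sketch of the tip above (not part of the diff itself), forcing the Transformers implementation might look like this; the model name is again a placeholder:

```python
from vllm import LLM

# Offline inference: force the Transformers modeling backend.
llm = LLM(model="facebook/opt-125m", model_impl="transformers")

# For the OpenAI-compatible server, the equivalent is the CLI flag:
#   vllm serve facebook/opt-125m --model-impl transformers
```
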
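For the new note about `dtype="auto"`, one way to sidestep the difference it describes is to pass an explicit dtype; a minimal sketch, where the model name and `bfloat16` are assumptions rather than values from the diff:

```python
from vllm import LLM

# Pass an explicit dtype instead of relying on dtype="auto", so the whole
# model is loaded with a known precision regardless of the config's dtype.
llm = LLM(model="llava-hf/llava-1.5-7b-hf", dtype="bfloat16")
```
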
@@ -97,7 +100,7 @@ Here is what happens in the background when this model is loaded:

1. The config is loaded.
2. `MyModel` Python class is loaded from the `auto_map` in config, and we check that the model `is_backend_compatible()`.
-3. `MyModel` is loaded into `TransformersForCausalLM` (see <gh-file:vllm/model_executor/models/transformers.py>) which sets `self.config._attn_implementation = "vllm"` so that vLLM's attention layer is used.
+3. `MyModel` is loaded into `TransformersForCausalLM` or `TransformersForMultimodalLM` (see <gh-file:vllm/model_executor/models/transformers.py>) which sets `self.config._attn_implementation = "vllm"` so that vLLM's attention layer is used.

That's it!

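To make those three steps concrete, loading such a custom model could look roughly like the sketch below; the path, `model_impl="transformers"`, and `trust_remote_code=True` are illustrative assumptions, not values taken from the diff:

```python
from vllm import LLM

# Load a model whose modeling code is resolved via auto_map in its config.
# The path is a placeholder; trust_remote_code=True is an assumption about
# how the custom class is allowed to load, not something stated in this diff.
llm = LLM(
    model="path/to/MyModel",
    model_impl="transformers",
    trust_remote_code=True,
)
# Behind the scenes it is wrapped in TransformersForCausalLM (or
# TransformersForMultimodalLM) with config._attn_implementation = "vllm".
```
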