docs/models/supported_models.md (7 additions & 4 deletions)
@@ -21,7 +21,7 @@ These models are what we list in [supported-text-models][supported-text-models]
### Transformers

-vLLM also supports model implementations that are available in Transformers. This does not currently work for all models, but most decoder language models are supported, and vision language model support is planned!
+vLLM also supports model implementations that are available in Transformers. This does not currently work for all models, but most decoder language models and common vision language models are supported!

To check if the modeling backend is Transformers, you can simply do this:
@@ -31,14 +31,17 @@ llm = LLM(model=..., task="generate") # Name or path of your model
llm.apply_model(lambda model: print(type(model)))
```
-If it is `TransformersForCausalLM` then it means it's based on Transformers!
+If it is `TransformersForCausalLM` or `TransformersForMultimodalLM` then it means it's based on Transformers!
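For reference, a complete, runnable version of this check might look like the following sketch (the model name here is only a placeholder):

```python
from vllm import LLM

# Any Transformers-compatible checkpoint can go here; this name is just a placeholder.
llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct", task="generate")

# Print the class vLLM chose for the model implementation,
# e.g. TransformersForCausalLM when the Transformers backend is used.
llm.apply_model(lambda model: print(type(model)))
```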
!!! tip
-    You can force the use of `TransformersForCausalLM` by setting `model_impl="transformers"` for [offline-inference][offline-inference] or `--model-impl transformers` for the [openai-compatible-server][openai-compatible-server].
+    You can force the use of the `Transformers` model implementation by setting `model_impl="transformers"` for [offline-inference][offline-inference] or `--model-impl transformers` for the [openai-compatible-server][openai-compatible-server].
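As a minimal sketch of the offline case (the model name is a placeholder; the server command is shown for comparison):

```python
from vllm import LLM

# Force the Transformers modeling backend instead of a native vLLM implementation.
llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct", model_impl="transformers")

# Rough server-side equivalent:
#   vllm serve meta-llama/Llama-3.2-1B-Instruct --model-impl transformers
```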
!!! note
    vLLM may not fully optimise the Transformers implementation, so you may see degraded performance when comparing a native model to a Transformers model in vLLM.
+!!! note
+    For vision language models, if you load with `dtype="auto"`, vLLM loads the whole model with the config's `dtype` if it exists. In contrast, native Transformers respects the `dtype` attribute of each backbone in the model. That might cause a slight difference in performance.
+
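If you want to rule this difference out, one option is to pin the dtype explicitly instead of relying on `auto`. A minimal sketch, assuming a placeholder vision language model:

```python
from vllm import LLM

# Pin the dtype explicitly so it does not depend on what the checkpoint's config specifies.
llm = LLM(model="Qwen/Qwen2-VL-2B-Instruct", dtype="bfloat16")
```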
#### Custom models
If a model is supported natively by neither vLLM nor Transformers, it can still be used in vLLM!
@@ -97,7 +100,7 @@ Here is what happens in the background when this model is loaded:
1. The config is loaded.
2. `MyModel` Python class is loaded from the `auto_map` in config, and we check that the model `is_backend_compatible()`.
-3. `MyModel` is loaded into `TransformersForCausalLM` (see <gh-file:vllm/model_executor/models/transformers.py>) which sets `self.config._attn_implementation = "vllm"` so that vLLM's attention layer is used.
+3. `MyModel` is loaded into `TransformersForCausalLM` or `TransformersForMultimodalLM` (see <gh-file:vllm/model_executor/models/transformers.py>) which sets `self.config._attn_implementation = "vllm"` so that vLLM's attention layer is used.
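Putting this together, loading such a custom model might look like the following sketch; the model path is hypothetical, and `trust_remote_code=True` is assumed to be needed so the custom `MyModel` class can be loaded:

```python
from vllm import LLM

# Hypothetical checkpoint whose config maps `MyModel` via `auto_map`.
llm = LLM(
    model="/path/to/my-custom-model",
    model_impl="transformers",  # use the Transformers backend
    trust_remote_code=True,     # allow loading the custom `MyModel` class
)

# Should print TransformersForCausalLM (or TransformersForMultimodalLM for a vision language model).
llm.apply_model(lambda model: print(type(model)))
```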