docs/source/models/supported_models.md (+3 −3)
@@ -57,10 +57,10 @@ llm = LLM(model=..., task="generate") # Name or path of your model
llm.apply_model(lambda model: print(type(model)))
```
-If it is `TransformersModel` then it means it's based on Transformers!
+If it is `TransformersForCausalLM` then it means it's based on Transformers!
:::{tip}
-You can force the use of `TransformersModel` by setting `model_impl="transformers"` for <project:#offline-inference> or `--model-impl transformers` for the <project:#openai-compatible-server>.
+You can force the use of `TransformersForCausalLM` by setting `model_impl="transformers"` for <project:#offline-inference> or `--model-impl transformers` for the <project:#openai-compatible-server>.
:::
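
For example, a minimal offline-inference sketch of forcing the fallback (the model name is only an illustrative placeholder):

```python
from vllm import LLM

# Force the Transformers fallback instead of a native vLLM implementation.
llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct", model_impl="transformers")

# Confirm which implementation was selected; with the fallback active this
# prints the TransformersForCausalLM class.
llm.apply_model(lambda model: print(type(model)))
```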
:::{note}
@@ -119,7 +119,7 @@ Here is what happens in the background:
1. The config is loaded
2. `MyModel` Python class is loaded from the `auto_map`, and we check that the model `_supports_attention_backend`.
-3. The `TransformersModel` backend is used. See <gh-file:vllm/model_executor/models/transformers.py>, which leverages `self.config._attn_implementation = "vllm"`, thus the need to use `ALL_ATTENTION_FUNCTIONS`.
+3. The `TransformersForCausalLM` backend is used. See <gh-file:vllm/model_executor/models/transformers.py>, which leverages `self.config._attn_implementation = "vllm"`, thus the need to use `ALL_ATTENTION_FUNCTIONS`.
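
As a rough illustration of the dispatch in step 3 (a simplified sketch, not vLLM's actual code; it assumes a recent Transformers release where `ALL_ATTENTION_FUNCTIONS` is exposed from `transformers.modeling_utils`):

```python
from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS

# Transformers keeps a registry of attention implementations keyed by name;
# vLLM registers its own attention function under the "vllm" key and sets
# config._attn_implementation = "vllm" so layers resolve to it.
print(sorted(ALL_ATTENTION_FUNCTIONS.keys()))  # e.g. ['flash_attention_2', 'sdpa', ...]

# Simplified view of what a model's attention layer does at runtime:
def run_attention(config, *attn_args, **attn_kwargs):
    attention_fn = ALL_ATTENTION_FUNCTIONS[config._attn_implementation]
    return attention_fn(*attn_args, **attn_kwargs)
```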
To make your model compatible with tensor parallel, it needs: