Commit 8b0732e

Update v1_guide.md
1 parent 5c2a95f commit 8b0732e

1 file changed: docs/usage/v1_guide.md (12 additions, 3 deletions)
```diff
@@ -83,7 +83,8 @@ based on assigned priority, with FCFS as a tie-breaker), configurable via the
 | **Decoder-only Models** | <nobr>🚀 Optimized</nobr> |
 | **Encoder-Decoder Models** | <nobr>🟠 Delayed</nobr> |
 | **Embedding Models** | <nobr>🟢 Functional</nobr> |
-| **Mamba Models** | <nobr>🚧 WIP ([PR #19327](https://github.com/vllm-project/vllm/pull/19327))</nobr> |
+| **Mamba Models** | <nobr>🟢 Functional</nobr> |
+| **Hybrid Models** | <nobr>🟢 Functional</nobr> |
 | **Multimodal Models** | <nobr>🟢 Functional</nobr> |
 
 vLLM V1 currently excludes model architectures with the `SupportsV0Only` protocol.
@@ -104,8 +105,16 @@ to enable simultaneous generation and embedding using the same engine instance i
 
 #### Mamba Models
 
-Models using selective state-space mechanisms instead of standard transformer attention (e.g., `MambaForCausalLM`, `JambaForCausalLM`)
-will be supported via [PR #19327](https://github.com/vllm-project/vllm/pull/19327).
+Models using selective state-space mechanisms instead of standard transformer attention are partially supported.
+Models that use Mamba-2 layers (e.g., `Mamba2ForCausalLM`) are supported, but models that use older Mamba-1 layers
+(e.g., `MambaForCausalLM`, `JambaForCausalLM`) are not yet supported. Please note that these models currently require
+enforcing eager mode and disabling prefix caching in V1.
+
+#### Hybrid Models
+
+Models that combine Mamba-2 layers with standard transformer attention layers are supported (e.g., `BambaForCausalLM`,
+`Zamba2ForCausalLM`, `NemotronHForCausalLM`, `FalconH1ForCausalLM` and `GraniteMoeHybridForCausalLM`). Please note that
+these models currently require enforcing eager mode and disabling prefix caching in V1.
 
 #### Encoder-Decoder Models
 
```
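Both added paragraphs state that Mamba-2 and hybrid models currently require enforcing eager mode and disabling prefix caching in V1. A minimal sketch of what that looks like with the offline `LLM` API, assuming a build where V1 is opted into via the `VLLM_USE_V1` environment variable and using `ibm-ai-platform/Bamba-9B` purely as an illustrative hybrid checkpoint:

```python
import os

# Assumption: at the time of this commit, the V1 engine is opt-in
# via this environment variable rather than being the default.
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(
    model="ibm-ai-platform/Bamba-9B",  # illustrative hybrid Mamba-2/attention model
    enforce_eager=True,                # required: run eagerly, skip CUDA graph capture
    enable_prefix_caching=False,       # required: prefix caching is unsupported for Mamba state
)

outputs = llm.generate(
    ["The key difference between Mamba-1 and Mamba-2 layers is"],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

The same constraints should carry over to the corresponding engine-argument flags (`--enforce-eager`, `--no-enable-prefix-caching`) when launching an online server with `vllm serve`.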