@@ -319,7 +319,7 @@ Specified using `--task generate`.
 | `AquilaForCausalLM` | Aquila, Aquila2 | `BAAI/Aquila-7B`, `BAAI/AquilaChat-7B`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `ArcticForCausalLM` | Arctic | `Snowflake/snowflake-arctic-base`, `Snowflake/snowflake-arctic-instruct`, etc. | | ✅︎ | ✅︎ |
 | `BaiChuanForCausalLM` | Baichuan2, Baichuan | `baichuan-inc/Baichuan2-13B-Chat`, `baichuan-inc/Baichuan-7B`, etc. | ✅︎ | ✅︎ | ✅︎ |
-| `BambaForCausalLM` | Bamba | `ibm-ai-platform/Bamba-9B-fp8`, `ibm-ai-platform/Bamba-9B` | ✅︎ | ✅︎ | |
+| `BambaForCausalLM` | Bamba | `ibm-ai-platform/Bamba-9B-fp8`, `ibm-ai-platform/Bamba-9B` | ✅︎ | ✅︎ | ✅︎ |
 | `BloomForCausalLM` | BLOOM, BLOOMZ, BLOOMChat | `bigscience/bloom`, `bigscience/bloomz`, etc. | | ✅︎ | |
 | `BartForConditionalGeneration` | BART | `facebook/bart-base`, `facebook/bart-large-cnn`, etc. | | | |
 | `ChatGLMModel`, `ChatGLMForConditionalGeneration` | ChatGLM | `THUDM/chatglm2-6b`, `THUDM/chatglm3-6b`, `ShieldLM-6B-chatglm3`, etc. | ✅︎ | ✅︎ | ✅︎ |
@@ -335,7 +335,7 @@ Specified using `--task generate`.
 | `ExaoneForCausalLM` | EXAONE-3 | `LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `FalconForCausalLM` | Falcon | `tiiuae/falcon-7b`, `tiiuae/falcon-40b`, `tiiuae/falcon-rw-7b`, etc. | | ✅︎ | ✅︎ |
 | `FalconMambaForCausalLM` | FalconMamba | `tiiuae/falcon-mamba-7b`, `tiiuae/falcon-mamba-7b-instruct`, etc. | | ✅︎ | ✅︎ |
-| `FalconH1ForCausalLM` | Falcon-H1 | `tiiuae/Falcon-H1-34B-Base`, `tiiuae/Falcon-H1-34B-Instruct`, etc. | ✅︎ | ✅︎ | |
+| `FalconH1ForCausalLM` | Falcon-H1 | `tiiuae/Falcon-H1-34B-Base`, `tiiuae/Falcon-H1-34B-Instruct`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `GemmaForCausalLM` | Gemma | `google/gemma-2b`, `google/gemma-1.1-2b-it`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `Gemma2ForCausalLM` | Gemma 2 | `google/gemma-2-9b`, `google/gemma-2-27b`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `Gemma3ForCausalLM` | Gemma 3 | `google/gemma-3-1b-it`, etc. | ✅︎ | ✅︎ | ✅︎ |
@@ -348,7 +348,7 @@ Specified using `--task generate`.
 | `GPTNeoXForCausalLM` | GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM | `EleutherAI/gpt-neox-20b`, `EleutherAI/pythia-12b`, `OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5`, `databricks/dolly-v2-12b`, `stabilityai/stablelm-tuned-alpha-7b`, etc. | | ✅︎ | ✅︎ |
 | `GraniteForCausalLM` | Granite 3.0, Granite 3.1, PowerLM | `ibm-granite/granite-3.0-2b-base`, `ibm-granite/granite-3.1-8b-instruct`, `ibm/PowerLM-3b`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `GraniteMoeForCausalLM` | Granite 3.0 MoE, PowerMoE | `ibm-granite/granite-3.0-1b-a400m-base`, `ibm-granite/granite-3.0-3b-a800m-instruct`, `ibm/PowerMoE-3b`, etc. | ✅︎ | ✅︎ | ✅︎ |
-| `GraniteMoeHybridForCausalLM` | Granite 4.0 MoE Hybrid | `ibm-granite/granite-4.0-tiny-preview`, etc. | ✅︎ | ✅︎ | |
+| `GraniteMoeHybridForCausalLM` | Granite 4.0 MoE Hybrid | `ibm-granite/granite-4.0-tiny-preview`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `GraniteMoeSharedForCausalLM` | Granite MoE Shared | `ibm-research/moe-7b-1b-active-shared-experts` (test model) | ✅︎ | ✅︎ | ✅︎ |
 | `GritLM` | GritLM | `parasail-ai/GritLM-7B-vllm`. | ✅︎ | ✅︎ | |
 | `Grok1ModelForCausalLM` | Grok1 | `hpcai-tech/grok-1`. | ✅︎ | ✅︎ | ✅︎ |
@@ -367,7 +367,7 @@ Specified using `--task generate`.
 | `MixtralForCausalLM` | Mixtral-8x7B, Mixtral-8x7B-Instruct | `mistralai/Mixtral-8x7B-v0.1`, `mistralai/Mixtral-8x7B-Instruct-v0.1`, `mistral-community/Mixtral-8x22B-v0.1`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `MPTForCausalLM` | MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter | `mosaicml/mpt-7b`, `mosaicml/mpt-7b-storywriter`, `mosaicml/mpt-30b`, etc. | | ✅︎ | ✅︎ |
 | `NemotronForCausalLM` | Nemotron-3, Nemotron-4, Minitron | `nvidia/Minitron-8B-Base`, `mgoin/Nemotron-4-340B-Base-hf-FP8`, etc. | ✅︎ | ✅︎ | ✅︎ |
-| `NemotronHForCausalLM` | Nemotron-H | `nvidia/Nemotron-H-8B-Base-8K`, `nvidia/Nemotron-H-47B-Base-8K`, `nvidia/Nemotron-H-56B-Base-8K`, etc. | ✅︎ | ✅︎ | |
+| `NemotronHForCausalLM` | Nemotron-H | `nvidia/Nemotron-H-8B-Base-8K`, `nvidia/Nemotron-H-47B-Base-8K`, `nvidia/Nemotron-H-56B-Base-8K`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `OLMoForCausalLM` | OLMo | `allenai/OLMo-1B-hf`, `allenai/OLMo-7B-hf`, etc. | | ✅︎ | ✅︎ |
 | `OLMo2ForCausalLM` | OLMo2 | `allenai/OLMo-2-0425-1B`, etc. | | ✅︎ | ✅︎ |
 | `OLMoEForCausalLM` | OLMoE | `allenai/OLMoE-1B-7B-0924`, `allenai/OLMoE-1B-7B-0924-Instruct`, etc. | | ✅︎ | ✅︎ |
@@ -392,7 +392,7 @@ Specified using `--task generate`.
 | `XverseForCausalLM` | XVERSE | `xverse/XVERSE-7B-Chat`, `xverse/XVERSE-13B-Chat`, `xverse/XVERSE-65B-Chat`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `MiniMaxM1ForCausalLM` | MiniMax-Text | `MiniMaxAI/MiniMax-M1-40k`, `MiniMaxAI/MiniMax-M1-80k`, etc. | | | |
 | `MiniMaxText01ForCausalLM` | MiniMax-Text | `MiniMaxAI/MiniMax-Text-01`, etc. | | | |
-| `Zamba2ForCausalLM` | Zamba2 | `Zyphra/Zamba2-7B-instruct`, `Zyphra/Zamba2-2.7B-instruct`, `Zyphra/Zamba2-1.2B-instruct`, etc. | | | |
+| `Zamba2ForCausalLM` | Zamba2 | `Zyphra/Zamba2-7B-instruct`, `Zyphra/Zamba2-2.7B-instruct`, `Zyphra/Zamba2-1.2B-instruct`, etc. | | | ✅︎ |
 
 !!! note
     Currently, the ROCm version of vLLM supports Mistral and Mixtral only for context lengths up to 4096.