Scale hosts the following models in the LLM Engine Model Zoo:

| Model Name               | Inference APIs Available | Fine-tuning APIs Available | Inference Frameworks Available             | Inference max total tokens (prompt + response) |
| ------------------------ | ------------------------ | -------------------------- | ------------------------------------------ | ---------------------------------------------- |
| `llama-7b`               | ✅                       | ✅                         | deepspeed, text-generation-inference       | 2048                                           |
| `llama-2-7b`             | ✅                       | ✅                         | text-generation-inference, vllm            | 4096                                           |
| `llama-2-7b-chat`        | ✅                       |                            | text-generation-inference, vllm            | 4096                                           |
| `llama-2-13b`            | ✅                       |                            | text-generation-inference, vllm            | 4096                                           |
| `llama-2-13b-chat`       | ✅                       |                            | text-generation-inference, vllm            | 4096                                           |
| `llama-2-70b`            | ✅                       | ✅                         | text-generation-inference, vllm            | 4096                                           |
| `llama-2-70b-chat`       | ✅                       |                            | text-generation-inference, vllm            | 4096                                           |
| `falcon-7b`              | ✅                       |                            | text-generation-inference, vllm            | 2048                                           |
| `falcon-7b-instruct`     | ✅                       |                            | text-generation-inference, vllm            | 2048                                           |
| `falcon-40b`             | ✅                       |                            | text-generation-inference, vllm            | 2048                                           |
| `falcon-40b-instruct`    | ✅                       |                            | text-generation-inference, vllm            | 2048                                           |
| `mpt-7b`                 | ✅                       |                            | deepspeed, text-generation-inference, vllm | 2048                                           |
| `mpt-7b-instruct`        | ✅                       | ✅                         | deepspeed, text-generation-inference, vllm | 2048                                           |
| `flan-t5-xxl`            | ✅                       |                            | deepspeed, text-generation-inference       | 2048                                           |
| `mistral-7b`             | ✅                       | ✅                         | vllm                                       | 8000                                           |
| `mistral-7b-instruct`    | ✅                       | ✅                         | vllm                                       | 8000                                           |
| `mixtral-8x7b`           | ✅                       |                            | vllm                                       | 32768                                          |
| `mixtral-8x7b-instruct`  | ✅                       |                            | vllm                                       | 32768                                          |
| `codellama-7b`           | ✅                       | ✅                         | text-generation-inference, vllm            | 16384                                          |
| `codellama-7b-instruct`  | ✅                       | ✅                         | text-generation-inference, vllm            | 16384                                          |
| `codellama-13b`          | ✅                       | ✅                         | text-generation-inference, vllm            | 16384                                          |
| `codellama-13b-instruct` | ✅                       | ✅                         | text-generation-inference, vllm            | 16384                                          |
| `codellama-34b`          | ✅                       | ✅                         | text-generation-inference, vllm            | 16384                                          |
| `codellama-34b-instruct` | ✅                       | ✅                         | text-generation-inference, vllm            | 16384                                          |
| `zephyr-7b-alpha`        | ✅                       |                            | text-generation-inference, vllm            | 32768                                          |
| `zephyr-7b-beta`         | ✅                       |                            | text-generation-inference, vllm            | 32768                                          |

## Usage
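
As a minimal sketch of querying one of the models listed above with the `llmengine` Python client (installable via `pip install scale-llm-engine`, with a Scale API key exported as `SCALE_API_KEY`; the prompt and sampling parameters here are illustrative placeholders):

```python
from llmengine import Completion

# Request a completion from one of the Model Zoo models.
# The model name must match a row in the table above.
response = Completion.create(
    model="llama-2-7b",
    prompt="Hello, my name is",
    max_new_tokens=10,
    temperature=0.2,
)

print(response.output.text)
```

Note that the prompt length plus `max_new_tokens` must stay within the "Inference max total tokens" limit listed for the chosen model (4096 for `llama-2-7b`).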