
Commit bd7966f

chore(cli): add smol helpers to generate README.md tables
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
1 parent 509e969 commit bd7966f

6 files changed (+2776, -37 lines)


README.md

Lines changed: 115 additions & 19 deletions
@@ -1,4 +1,6 @@
-# 🦾 OpenLLM: Self-Hosting LLMs Made Easy
+<div align="center">
+<h1>🦾 OpenLLM: Self-Hosting LLMs Made Easy</h1>
+</div>
 
 [![License: Apache-2.0](https://img.shields.io/badge/License-Apache%202-green.svg)](https://github.com/bentoml/OpenLLM/blob/main/LICENSE)
 [![Releases](https://img.shields.io/pypi/v/openllm.svg?logo=pypi&label=PyPI&logoColor=gold)](https://pypi.org/project/openllm)
@@ -25,16 +27,110 @@ openllm hello
 
 OpenLLM supports a wide range of state-of-the-art open-source LLMs. You can also add a [model repository to run custom models](#set-up-a-custom-repository) with OpenLLM.
 
-| Model | Parameters | Quantization | Required GPU | Start a Server |
-| ---------------- | ---------- | ------------ | ------------- | ----------------------------------- |
-| Llama 3.3 | 70B | - | 80Gx2 | `openllm serve llama3.3:70b` |
-| Llama 3.2 | 3B | - | 12G | `openllm serve llama3.2:3b` |
-| Llama 3.2 Vision | 11B | - | 80G | `openllm serve llama3.2:11b-vision` |
-| Mistral | 7B | - | 24G | `openllm serve mistral:7b` |
-| Qwen 2.5 | 1.5B | - | 12G | `openllm serve qwen2.5:1.5b` |
-| Qwen 2.5 Coder | 7B | - | 24G | `openllm serve qwen2.5-coder:7b` |
-| Gemma 2 | 9B | - | 24G | `openllm serve gemma2:9b` |
-| Phi3 | 3.8B | - | 12G | `openllm serve phi3:3.8b` |
+<table>
+<tr>
+<th>Model</th>
+<th>Parameters</th>
+<th>Required GPU</th>
+<th>Start a Server</th>
+</tr>
+<tr>
+<td>deepseek-r1</td>
+<td>671B</td>
+<td>80Gx16</td>
+<td><code>openllm serve deepseek-r1:671b-fc3d</code></td>
+</tr>
+<tr>
+<td>deepseek-r1-distill</td>
+<td>14B</td>
+<td>80G</td>
+<td><code>openllm serve deepseek-r1-distill:qwen2.5-14b-98a9</code></td>
+</tr>
+<tr>
+<td>deepseek-v3</td>
+<td>671B</td>
+<td>80Gx16</td>
+<td><code>openllm serve deepseek-v3:671b-instruct-d7ec</code></td>
+</tr>
+<tr>
+<td>gemma2</td>
+<td>2B</td>
+<td>12G</td>
+<td><code>openllm serve gemma2:2b-instruct-747d</code></td>
+</tr>
+<tr>
+<td>llama3.1</td>
+<td>8B</td>
+<td>24G</td>
+<td><code>openllm serve llama3.1:8b-instruct-3c0c</code></td>
+</tr>
+<tr>
+<td>llama3.2</td>
+<td>1B</td>
+<td>24G</td>
+<td><code>openllm serve llama3.2:1b-instruct-f041</code></td>
+</tr>
+<tr>
+<td>llama3.3</td>
+<td>70B</td>
+<td>80Gx2</td>
+<td><code>openllm serve llama3.3:70b-instruct-b850</code></td>
+</tr>
+<tr>
+<td>mistral</td>
+<td>8B</td>
+<td>24G</td>
+<td><code>openllm serve mistral:8b-instruct-50e8</code></td>
+</tr>
+<tr>
+<td>mistral-large</td>
+<td>123B</td>
+<td>80Gx4</td>
+<td><code>openllm serve mistral-large:123b-instruct-1022</code></td>
+</tr>
+<tr>
+<td>mistralai</td>
+<td>24B</td>
+<td>80G</td>
+<td><code>openllm serve mistralai:24b-small-instruct-2501-0e69</code></td>
+</tr>
+<tr>
+<td>mixtral</td>
+<td>7B</td>
+<td>80Gx2</td>
+<td><code>openllm serve mixtral:8x7b-instruct-v0.1-b752</code></td>
+</tr>
+<tr>
+<td>phi4</td>
+<td>14B</td>
+<td>80G</td>
+<td><code>openllm serve phi4:14b-c12d</code></td>
+</tr>
+<tr>
+<td>pixtral</td>
+<td>12B</td>
+<td>80G</td>
+<td><code>openllm serve pixtral:12b-240910-c344</code></td>
+</tr>
+<tr>
+<td>qwen2.5</td>
+<td>7B</td>
+<td>24G</td>
+<td><code>openllm serve qwen2.5:7b-instruct-3260</code></td>
+</tr>
+<tr>
+<td>qwen2.5-coder</td>
+<td>7B</td>
+<td>24G</td>
+<td><code>openllm serve qwen2.5-coder:7b-instruct-e75d</code></td>
+</tr>
+<tr>
+<td>qwen2.5vl</td>
+<td>3B</td>
+<td>24G</td>
+<td><code>openllm serve qwen2.5vl:3b-instruct-4686</code></td>
+</tr>
+</table>
 
 ...
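The commit message attributes this HTML table to new "smol helpers" in the CLI; the helper code itself lives in the other changed files, which are not shown on this page. Purely as an illustration of the idea, a generator for the layout above might look like the following sketch. The `ModelEntry` type and `render_table` function are hypothetical names, not the actual OpenLLM helpers:

```python
# Hypothetical sketch of a README table generator (not the actual OpenLLM helper).
from dataclasses import dataclass
from typing import List


@dataclass
class ModelEntry:
    name: str        # e.g. "llama3.3"
    parameters: str  # e.g. "70B"
    gpu: str         # e.g. "80Gx2"
    tag: str         # e.g. "llama3.3:70b-instruct-b850"


def render_table(entries: List[ModelEntry]) -> str:
    """Render model entries as the <table> markup embedded in README.md above."""
    lines = [
        "<table>",
        "<tr>",
        "<th>Model</th>",
        "<th>Parameters</th>",
        "<th>Required GPU</th>",
        "<th>Start a Server</th>",
        "</tr>",
    ]
    for e in entries:
        lines += [
            "<tr>",
            f"<td>{e.name}</td>",
            f"<td>{e.parameters}</td>",
            f"<td>{e.gpu}</td>",
            f"<td><code>openllm serve {e.tag}</code></td>",
            "</tr>",
        ]
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table([ModelEntry("llama3.3", "70B", "80Gx2", "llama3.3:70b-instruct-b850")]))
```

Regenerating the table from a single list of entries keeps the README in sync with the model repository, which appears to be the point of the helpers this commit adds.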

@@ -46,15 +142,16 @@ To start an LLM server locally, use the `openllm serve` command and specify the
 
 > [!NOTE]
 > OpenLLM does not store model weights. A Hugging Face token (HF_TOKEN) is required for gated models.
+>
 > 1. Create your Hugging Face token [here](https://huggingface.co/settings/tokens).
-> 2. Request access to the gated model, such as [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B).
+> 2. Request access to the gated model, such as [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct).
 > 3. Set your token as an environment variable by running:
 > ```bash
 > export HF_TOKEN=<your token>
 > ```
 
 ```bash
-openllm serve llama3:8b
+openllm serve llama3.2:1b-instruct-f041
 ```
 
 The server will be accessible at [http://localhost:3000](http://localhost:3000/), providing OpenAI-compatible APIs for interaction. You can call the endpoints with different frameworks and tools that support OpenAI-compatible APIs. Typically, you may need to specify the following:
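The next hunks update the README's Python client example to the new model name, but only fragments of it appear in the diff. For reference, a self-contained sketch of the same OpenAI-compatible call against the server above would look roughly like this; it assumes `pip install openai`, and the prompt and `stream=True` flag (matching the `for chunk in chat_completion:` loop later in the README) are illustrative:

```python
# Minimal sketch; assumes a local OpenLLM server on port 3000 serving
# meta-llama/Llama-3.2-1B-Instruct and the `openai` Python package.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")

chat_completion = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=[{"role": "user", "content": "Explain superconductors like I'm five years old"}],
    stream=True,
)
for chunk in chat_completion:
    # Each streamed chunk carries a small delta of the generated text.
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```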
@@ -79,7 +176,7 @@ client = OpenAI(base_url='http://localhost:3000/v1', api_key='na')
 # print(model_list)
 
 chat_completion = client.chat.completions.create(
-    model="meta-llama/Meta-Llama-3-8B-Instruct",
+    model="meta-llama/Llama-3.2-1B-Instruct",
     messages=[
         {
             "role": "user",
@@ -94,17 +191,17 @@ for chunk in chat_completion:
 
 </details>
 
-
 <details>
 
 <summary>LlamaIndex</summary>
 
 ```python
 from llama_index.llms.openai import OpenAI
 
-llm = OpenAI(api_bese="http://localhost:3000/v1", model="meta-llama/Meta-Llama-3-8B-Instruct", api_key="dummy")
+llm = OpenAI(api_base="http://localhost:3000/v1", model="meta-llama/Llama-3.2-1B-Instruct", api_key="dummy")
 ...
 ```
+
 </details>
 
 ## Chat UI
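The LlamaIndex snippet above elides the actual call with `...`. A minimal sketch of how it might be used follows; this assumes the `llama-index-llms-openai` integration is installed, that your llama-index version accepts a non-OpenAI model name here (some versions require the `OpenAILike` wrapper instead), and the prompt is illustrative:

```python
# Sketch only; the completion call and prompt are illustrative, not from the README.
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    api_base="http://localhost:3000/v1",
    model="meta-llama/Llama-3.2-1B-Instruct",
    api_key="dummy",
)
# complete() sends a single prompt to the local OpenAI-compatible server.
response = llm.complete("Explain superconductors in one sentence.")
print(response.text)
```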
@@ -138,7 +235,7 @@ openllm repo update
 To review a model’s information, run:
 
 ```bash
-openllm model get llama3:8b
+openllm model get llama3.2:1b-instruct-f041
 ```
 
 ### Add a model to the default model repository
@@ -166,7 +263,7 @@ OpenLLM supports LLM cloud deployment via BentoML, the unified model serving fra
 [Sign up for BentoCloud](https://www.bentoml.com/) for free and [log in](https://docs.bentoml.com/en/latest/bentocloud/how-tos/manage-access-token.html). Then, run `openllm deploy` to deploy a model to BentoCloud:
 
 ```bash
-openllm deploy llama3:8b
+openllm deploy llama3.2:1b-instruct-f041
 ```
 
 > [!NOTE]
@@ -196,7 +293,6 @@ This project uses the following open-source projects:
 - [bentoml/bentoml](https://github.com/bentoml/bentoml) for production level model serving
 - [vllm-project/vllm](https://github.com/vllm-project/vllm) for production level LLM backend
 - [blrchen/chatgpt-lite](https://github.com/blrchen/chatgpt-lite) for a fancy Web Chat UI
-- [chujiezheng/chat_templates](https://github.com/chujiezheng/chat_templates)
 - [astral-sh/uv](https://github.com/astral-sh/uv) for blazing fast model requirements installing
 
 We are grateful to the developers and contributors of these projects for their hard work and dedication.
