@@ -7,70 +7,88 @@ WebLLM Javascript SDK
   :local:
   :depth: 2

- `WebLLM <https://www.npmjs.com/package/@mlc-ai/web-llm>`_ is an MLC chat web runtime
- that allows you to build chat applications directly in the browser, leveraging
- `WebGPU <https://www.w3.org/TR/webgpu/>`_ and providing users a natural layer of abstraction.
+ `WebLLM <https://www.npmjs.com/package/@mlc-ai/web-llm>`_ is a high-performance in-browser LLM
+ inference engine, aiming to be the backend of AI-powered web applications and agents.

- Try out the Prebuilt Webpage
- ----------------------------
+ It provides a specialized runtime for the web backend of MLCEngine, leverages
+ `WebGPU <https://www.w3.org/TR/webgpu/>`_ for local acceleration, offers an OpenAI-compatible API,
+ and provides built-in support for web workers to separate heavy computation from the UI flow.
+
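+ As a rough sketch of the web worker support, the snippet below follows the worker example in the
+ WebLLM repo; the names ``WebWorkerMLCEngineHandler`` and ``CreateWebWorkerMLCEngine`` are taken from
+ that example and may differ across releases, so treat this as an illustration rather than a drop-in recipe.
+
+ .. code:: typescript
+
+    // worker.ts: host the engine off the UI thread.
+    import { WebWorkerMLCEngineHandler } from "@mlc-ai/web-llm";
+    const handler = new WebWorkerMLCEngineHandler();
+    self.onmessage = (msg: MessageEvent) => handler.onmessage(msg);
+
+ .. code:: typescript
+
+    // main.ts: create an engine backed by the worker above; the rest of the API is unchanged.
+    import * as webllm from "@mlc-ai/web-llm";
+    const engine = await webllm.CreateWebWorkerMLCEngine(
+      new Worker(new URL("./worker.ts", import.meta.url), { type: "module" }),
+      "Llama-3-8B-Instruct-q4f32_1-MLC",
+    );
+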
+ Please check out the `WebLLM repo <https://github.com/mlc-ai/web-llm>`__ for how to use WebLLM to build
+ web applications in Javascript/Typescript. Here we only give a high-level overview and discuss how to
+ use MLC-LLM to compile your own model to run with WebLLM.

- To get started, you can try out `WebLLM prebuilt webpage <https://webllm.mlc.ai/#chat-demo>`__.
+ Getting Started
+ ---------------

- A WebGPU-compatible browser and a local GPU are needed to run WebLLM.
+ To get started, try out `WebLLM Chat <https://chat.webllm.ai/>`__, which provides a great example
+ of integrating WebLLM into a full web application.
+
+ A WebGPU-compatible browser is needed to run WebLLM-powered web applications.
You can download the latest Google Chrome and use `WebGPU Report <https://webgpureport.org/>`__
to verify the functionality of WebGPU on your browser.

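+ If you prefer to check programmatically, the sketch below uses the standard WebGPU API (not a WebLLM
+ API); the ``shader-f16`` feature it probes is what ``q4f16_1`` models declare under ``required_features``:
+
+ .. code:: typescript
+
+    // `navigator.gpu` is undefined on browsers without WebGPU support.
+    const adapter = await navigator.gpu?.requestAdapter();
+    if (!adapter) {
+      console.log("WebGPU is not available in this browser.");
+    } else {
+      // f16-quantized models additionally need the "shader-f16" feature.
+      console.log("shader-f16 supported:", adapter.features.has("shader-f16"));
+    }
+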
+ WebLLM is available as an `npm package <https://www.npmjs.com/package/@mlc-ai/web-llm>`_ and is
+ also CDN-delivered. Try a simple chatbot example in
+ `this JSFiddle example <https://jsfiddle.net/neetnestor/4nmgvsa2/>`__ without any setup.
+
+ You can also check out `existing examples <https://github.com/mlc-ai/web-llm/tree/main/examples>`__
+ for more advanced usage of WebLLM such as JSON mode, streaming, and more.

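+ As a minimal illustration of the OpenAI-compatible API (the model name and prompt here are just examples),
+ a complete chat round-trip looks roughly like this:
+
+ .. code:: typescript
+
+    import * as webllm from "@mlc-ai/web-llm";
+
+    // Downloads (or loads from cache) the model weights and library, then loads them on WebGPU.
+    const engine = await webllm.CreateMLCEngine("Llama-3-8B-Instruct-q4f32_1-MLC");
+
+    // Chat through the OpenAI-style endpoint exposed by the engine.
+    const reply = await engine.chat.completions.create({
+      messages: [{ role: "user", content: "What is WebGPU?" }],
+    });
+    console.log(reply.choices[0].message.content);
+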
- Use WebLLM NPM Package
- ----------------------
+ Model Records in WebLLM
+ -----------------------

- WebLLM is available as an `npm package <https://www.npmjs.com/package/@mlc-ai/web-llm>`_.
- The source code is available in `the WebLLM repo <https://github.com/mlc-ai/web-llm>`_,
- where you can make your own modifications and build from source.
+ Each of the models in `WebLLM Chat <https://chat.webllm.ai>`__ is registered as an instance of
+ ``ModelRecord`` and can be accessed at
+ `webllm.prebuiltAppConfig.model_list <https://github.com/mlc-ai/web-llm/blob/main/src/config.ts#L293>`__.

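+ For instance, to see which ``model_id`` values ship with the WebLLM release you installed, you can
+ simply print the registered records (this only relies on the ``prebuiltAppConfig`` export mentioned above):
+
+ .. code:: typescript
+
+    import * as webllm from "@mlc-ai/web-llm";
+
+    // Each entry is a ModelRecord; model_id is what you pass to CreateMLCEngine / reload.
+    for (const record of webllm.prebuiltAppConfig.model_list) {
+      console.log(record.model_id);
+    }
+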
- Note that the `WebLLM prebuilt webpage <https://webllm.mlc.ai/#chat-demo>`__ above
- is powered by the WebLLM npm package, specifically with the code in
- the `simple-chat <https://github.com/mlc-ai/web-llm/tree/main/examples/simple-chat>`__ example.
+ Looking at the most straightforward example `get-started <https://github.com/mlc-ai/web-llm/blob/main/examples/get-started/src/get_started.ts>`__,
+ there are two ways to run a model.

- Each of the model in the `WebLLM prebuilt webpage <https://webllm.mlc.ai/#chat-demo>`__
- is registered as an instance of ``ModelRecord``. Looking at the most straightforward example
- `get-started <https://github.com/mlc-ai/web-llm/blob/main/examples/get-started/src/get_started.ts>`__,
- we see the code snippet:
+ One can either run a prebuilt model by simply passing its ``model_id`` when creating the engine
+ (which calls ``reload()`` under the hood):

.. code:: typescript

- const myAppConfig: AppConfig = {
+   const selectedModel = "Llama-3-8B-Instruct-q4f32_1-MLC";
+   const engine = await webllm.CreateMLCEngine(selectedModel);
+
+ Or one can specify their own model to run by creating a model record:
+
+ .. code:: typescript
+
+   const appConfig: webllm.AppConfig = {
      model_list: [
        {
-         "model_url": "https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f32_1-MLC/resolve/main/",
-         "local_id": "Llama-2-7b-chat-hf-q4f32_1",
-         "model_lib_url": "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/Llama-2-7b-chat-hf/Llama-2-7b-chat-hf-q4f32_1-ctx4k_cs1k-webgpu.wasm",
-       },
-       {
-         "model_url": "https://huggingface.co/mlc-ai/Mistral-7B-Instruct-v0.2-q4f16_1-MLC/resolve/main/",
-         "local_id": "Mistral-7B-Instruct-v0.2-q4f16_1",
-         "model_lib_url": "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/Mistral-7B-Instruct-v0.2/Mistral-7B-Instruct-v0.2-q4f16_1-sw4k_cs1k-webgpu.wasm",
-         "required_features": ["shader-f16"],
+         model: "https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f32_1-MLC",
+         model_id: "Llama-3-8B-Instruct-q4f32_1-MLC",
+         model_lib:
+           webllm.modelLibURLPrefix +
+           webllm.modelVersion +
+           "/Llama-3-8B-Instruct-q4f32_1-ctx4k_cs1k-webgpu.wasm",
        },
      // Add your own models here...
-     ]
-   }
-   const selectedModel = "Llama-2-7b-chat-hf-q4f32_1"
-   // const selectedModel = "Mistral-7B-Instruct-v0.1-q4f16_1"
-   await chat.reload(selectedModel, undefined, myAppConfig);
+     ],
+   };
+   const selectedModel = "Llama-3-8B-Instruct-q4f32_1-MLC";
+   const engine: webllm.MLCEngineInterface = await webllm.CreateMLCEngine(
+     selectedModel,
+     { appConfig: appConfig },
+   );

- Just like any other platforms, to run a model with on WebLLM, you need:
+ Looking at the code above, we find that, just like any other platform supported by MLC-LLM, to
+ run a model on WebLLM, you need:

- 1. **Model weights** converted to MLC format (e.g. `Llama-2-7b-hf-q4f32_1-MLC
- <https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f32_1-MLC/tree/main>`_): downloaded through ``model_url``
- 2. **Model library** that comprises the inference logic (see repo `binary-mlc-llm-libs <https://github.com/mlc-ai/binary-mlc-llm-libs>`__): downloaded through ``model_lib_url``.
+ 1. **Model weights** converted to MLC format (e.g. `Llama-3-8B-Instruct-q4f32_1-MLC
+ <https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f32_1-MLC/tree/main>`_): downloaded through the url ``ModelRecord.model``
+ 2. **Model library** that comprises the inference logic (see repo `binary-mlc-llm-libs <https://github.com/mlc-ai/binary-mlc-llm-libs/tree/main/web-llm-models>`__): downloaded through the url ``ModelRecord.model_lib``.
+
+ In the sections below, we walk you through two examples of how to add your own model besides the ones in
+ `webllm.prebuiltAppConfig.model_list <https://github.com/mlc-ai/web-llm/blob/main/src/config.ts#L293>`__.
+ Before proceeding, please verify installation of ``mlc_llm`` and ``tvm``.

Verify Installation for Adding Models
-------------------------------------

- In sections below, we walk you through two examples of adding models to WebLLM. Before proceeding,
- please verify installation of ``mlc_llm`` and ``tvm``:
-

**Step 1. Verify mlc_llm**

We use the python package ``mlc_llm`` to compile models. This can be installed by
@@ -106,7 +124,7 @@ In cases where the model you are adding is simply a variant of an existing
model, we only need to convert weights and reuse the existing model library. For instance:

- Adding ``OpenMistral`` when MLC supports ``Mistral``
- - Adding ``Llama2-uncensored`` when MLC supports ``Llama2``
+ - Adding a ``Llama3`` model fine-tuned on a domain-specific task when MLC supports ``Llama3``

In this section, we walk you through adding ``WizardMath-7B-V1.1-q4f16_1`` to the
@@ -150,23 +168,9 @@ See :ref:`compile-command-specification` for specification of ``gen_config``.
      --quantization q4f16_1 --conv-template wizard_coder_or_math \
      -o dist/WizardMath-7B-V1.1-q4f16_1-MLC/

- For the ``conv-template``, `conversation_template.py <https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_llm/conversation_template.py>`__
- contains a full list of conversation templates that MLC provides.
-
- If the model you are adding requires a new conversation template, you would need to add your own.
- Follow `this PR <https://github.com/mlc-ai/mlc-llm/pull/2163>`__ as an example. Besides, you also need to add the new template to ``/path/to/web-llm/src/conversation.ts``.
- We look up the template to use with the ``conv_template`` field in ``mlc-chat-config.json``.
-
- For more details, please see :ref:`configure-mlc-chat-json`.
-
- .. note::
-
-   If you added your conversation template in ``src/conversation.ts``, you need to build WebLLM
-   from source following the instruction in
-   `the WebLLM repo's README <https://github.com/mlc-ai/web-llm?tab=readme-ov-file#build-webllm-package-from-source>`_.
-
-   Alternatively, you could use the ``"custom"`` conversation template so that you can pass in
-   your own ``ConvTemplateConfig`` in runtime without having to build the package from source.
+ For the ``conv-template``, `conversation_template <https://github.com/mlc-ai/mlc-llm/tree/main/python/mlc_llm/conversation_template>`__
+ contains a full list of conversation templates that MLC provides. You can also manually modify the ``mlc-chat-config.json``
+ to add your customized conversation template; the template to use is looked up through its ``conv_template`` field.

**Step 3. Upload weights to HF**

@@ -192,26 +196,30 @@ Finally, we modify the code snippet for
`get-started <https://github.com/mlc-ai/web-llm/blob/main/examples/get-started/src/get_started.ts>`__
pasted above.

- We simply specify the Huggingface link as ``model_url``, while reusing the ``model_lib_url`` for
- ``Mistral-7B``. Note that we need the suffix to be ``/resolve/main/``.
+ We simply specify the Huggingface link as ``model``, while reusing the ``model_lib`` for
+ ``Mistral-7B``.

.. code:: typescript

- const myAppConfig: AppConfig = {
+   const appConfig: webllm.AppConfig = {
      model_list: [
-       // Other records here omitted...
        {
-         // Substitute model_url with the one you created `my-huggingface-account/my-wizardMath-weight-huggingface-repo`
-         "model_url": "https://huggingface.co/mlc-ai/WizardMath-7B-V1.1-q4f16_1-MLC/resolve/main/",
-         "local_id": "WizardMath-7B-V1.1-q4f16_1",
-         "model_lib_url": "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/Mistral-7B-Instruct-v0.2/Mistral-7B-Instruct-v0.2-q4f16_1-sw4k_cs1k-webgpu.wasm",
-         "required_features": ["shader-f16"],
+         // Substitute `model` with the repo you created: `my-huggingface-account/my-wizardMath-weight-huggingface-repo`
+         model: "https://huggingface.co/mlc-ai/WizardMath-7B-V1.1-q4f16_1-MLC",
+         model_id: "WizardMath-7B-V1.1-q4f16_1-MLC",
+         model_lib:
+           webllm.modelLibURLPrefix +
+           webllm.modelVersion +
+           "/Mistral-7B-Instruct-v0.3-q4f16_1-ctx4k_cs1k-webgpu.wasm",
        },
-     ]
-   }
+       // Add your own models here...
+     ],
+   };

    const selectedModel = "WizardMath-7B-V1.1-q4f16_1-MLC";
-   await chat.reload(selectedModel, undefined, myAppConfig);
+   const engine: webllm.MLCEngineInterface = await webllm.CreateMLCEngine(
+     selectedModel,
+     { appConfig: appConfig },
+   );

Now, running the ``get-started`` example will use the ``WizardMath`` model you just added.
See `get-started's README <https://github.com/mlc-ai/web-llm/tree/main/examples/get-started#webllm-get-started-app>`__
@@ -223,9 +231,9 @@ Bring Your Own Model Library

A model library is specified by:

- - The model architecture (e.g. ``llama-2``, ``gpt-neox``)
+ - The model architecture (e.g. ``llama-3``, ``gpt-neox``, ``phi-3``)
- Quantization (e.g. ``q4f16_1``, ``q0f32``)
- - Metadata (e.g. ``context_window_size``, ``sliding_window_size``, ``prefill-chunk-size``), which affects memory planning
+ - Metadata (e.g. ``context_window_size``, ``sliding_window_size``, ``prefill-chunk-size``), which affects memory planning (currently only ``prefill-chunk-size`` affects the compiled model)
- Platform (e.g. ``cuda``, ``webgpu``, ``iOS``)

In cases where the model you want to run is not compatible with the provided MLC
@@ -288,9 +296,8 @@ All these knobs are specified in ``mlc-chat-config.json`` generated by ``gen_con
      --device webgpu -o dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-webgpu.wasm

.. note::
-   When compiling larger models like ``Llama-2-7B``, you may want to add ``--prefill_chunk_size 1024`` or
-   lower ``context_window_size`` to decrease memory usage. Otherwise, during runtime,
-   you may run into issues like:
+   When compiling larger models like ``Llama-3-8B``, you may want to add ``--prefill_chunk_size 1024``
+   to decrease memory usage. Otherwise, during runtime, you may run into issues like:

.. code:: text

@@ -344,17 +351,20 @@ Finally, we are able to run the model we added in WebLLM's `get-started <https:/
      model_list: [
        // Other records here omitted...
        {
-         "model_url": "https://huggingface.co/my-hf-account/my-redpajama3b-weight-huggingface-repo/resolve/main/",
-         "local_id": "RedPajama-INCITE-Instruct-3B-v1",
-         "model_lib_url": "https://raw.githubusercontent.com/my-gh-account/my-repo/main/RedPajama-INCITE-Chat-3B-v1-q4f16_1-webgpu.wasm",
+         "model": "https://huggingface.co/my-hf-account/my-redpajama3b-weight-huggingface-repo/resolve/main/",
+         "model_id": "RedPajama-INCITE-Instruct-3B-v1",
+         "model_lib": "https://raw.githubusercontent.com/my-gh-account/my-repo/main/RedPajama-INCITE-Chat-3B-v1-q4f16_1-webgpu.wasm",
          "required_features": ["shader-f16"],
        },
      ]
    }

-   const selectedModel = "RedPajama-INCITE-Instruct-3B-v1"
-   await chat.reload(selectedModel, undefined, myAppConfig);
+   const selectedModel = "RedPajama-INCITE-Instruct-3B-v1";
+   const engine: webllm.MLCEngineInterface = await webllm.CreateMLCEngine(
+     selectedModel,
+     { appConfig: appConfig },
+   );

Now, running the ``get-started`` example will use the ``RedPajama`` model you just added.
See `get-started's README <https://github.com/mlc-ai/web-llm/tree/main/examples/get-started#webllm-get-started-app>`__
- on how to run it.
+ on how to run it.