
Commit 75b970b

[Doc] Update WebLLM doc (#2578)
Update documentation for WebLLM. Currently we only provide a high-level view of the WebLLM runtime here and refer users to the WebLLM repo README for more. The documentation focuses on how users can add their own model variant / model library for WebLLM. Will follow up with more thorough runtime documentation.
1 parent ceba951 commit 75b970b

2 files changed (+104, -82 lines)


docs/deploy/webllm.rst

Lines changed: 92 additions & 82 deletions
@@ -7,70 +7,88 @@ WebLLM Javascript SDK
    :local:
    :depth: 2
 
-`WebLLM <https://www.npmjs.com/package/@mlc-ai/web-llm>`_ is an MLC chat web runtime
-that allows you to build chat applications directly in the browser, leveraging
-`WebGPU <https://www.w3.org/TR/webgpu/>`_ and providing users a natural layer of abstraction.
+`WebLLM <https://www.npmjs.com/package/@mlc-ai/web-llm>`_ is a high-performance in-browser LLM
+inference engine, aiming to be the backend of AI-powered web applications and agents.
 
-Try out the Prebuilt Webpage
-----------------------------
+It provides a specialized runtime for the web backend of MLCEngine, leverages
+`WebGPU <https://www.w3.org/TR/webgpu/>`_ for local acceleration, offers an OpenAI-compatible API,
+and provides built-in support for web workers to separate heavy computation from the UI flow.
+
+Please check out the `WebLLM repo <https://github.com/mlc-ai/web-llm>`__ on how to use WebLLM to build
+web applications in JavaScript/TypeScript. Here we only provide a high-level idea and discuss how to
+use MLC-LLM to compile your own model to run with WebLLM.
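As a side note on the web-worker support mentioned in the new text above, a minimal sketch of moving inference off the UI thread could look as follows (this assumes the ``CreateWebWorkerMLCEngine`` and ``WebWorkerMLCEngineHandler`` helpers described in the WebLLM README; the file layout is hypothetical):

.. code:: typescript

   // worker.ts -- runs the engine inside a Web Worker (hypothetical file name)
   import * as webllm from "@mlc-ai/web-llm";

   const handler = new webllm.WebWorkerMLCEngineHandler();
   self.onmessage = (msg: MessageEvent) => handler.onmessage(msg);

   // main.ts -- same engine interface, but heavy computation stays off the UI thread
   const engine = await webllm.CreateWebWorkerMLCEngine(
     new Worker(new URL("./worker.ts", import.meta.url), { type: "module" }),
     "Llama-3-8B-Instruct-q4f32_1-MLC",
   );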
 
-To get started, you can try out `WebLLM prebuilt webpage <https://webllm.mlc.ai/#chat-demo>`__.
+Getting Started
+---------------
 
-A WebGPU-compatible browser and a local GPU are needed to run WebLLM.
+To get started, try out `WebLLM Chat <https://chat.webllm.ai/>`__, which provides a great example
+of integrating WebLLM into a full web application.
+
+A WebGPU-compatible browser is needed to run WebLLM-powered web applications.
 You can download the latest Google Chrome and use `WebGPU Report <https://webgpureport.org/>`__
 to verify the functionality of WebGPU on your browser.
 
+WebLLM is available as an `npm package <https://www.npmjs.com/package/@mlc-ai/web-llm>`_ and is
+also CDN-delivered. Try a simple chatbot example in
+`this JSFiddle example <https://jsfiddle.net/neetnestor/4nmgvsa2/>`__ without setup.
+
+You can also check out the `existing examples <https://github.com/mlc-ai/web-llm/tree/main/examples>`__
+for more advanced usage of WebLLM such as JSON mode, streaming, and more.
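To make the streaming and OpenAI-compatibility points above concrete, here is a minimal sketch (assuming the OpenAI-style ``engine.chat.completions.create`` API shown in the WebLLM README; the prompt text is made up):

.. code:: typescript

   import * as webllm from "@mlc-ai/web-llm";

   const engine = await webllm.CreateMLCEngine("Llama-3-8B-Instruct-q4f32_1-MLC");

   // stream: true yields chunks incrementally, mirroring the OpenAI API shape
   const chunks = await engine.chat.completions.create({
     messages: [{ role: "user", content: "Explain WebGPU in one sentence." }],
     stream: true,
   });

   let reply = "";
   for await (const chunk of chunks) {
     reply += chunk.choices[0]?.delta?.content ?? "";
   }
   console.log(reply);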
 
-Use WebLLM NPM Package
-----------------------
+Model Records in WebLLM
+-----------------------
 
-WebLLM is available as an `npm package <https://www.npmjs.com/package/@mlc-ai/web-llm>`_.
-The source code is available in `the WebLLM repo <https://github.com/mlc-ai/web-llm>`_,
-where you can make your own modifications and build from source.
+Each of the models in `WebLLM Chat <https://chat.webllm.ai>`__ is registered as an instance of
+``ModelRecord`` and can be accessed at
+`webllm.prebuiltAppConfig.model_list <https://github.com/mlc-ai/web-llm/blob/main/src/config.ts#L293>`__.
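For a quick look at what ships prebuilt, the record list can be inspected directly; a one-line illustrative check (assuming only the ``model_id`` field shown in the records below):

.. code:: typescript

   import * as webllm from "@mlc-ai/web-llm";

   // Print the model_id of every prebuilt ModelRecord bundled with WebLLM
   console.log(webllm.prebuiltAppConfig.model_list.map((m) => m.model_id));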
 
-Note that the `WebLLM prebuilt webpage <https://webllm.mlc.ai/#chat-demo>`__ above
-is powered by the WebLLM npm package, specifically with the code in
-the `simple-chat <https://github.com/mlc-ai/web-llm/tree/main/examples/simple-chat>`__ example.
+Looking at the most straightforward example, `get-started <https://github.com/mlc-ai/web-llm/blob/main/examples/get-started/src/get_started.ts>`__,
+there are two ways to run a model.
 
-Each of the model in the `WebLLM prebuilt webpage <https://webllm.mlc.ai/#chat-demo>`__
-is registered as an instance of ``ModelRecord``. Looking at the most straightforward example
-`get-started <https://github.com/mlc-ai/web-llm/blob/main/examples/get-started/src/get_started.ts>`__,
-we see the code snippet:
+One can either run a prebuilt model by simply passing its ``model_id``:
 
 .. code:: typescript
 
-   const myAppConfig: AppConfig = {
+   const selectedModel = "Llama-3-8B-Instruct-q4f32_1-MLC";
+   const engine = await webllm.CreateMLCEngine(selectedModel);
+
+Or one can specify their own model to run by creating a model record:
+
+.. code:: typescript
+
+   const appConfig: webllm.AppConfig = {
     model_list: [
       {
-        "model_url": "https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f32_1-MLC/resolve/main/",
-        "local_id": "Llama-2-7b-chat-hf-q4f32_1",
-        "model_lib_url": "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/Llama-2-7b-chat-hf/Llama-2-7b-chat-hf-q4f32_1-ctx4k_cs1k-webgpu.wasm",
-      },
-      {
-        "model_url": "https://huggingface.co/mlc-ai/Mistral-7B-Instruct-v0.2-q4f16_1-MLC/resolve/main/",
-        "local_id": "Mistral-7B-Instruct-v0.2-q4f16_1",
-        "model_lib_url": "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/Mistral-7B-Instruct-v0.2/Mistral-7B-Instruct-v0.2-q4f16_1-sw4k_cs1k-webgpu.wasm",
-        "required_features": ["shader-f16"],
+        model: "https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f32_1-MLC",
+        model_id: "Llama-3-8B-Instruct-q4f32_1-MLC",
+        model_lib:
+          webllm.modelLibURLPrefix +
+          webllm.modelVersion +
+          "/Llama-3-8B-Instruct-q4f32_1-ctx4k_cs1k-webgpu.wasm",
       },
       // Add your own models here...
-    ]
-   }
-   const selectedModel = "Llama-2-7b-chat-hf-q4f32_1"
-   // const selectedModel = "Mistral-7B-Instruct-v0.1-q4f16_1"
-   await chat.reload(selectedModel, undefined, myAppConfig);
+    ],
+   };
+   const selectedModel = "Llama-3-8B-Instruct-q4f32_1-MLC";
+   const engine: webllm.MLCEngineInterface = await webllm.CreateMLCEngine(
+     selectedModel,
+     { appConfig: appConfig },
+   );
 
-Just like any other platforms, to run a model with on WebLLM, you need:
+Looking at the code above, we find that, just like any other platform supported by MLC-LLM, to
+run a model on WebLLM, you need:
 
-1. **Model weights** converted to MLC format (e.g. `Llama-2-7b-hf-q4f32_1-MLC
-   <https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f32_1-MLC/tree/main>`_.): downloaded through ``model_url``
-2. **Model library** that comprises the inference logic (see repo `binary-mlc-llm-libs <https://github.com/mlc-ai/binary-mlc-llm-libs>`__): downloaded through ``model_lib_url``.
+1. **Model weights** converted to MLC format (e.g. `Llama-3-8B-Instruct-q4f32_1-MLC
+   <https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f32_1-MLC/tree/main>`_): downloaded through the URL ``ModelRecord.model``
+2. **Model library** that comprises the inference logic (see repo `binary-mlc-llm-libs <https://github.com/mlc-ai/binary-mlc-llm-libs/tree/main/web-llm-models>`__): downloaded through the URL ``ModelRecord.model_lib``.
+
+In the sections below, we walk you through two examples of how to add your own model besides the ones in
+`webllm.prebuiltAppConfig.model_list <https://github.com/mlc-ai/web-llm/blob/main/src/config.ts#L293>`__.
+Before proceeding, please verify installation of ``mlc_llm`` and ``tvm``.
 
 Verify Installation for Adding Models
 -------------------------------------
 
-In sections below, we walk you through two examples of adding models to WebLLM. Before proceeding,
-please verify installation of ``mlc_llm`` and ``tvm``:
-
 **Step 1. Verify mlc_llm**
 
 We use the python package ``mlc_llm`` to compile models. This can be installed by
@@ -106,7 +124,7 @@ In cases where the model you are adding is simply a variant of an existing
 model, we only need to convert weights and reuse existing model library. For instance:
 
 - Adding ``OpenMistral`` when MLC supports ``Mistral``
-- Adding ``Llama2-uncensored`` when MLC supports ``Llama2``
+- Adding a ``Llama3`` fine-tuned on a domain-specific task when MLC supports ``Llama3``
 
 
 In this section, we walk you through adding ``WizardMath-7B-V1.1-q4f16_1`` to the
@@ -150,23 +168,9 @@ See :ref:`compile-command-specification` for specification of ``gen_config``.
       --quantization q4f16_1 --conv-template wizard_coder_or_math \
       -o dist/WizardMath-7B-V1.1-q4f16_1-MLC/
 
-For the ``conv-template``, `conversation_template.py <https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_llm/conversation_template.py>`__
-contains a full list of conversation templates that MLC provides.
-
-If the model you are adding requires a new conversation template, you would need to add your own.
-Follow `this PR <https://github.com/mlc-ai/mlc-llm/pull/2163>`__ as an example. Besides, you also need to add the new template to ``/path/to/web-llm/src/conversation.ts``.
-We look up the template to use with the ``conv_template`` field in ``mlc-chat-config.json``.
-
-For more details, please see :ref:`configure-mlc-chat-json`.
-
-.. note::
-
-   If you added your conversation template in ``src/conversation.ts``, you need to build WebLLM
-   from source following the instruction in
-   `the WebLLM repo's README <https://github.com/mlc-ai/web-llm?tab=readme-ov-file#build-webllm-package-from-source>`_.
-
-   Alternatively, you could use the ``"custom"`` conversation template so that you can pass in
-   your own ``ConvTemplateConfig`` in runtime without having to build the package from source.
+For the ``conv-template``, `conversation_template.py <https://github.com/mlc-ai/mlc-llm/tree/main/python/mlc_llm/conversation_template>`__
+contains a full list of conversation templates that MLC provides. You can also manually modify the ``mlc-chat-config.json`` to
+add your customized conversation template.
 
 **Step 3 Upload weights to HF**
 
@@ -192,26 +196,30 @@ Finally, we modify the code snippet for
 `get-started <https://github.com/mlc-ai/web-llm/blob/main/examples/get-started/src/get_started.ts>`__
 pasted above.
 
-We simply specify the Huggingface link as ``model_url``, while reusing the ``model_lib_url`` for
-``Mistral-7B``. Note that we need the suffix to be ``/resolve/main/``.
+We simply specify the Huggingface link as ``model``, while reusing the ``model_lib`` for
+``Mistral-7B``.
 
 .. code:: typescript
 
-   const myAppConfig: AppConfig = {
+   const appConfig: webllm.AppConfig = {
     model_list: [
-      // Other records here omitted...
       {
-        // Substitute model_url with the one you created `my-huggingface-account/my-wizardMath-weight-huggingface-repo`
-        "model_url": "https://huggingface.co/mlc-ai/WizardMath-7B-V1.1-q4f16_1-MLC/resolve/main/",
-        "local_id": "WizardMath-7B-V1.1-q4f16_1",
-        "model_lib_url": "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/Mistral-7B-Instruct-v0.2/Mistral-7B-Instruct-v0.2-q4f16_1-sw4k_cs1k-webgpu.wasm",
-        "required_features": ["shader-f16"],
+        model: "https://huggingface.co/mlc-ai/WizardMath-7B-V1.1-q4f16_1-MLC",
+        model_id: "WizardMath-7B-V1.1-q4f16_1-MLC",
+        model_lib:
+          webllm.modelLibURLPrefix +
+          webllm.modelVersion +
+          "/Mistral-7B-Instruct-v0.3-q4f16_1-ctx4k_cs1k-webgpu.wasm",
       },
-    ]
-   }
+      // Add your own models here...
+    ],
+   };
 
    const selectedModel = "WizardMath-7B-V1.1-q4f16_1"
-   await chat.reload(selectedModel, undefined, myAppConfig);
+   const engine: webllm.MLCEngineInterface = await webllm.CreateMLCEngine(
+     selectedModel,
+     { appConfig: appConfig },
+   );
 
 Now, running the ``get-started`` example will use the ``WizardMath`` model you just added.
 See `get-started's README <https://github.com/mlc-ai/web-llm/tree/main/examples/get-started#webllm-get-started-app>`__
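Note that the ``selectedModel`` string has to match the record's ``model_id`` for the lookup to succeed (``WizardMath-7B-V1.1-q4f16_1-MLC`` in the record above). Once the engine is created, the added model is queried like any prebuilt one; a small sketch (the prompt is made up, and the OpenAI-style call is the one used throughout the WebLLM examples):

.. code:: typescript

   // Ask the newly registered WizardMath model a question
   const reply = await engine.chat.completions.create({
     messages: [{ role: "user", content: "What is 7 * 8 + 5?" }],
   });
   // The response mirrors OpenAI's schema
   console.log(reply.choices[0].message.content);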
@@ -223,9 +231,9 @@ Bring Your Own Model Library
 
 A model library is specified by:
 
-- The model architecture (e.g. ``llama-2``, ``gpt-neox``)
+- The model architecture (e.g. ``llama-3``, ``gpt-neox``, ``phi-3``)
 - Quantization (e.g. ``q4f16_1``, ``q0f32``)
-- Metadata (e.g. ``context_window_size``, ``sliding_window_size``, ``prefill-chunk-size``), which affects memory planning
+- Metadata (e.g. ``context_window_size``, ``sliding_window_size``, ``prefill-chunk-size``), which affects memory planning (currently only ``prefill-chunk-size`` affects the compiled model)
 - Platform (e.g. ``cuda``, ``webgpu``, ``iOS``)
 
 In cases where the model you want to run is not compatible with the provided MLC
@@ -288,9 +296,8 @@ All these knobs are specified in ``mlc-chat-config.json`` generated by ``gen_con
       --device webgpu -o dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-webgpu.wasm
 
 .. note::
-   When compiling larger models like ``Llama-2-7B``, you may want to add ``--prefill_chunk_size 1024`` or
-   lower ``context_window_size`` to decrease memory usage. Otherwise, during runtime,
-   you may run into issues like:
+   When compiling larger models like ``Llama-3-8B``, you may want to add ``--prefill_chunk_size 1024``
+   to decrease memory usage. Otherwise, during runtime, you may run into issues like:
 
    .. code:: text
 
@@ -344,17 +351,20 @@ Finally, we are able to run the model we added in WebLLM's `get-started <https:/
     model_list: [
       // Other records here omitted...
       {
-        "model_url": "https://huggingface.co/my-hf-account/my-redpajama3b-weight-huggingface-repo/resolve/main/",
-        "local_id": "RedPajama-INCITE-Instruct-3B-v1",
-        "model_lib_url": "https://raw.githubusercontent.com/my-gh-account/my-repo/main/RedPajama-INCITE-Chat-3B-v1-q4f16_1-webgpu.wasm",
+        "model": "https://huggingface.co/my-hf-account/my-redpajama3b-weight-huggingface-repo/resolve/main/",
+        "model_id": "RedPajama-INCITE-Instruct-3B-v1",
+        "model_lib": "https://raw.githubusercontent.com/my-gh-account/my-repo/main/RedPajama-INCITE-Chat-3B-v1-q4f16_1-webgpu.wasm",
         "required_features": ["shader-f16"],
       },
     ]
   }
 
-   const selectedModel = "RedPajama-INCITE-Instruct-3B-v1"
-   await chat.reload(selectedModel, undefined, myAppConfig);
+   const selectedModel = "RedPajama-INCITE-Instruct-3B-v1";
+   const engine: webllm.MLCEngineInterface = await webllm.CreateMLCEngine(
+     selectedModel,
+     { appConfig: appConfig },
+   );
 
 Now, running the ``get-started`` example will use the ``RedPajama`` model you just added.
 See `get-started's README <https://github.com/mlc-ai/web-llm/tree/main/examples/get-started#webllm-get-started-app>`__
-on how to run it.
+on how to run it.
on how to run it.

docs/install/emcc.rst

Lines changed: 12 additions & 0 deletions
@@ -21,6 +21,18 @@ Validate that emcc is accessible in shell
 
    emcc --version
 
+.. note::
+   We recently found that using the latest ``emcc`` version may run into issues during runtime. Use
+   ``./emsdk install 3.1.56`` instead of ``./emsdk install latest`` for now as a workaround.
+
+   The error may look like
+
+   .. code:: text
+
+      Init error, LinkError: WebAssembly.instantiate(): Import #6 module="wasi_snapshot_preview1"
+      function="proc_exit": function import requires a callable
+
+
 Step 2: Set TVM_SOURCE_DIR and MLC_LLM_SOURCE_DIR
 -------------------------------------------------
 