Commit 6036292 (1 parent: 4452308)

Update llm_inference.md

Signed-off-by: Michael Yuan <michael@michaelyuan.com>

File tree: 1 file changed, +4 −12 lines


docs/develop/rust/wasinn/llm_inference.md

Lines changed: 4 additions & 12 deletions
````diff
@@ -60,18 +60,10 @@ Second, use `cargo` to build the example project.
 cargo build --target wasm32-wasi --release
 ```
 
-The output WASM file is `target/wasm32-wasi/release/llama-chat.wasm`.
-
-We also need to get the model. Here we use the llama-2-13b model.
-
-```bash
-curl -LO https://huggingface.co/wasmedge/llama2/blob/main/llama-2-13b-chat-q5_k_m.gguf
-```
-
-Next, use WasmEdge to load the llama-2-13b model and then ask the model to questions.
+The output WASM file is `target/wasm32-wasi/release/llama-chat.wasm`. Next, use WasmEdge to load the llama-2-7b model and then ask the model questions.
 
 ```bash
-wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-13b-chat-q5_k_m.gguf llama-chat.wasm
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm
 ```
 
 After executing the command, you may need to wait a moment for the input prompt to appear. You can enter your question once you see the `[USER]:` prompt:
````
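The `[USER]:` prompt described in the hunk above is a plain read-eval loop over standard input. The sketch below is a hypothetical illustration of that interaction pattern, not the actual `llama-chat` source; the `read_prompt` helper and the placeholder response are invented for the example.

```rust
use std::io::{self, BufRead, Write};

// Print the prompt marker and read one question from the given reader.
// Returns None on EOF or a read error, which ends the loop.
fn read_prompt(input: &mut impl BufRead) -> Option<String> {
    println!("[USER]:");
    io::stdout().flush().ok();
    let mut line = String::new();
    match input.read_line(&mut line) {
        Ok(0) => None, // EOF: no more input
        Ok(_) => Some(line.trim().to_string()),
        Err(_) => None,
    }
}

fn main() {
    let stdin = io::stdin();
    let mut handle = stdin.lock();
    while let Some(question) = read_prompt(&mut handle) {
        if question.is_empty() {
            continue;
        }
        // In the real program, the question would be passed to the model
        // via the wasi-nn context; here we just echo a placeholder.
        println!("[ASSISTANT]: (model response to \"{}\")", question);
    }
}
```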
````diff
@@ -158,7 +150,7 @@ You can make the inference program run faster by AOT compiling the wasm file fir
 
 ```bash
 wasmedge compile llama-chat.wasm llama-chat.wasm
-wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-13b-q5_k_m.gguf llama-chat.wasm
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm
 ```
 
 ## Understand the code
````
````diff
@@ -260,7 +252,7 @@ Next, execute the model inference.
 context.compute().expect("Failed to complete inference");
 ```
 
-After the inference is fiished, extract the result from the computation context and losing invalid UTF8 sequences handled by converting the output to a string using `String::from_utf8_lossy`.
+After the inference is finished, extract the result from the computation context and convert the output to a string using `String::from_utf8_lossy`, which handles any invalid UTF-8 sequences.
 
 ```rust
 let mut output_buffer = vec![0u8; *CTX_SIZE.get().unwrap()];
````
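The `String::from_utf8_lossy` step in the last hunk can be shown standalone. The buffer contents below are invented for the example; the point is only that lossy conversion never fails, replacing any invalid UTF-8 byte with U+FFFD.

```rust
fn main() {
    // Invented stand-in for a model's raw output buffer: valid UTF-8
    // text followed by one invalid byte (0xFF).
    let mut output_buffer: Vec<u8> = b"Paris is the capital of France.".to_vec();
    output_buffer.push(0xFF);

    // from_utf8_lossy never returns an error; invalid sequences are
    // replaced with U+FFFD, the Unicode replacement character.
    let output = String::from_utf8_lossy(&output_buffer);
    assert!(output.ends_with('\u{FFFD}'));
    println!("{}", output);
}
```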

0 commit comments
