docs/develop/rust/wasinn/llm_inference.md (5 additions, 16 deletions)
@@ -4,7 +4,7 @@ sidebar_position: 1
# Llama 2 inference

-WasmEdge now supports running llama2 series of models in Rust. We will use [this example project](https://github.com/second-state/LlamaEdge/tree/main/chat) to show how to make AI inferences with the llama2 model in WasmEdge and Rust.
+WasmEdge now supports running open source models in Rust. We will use [this example project](https://github.com/second-state/LlamaEdge/tree/main/chat) to show how to make AI inferences with the llama2 model in WasmEdge and Rust.

WasmEdge now supports the following models:
@@ -39,7 +39,7 @@ WasmEdge now supports the following models:
1. Nous-Hermes-2-Mixtral-8x7B-DPO
1. Nous-Hermes-2-Mixtral-8x7B-SFT

-And more, please check [the supported models](https://github.com/second-state/LlamaEdge/blob/main/models.md) for detials.
+And more, please check [the supported models](https://github.com/second-state/LlamaEdge/blob/main/models.md) for details.

## Prerequisite
@@ -145,17 +145,6 @@ You can configure the chat inference application through CLI options.
The `--prompt-template` option is perhaps the most interesting. It allows the application to support different open source LLM models beyond llama2. Check out more prompt templates [here](https://github.com/LlamaEdge/LlamaEdge/tree/main/api-server/chat-prompts).

...

---prompt 'Robert Oppenheimer most important achievement is ' --ctx-size 4096
+--prompt 'Robert Oppenheimer most important achievement is ' --ctx-size 512

output: in 1942, when he led the team that developed the first atomic bomb, which was dropped on Hiroshima, Japan in 1945.
```
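As background for the `--prompt-template` option above: a prompt template simply wraps the raw user prompt in the control tokens a given model family was fine-tuned on. Below is a minimal, hypothetical Rust sketch of what a `llama-2-chat` style template produces; the function name is illustrative only, and the real implementations live in the chat-prompts crate linked above.

```rust
// Hypothetical illustration of the llama-2-chat prompt format; the real
// templates live in LlamaEdge's chat-prompts crate.
fn build_llama2_chat_prompt(system: &str, user: &str) -> String {
    // Llama 2 chat models expect the system prompt inside <<SYS>> tags and
    // each user turn wrapped in [INST] ... [/INST].
    format!("<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]")
}

fn main() {
    let prompt = build_llama2_chat_prompt(
        "You are a helpful assistant.",
        "Robert Oppenheimer's most important achievement is",
    );
    println!("{prompt}");
}
```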
@@ -286,7 +275,7 @@ Next, execute the model inference.
context.compute().expect("Failed to complete inference");
```

-After the inference is finished, extract the result from the computation context and losing invalid UTF8 sequences handled by converting the output to a string using `String::from_utf8_lossy`.
+After the inference is finished, extract the result from the computation context and convert the output to a string using `String::from_utf8_lossy`, which replaces any invalid UTF-8 sequences.

```rust
let mut output_buffer = vec![0u8; *CTX_SIZE.get().unwrap()];
```
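For reference, here is a minimal sketch of how the extraction step could look, assuming the `get_output` method of the wasi-nn execution context returns the number of bytes written; `output_size` is a name introduced here for illustration, and the exact buffer handling in the chat example may differ.

```rust
let mut output_buffer = vec![0u8; *CTX_SIZE.get().unwrap()];
// Read the inference result into the buffer; get_output is assumed to
// return the number of bytes actually written.
let output_size = context
    .get_output(0, &mut output_buffer)
    .expect("Failed to get output");
// Convert the raw bytes to a string. from_utf8_lossy avoids a panic on
// malformed model output by substituting U+FFFD for invalid sequences.
let output = String::from_utf8_lossy(&output_buffer[..output_size]);
println!("{}", output);
```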
* If you're looking for multi-turn conversations with llama 2 models, please check out the above-mentioned chat example source code [here](https://github.com/second-state/llamaedge/tree/main/chat).
-* If you want to construct OpenAI-compatible APIs specifically for your llama2 model, or the Llama2 model itself, please check out the source code [for the API server](https://github.com/second-state/llamaedge/tree/main/api-server).
+* If you want to construct OpenAI-compatible APIs for any open source LLM, please check out the source code [for the API server](https://github.com/second-state/llamaedge/tree/main/api-server).
* To learn more, please check out [this article](https://medium.com/stackademic/fast-and-portable-llama2-inference-on-the-heterogeneous-edge-a62508e82359).