Commit e3cca88

alabulei1 authored and juntao committed
Update llm_inference.md

Signed-off-by: alabulei1 <vivian.xiage@gmail.com>

1 parent 43877ca · commit e3cca88

File tree: 1 file changed (+5 -16 lines)

docs/develop/rust/wasinn/llm_inference.md

Lines changed: 5 additions & 16 deletions
````diff
@@ -4,7 +4,7 @@ sidebar_position: 1
 
 # Llama 2 inference
 
-WasmEdge now supports running llama2 series of models in Rust. We will use [this example project](https://github.com/second-state/LlamaEdge/tree/main/chat) to show how to make AI inferences with the llama2 model in WasmEdge and Rust.
+WasmEdge now supports running open source models in Rust. We will use [this example project](https://github.com/second-state/LlamaEdge/tree/main/chat) to show how to make AI inferences with the llama2 model in WasmEdge and Rust.
 
 WasmEdge now supports the following models:
 
````
````diff
@@ -39,7 +39,7 @@ WasmEdge now supports the following models:
 1. Nous-Hermes-2-Mixtral-8x7B-DPO
 1. Nous-Hermes-2-Mixtral-8x7B-SFT
 
-And more, please check [the supported models](https://github.com/second-state/LlamaEdge/blob/main/models.md) for detials.
+And more, please check [the supported models](https://github.com/second-state/LlamaEdge/blob/main/models.md) for details.
 
 ## Prerequisite
 
````
````diff
@@ -145,17 +145,6 @@ You can configure the chat inference application through CLI options.
 
 The `--prompt-template` option is perhaps the most interesting. It allows the application to support different open source LLM models beyond llama2. Check out more prompt templates [here](https://github.com/LlamaEdge/LlamaEdge/tree/main/api-server/chat-prompts).
 
-| Template name | Model | Download |
-| ------------ | ------------------------------ | --- |
-| llama-2-chat | [The standard llama2 chat model](https://ai.meta.com/llama/) | [7b](https://huggingface.co/wasmedge/llama2/resolve/main/llama-2-7b-chat-q5_k_m.gguf) |
-| codellama-instruct | [CodeLlama](https://about.fb.com/news/2023/08/code-llama-ai-for-coding/) | [7b](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q5_K_M.gguf) |
-| mistral-instruct-v0.1 | [Mistral](https://mistral.ai/) | [7b](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q5_K_M.gguf) |
-| mistrallite | [Mistral Lite](https://huggingface.co/amazon/MistralLite) | [7b](https://huggingface.co/TheBloke/MistralLite-7B-GGUF/resolve/main/mistrallite.Q5_K_M.gguf) |
-| openchat | [OpenChat](https://github.com/imoneoi/openchat) | [7b](https://huggingface.co/TheBloke/openchat_3.5-GGUF/resolve/main/openchat_3.5.Q5_K_M.gguf) |
-| belle-llama-2-chat | [BELLE](https://github.com/LianjiaTech/BELLE) | [13b](https://huggingface.co/second-state/BELLE-Llama2-13B-Chat-0.4M-GGUF/resolve/main/BELLE-Llama2-13B-Chat-0.4M-ggml-model-q4_0.gguf) |
-| vicuna-chat | [Vicuna](https://lmsys.org/blog/2023-03-30-vicuna/) | [7b](https://huggingface.co/TheBloke/vicuna-7B-v1.5-GGUF/resolve/main/vicuna-7b-v1.5.Q5_K_M.gguf) |
-| chatml | [ChatML](https://huggingface.co/chargoddard/rpguild-chatml-13b) | [13b](https://huggingface.co/TheBloke/rpguild-chatml-13B-GGUF/resolve/main/rpguild-chatml-13b.Q5_K_M.gguf) |
-
 Furthermore, the following command tells WasmEdge to print out logs and statistics of the model at runtime.
 
 ```bash
````
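For readers skimming the diff: the `--prompt-template` flag discussed in the hunk above is passed when launching the chat application. Below is a minimal sketch of such an invocation, assuming the `llama-chat.wasm` binary from the LlamaEdge releases and the `llama-2-chat` template name listed in the removed table; neither is part of this commit.

```bash
# Minimal sketch: start the chat app with an explicit prompt template.
# `llama-chat.wasm` and the model file name are assumptions based on the
# LlamaEdge chat example; substitute the artifacts you actually downloaded.
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf \
  llama-chat.wasm --prompt-template llama-2-chat
```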
````diff
@@ -197,7 +186,7 @@ curl -LO https://github.com/second-state/llamaedge/releases/latest/download/llam
 
 # Give it a prompt and ask it to use the model to complete it.
 wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-simple.wasm \
-  --prompt 'Robert Oppenheimer most important achievement is ' --ctx-size 4096
+  --prompt 'Robert Oppenheimer most important achievement is ' --ctx-size 512
 
 output: in 1942, when he led the team that developed the first atomic bomb, which was dropped on Hiroshima, Japan in 1945.
 ```
````
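A note on the command in this hunk: the `--nn-preload` argument appears to follow the WasmEdge WASI-NN plugin's `alias:backend:target:path` convention. The annotated sketch below spells that reading out; treat the field semantics as an assumption rather than part of the commit.

```bash
# Assumed pattern: --nn-preload <alias>:<backend>:<target>:<model-path>
#   default - the name the wasm app uses to look up the model
#   GGML    - the llama.cpp-based backend for GGUF model files
#   AUTO    - let WasmEdge pick the execution target (CPU or GPU)
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf \
  llama-simple.wasm --prompt 'Robert Oppenheimer most important achievement is ' --ctx-size 512
```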
````diff
@@ -286,7 +275,7 @@ Next, execute the model inference.
 context.compute().expect("Failed to complete inference");
 ```
 
-After the inference is finished, extract the result from the computation context and losing invalid UTF8 sequences handled by converting the output to a string using `String::from_utf8_lossy`.
+After the inference is finished, extract the result from the computation context and convert the output to a string using `String::from_utf8_lossy`, which replaces any invalid UTF-8 sequences.
 
 ```rust
 let mut output_buffer = vec![0u8; *CTX_SIZE.get().unwrap()];
````
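The hunk above ends just as the Rust example allocates its output buffer. For context, here is a minimal sketch of the extraction step that the reworded sentence describes, assuming the `wasmedge-wasi-nn` crate's `get_output` method (which writes the output tensor into the buffer and returns the byte count) and the tutorial's `context` and `CTX_SIZE` values:

```rust
// Minimal sketch (assumes `context` and `CTX_SIZE` from the tutorial and the
// wasmedge-wasi-nn `get_output` API). Output tensor 0 holds the generated text.
let mut output_buffer = vec![0u8; *CTX_SIZE.get().unwrap()];
let output_size = context
    .get_output(0, &mut output_buffer)
    .expect("Failed to get output tensor");
// Convert only the bytes actually written; `from_utf8_lossy` replaces any
// invalid UTF-8 sequences with U+FFFD instead of failing.
let output = String::from_utf8_lossy(&output_buffer[..output_size]);
println!("\noutput: {}", output);
```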
````diff
@@ -307,5 +296,5 @@ println!("\noutput: {}", output);
 ## Resources
 
 * If you're looking for multi-turn conversations with llama 2 models, please check out the above mentioned chat example source code [here](https://github.com/second-state/llamaedge/tree/main/chat).
-* If you want to construct OpenAI-compatible APIs specifically for your llama2 model, or the Llama2 model itself, please check out the source code [for the API server](https://github.com/second-state/llamaedge/tree/main/api-server).
+* If you want to construct OpenAI-compatible APIs specifically for any open-source LLMs, please check out the source code [for the API server](https://github.com/second-state/llamaedge/tree/main/api-server).
 * To learn more, please check out [this article](https://medium.com/stackademic/fast-and-portable-llama2-inference-on-the-heterogeneous-edge-a62508e82359).
````
