## Summary
Qwen 3 is the latest iteration of the Qwen series of large language models (LLMs) developed by Alibaba. The edge-sized Qwen 3 variants (0.6B, 1.7B, and 4B) are currently supported.

## Instructions

Qwen 3 uses the same example code as our optimized Llama model; only the checkpoint, model params, and tokenizer differ. Please see the [Llama README page](../llama/README.md) for details.

All commands for exporting and running Llama on various backends also apply to Qwen 3 by swapping in the following args:
```
--model [qwen3-0_6b,qwen3-1_7b,qwen3-4b]
--params [examples/models/qwen3/0_6b_config.json,examples/models/qwen3/1_7b_config.json,examples/models/qwen3/4b_config.json]
```

### Example export
Here are basic examples for exporting Qwen 3; please refer to the [Llama README page](../llama/README.md) for more advanced usage.

Export 0.6B to XNNPACK, quantized with 8da4w:
```
# -qmode 8da4w and -G 32 are assumed here from the Llama export CLI's documented
# quantization flags (8-bit dynamic activations, 4-bit weights, group size 32)
python -m examples.models.llama.export_llama \
--model qwen3-0_6b \
--params examples/models/qwen3/0_6b_config.json \
-kv \
--use_sdpa_with_kv_cache \
-d fp32 \
-X \
--xnnpack-extended-ops \
-qmode 8da4w \
-G 32 \
--output_name="qwen3-0_6b.pte" \
--verbose
```

Export 1.7B to XNNPACK, quantized with 8da4w:
```
python -m examples.models.llama.export_llama \
--model qwen3-1_7b \
--params examples/models/qwen3/1_7b_config.json \
-kv \
--use_sdpa_with_kv_cache \
-d fp32 \
-X \
--xnnpack-extended-ops \
-qmode 8da4w \
-G 32 \
--output_name="qwen3-1_7b.pte" \
--verbose
```

Export 4B to XNNPACK, quantized with 8da4w:
```
python -m examples.models.llama.export_llama \
--model qwen3-4b \
--params examples/models/qwen3/4b_config.json \
-kv \
--use_sdpa_with_kv_cache \
-d fp32 \
-X \
--xnnpack-extended-ops \
-qmode 8da4w \
-G 32 \
--output_name="qwen3-4b.pte" \
--verbose
```

### Example run
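The tokenizer paths in the run commands below point into the Hugging Face cache. If you don't have the files yet, here is a minimal sketch to fetch just the tokenizer, assuming the `huggingface_hub` Python package is installed (the snapshot hash on your machine may differ from the one shown in the commands):
```
from huggingface_hub import snapshot_download

# Download only the Qwen3-0.6B tokenizer files into the local Hugging Face cache
# and print the snapshot directory that the run commands below reference, e.g.
# ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/<hash>
path = snapshot_download(
    repo_id="Qwen/Qwen3-0.6B",
    allow_patterns=["tokenizer.json", "tokenizer_config.json"],
)
print(path)
```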

With ExecuTorch pybindings:
```
python -m examples.models.llama.runner.native \
--model qwen3-0_6b \
--pte qwen3-0_6b.pte \
--tokenizer ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer.json \
--tokenizer_config ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer_config.json \
--prompt "Who is the president of the US?" \
--params examples/models/qwen3/0_6b_config.json \
--max_len 128 \
-kv \
--temperature 0.6
```

With ExecuTorch's sample C++ runner (see the Llama README's [Step 3: Run on your computer to validate](../llama/README.md#step-3-run-on-your-computer-to-validate) to build the runner):
```
cmake-out/examples/models/llama/llama_main \
--model_path qwen3-0_6b.pte \
--tokenizer_path ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer.json \
--prompt="Who is the president of the US?"
```

To run the model on an example iOS or Android app, see the Llama README's [Step 5: Build Mobile apps](../llama/README.md#step-5-build-mobile-apps) section.
