
Add Qwen3 0.6B, 1.7B, and 4B #10539


Merged
merged 9 commits into from Apr 29, 2025

Conversation

@jackzhxng (Contributor) commented Apr 29, 2025

Add ExecuTorch support for Qwen3 0.6B, 1.7B, and 4B

Qwen3 0.6B

Export with xnnpack + 8da4w quantization

python -m examples.models.llama.export_llama --model qwen3-0_6b --params examples/models/qwen3/0_6b_config.json -kv --use_sdpa_with_kv_cache -X --xnnpack-extended-ops -d fp32 --output_name="qwen3-0_6b_x_8da4w.pte" --verbose -qmode 8da4w

Run with pybindings

python -m examples.models.llama.runner.native --model qwen3-0_6b --pte qwen3-0_6b_x_8da4w.pte  --tokenizer ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer.json --tokenizer_config ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer_config.json --prompt "Who is the president of the US?" --params examples/models/qwen3/0_6b_config.json --max_len 128 -kv --temperature 0.6

>> Okay, let's see. The user is asking about the president of the US, but they wrote "And who is the president of the US?" and "And who is the president of the US?" So maybe they are using the same question but in a different way. They might be referring to the same president. Let me check. ...

# Some rough stats
Prefill time: 0.24 s
Generation speed: 17.15 tok/s
Memory: 826.68 MB
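
For readers unfamiliar with the `-qmode 8da4w` flag used in the export command above: it selects 8-bit dynamic activation quantization with 4-bit groupwise weight quantization. A rough NumPy illustration of the 4-bit groupwise weight side (the group size, symmetric scheme, and function names here are simplified assumptions, not ExecuTorch's actual kernels):

```python
import numpy as np

def quantize_4bit_groupwise(w, group_size=32):
    # Symmetric 4-bit quantization: one scale per group of `group_size` weights.
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # int4 range is [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original weights.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64)).astype(np.float32)
q, s = quantize_4bit_groupwise(w.flatten())
w_hat = dequantize(q, s).reshape(w.shape)
```

Activations, by contrast, are quantized dynamically to 8 bits at runtime, which is why only the weight side is precomputed like this.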

Qwen3 1.7B

Export with xnnpack + 8da4w quantization

python -m examples.models.llama.export_llama --model qwen3-1_7b --params examples/models/qwen3/1_7b_config.json -kv --use_sdpa_with_kv_cache -X --xnnpack-extended-ops -d fp32 --output_name="qwen3-1_7b_x_8da4w.pte" --verbose -qmode 8da4w

Run with pybindings

python -m examples.models.llama.runner.native --model qwen3-1_7b --pte qwen3-1_7b_x_8da4w.pte  --tokenizer ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer.json --tokenizer_config ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer_config.json --prompt "Who is the president of the US?" --params examples/models/qwen3/1_7b_config.json --max_len 128 -kv --temperature 0.6

# Some rough stats
Prefill time: 0.25 s
Generation speed: 16.87 tok/s
Memory: 1.02 GB

Qwen3 4B

Export with xnnpack + 8da4w quantization

python -m examples.models.llama.export_llama --model qwen3-4b --params examples/models/qwen3/4b_config.json -kv --use_sdpa_with_kv_cache -X --xnnpack-extended-ops -d fp32 --output_name="qwen3-4b_x_8da4w.pte" --verbose -qmode 8da4w

Run with pybindings

python -m examples.models.llama.runner.native --model qwen3-4b --pte qwen3-4b_x_8da4w.pte  --tokenizer ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer.json --tokenizer_config ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer_config.json --prompt "Who is the president of the US?" --params examples/models/qwen3/4b_config.json --max_len 128 -kv --temperature 0.6

# Some rough stats
Prefill time: 0.44 s
Generation speed: 12.12 tok/s
Memory: 2.5 GB

bypass-github-export-checks

@jackzhxng jackzhxng requested a review from lucylq as a code owner April 29, 2025 03:16

pytorch-bot bot commented Apr 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10539

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 21 Pending

As of commit 626a0f0 with merge base 32dffbc:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 29, 2025
@jackzhxng jackzhxng added the release notes: examples Changes to any of our example LLMs integrations, such as Llama3 and Llava label Apr 29, 2025
@jackzhxng jackzhxng changed the title Add Qwen3 0.6B Add Qwen3 0.6B, 1.7B, and 4B Apr 29, 2025

cbilgin commented Apr 29, 2025

Is there a copy-paste issue, or is the response to the prompt somehow wrong? It looks like the prompt is about vibe coding, but the response is about the US president?


@tarun292 tarun292 left a comment

Please check with @madhu-fb about the qk_norm changes before landing.

@tarun292

> Is there a copy-paste issue, or is the response to the prompt somehow wrong? It looks like the prompt is about vibe coding, but the response is about the US president?

Yeah, same question. It seems like you might not have copied over the full response?

@tarun292

Please add a README.md for the export flow. Also, I don't see the config JSONs for 1.7B and 4B.

@jackzhxng jackzhxng force-pushed the jz/add-qwen3 branch 2 times, most recently from 8412c37 to 8aac481 Compare April 29, 2025 06:41
@mergennachin

Nice job

Add this to the top-level README - https://github.com/pytorch/executorch/blob/main/README.md?plain=1#L54

The perf benchmarks on desktop are not really representative. As next steps, it would be good to add instructions for running on iOS and Android phones and to show those benchmarks. We can publicize the actual screencast on mobile phones more.

@facebook-github-bot

@jackzhxng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


@jackzhxng jackzhxng merged commit 7b86bf1 into main Apr 29, 2025
91 of 92 checks passed
@jackzhxng jackzhxng deleted the jz/add-qwen3 branch April 29, 2025 21:27
@@ -243,14 +244,18 @@ def forward(
k = k.view(bsz, seqlen, self.n_local_kv_heads, self.head_dim)
v = v.view(bsz, seqlen, self.n_local_kv_heads, self.head_dim)

if self.use_qk_norm and self.qk_norm_before_rope:
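
For context on the branch above: when both flags are set, an RMS normalization is applied to the query and key heads before the rotary embeddings. A minimal NumPy sketch of per-head RMS normalization over `head_dim` (the shapes, `eps` value, and function name are illustrative, not taken from the PR):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Normalize over the last axis (head_dim), as a per-head qk norm does.
    var = np.mean(x * x, axis=-1, keepdims=True)
    return x / np.sqrt(var + eps) * weight

# Hypothetical shapes: (bsz, seqlen, n_heads, head_dim)
rng = np.random.default_rng(0)
q = rng.standard_normal((1, 4, 2, 8)).astype(np.float32)
q_normed = rms_norm(q, np.ones(8, dtype=np.float32))
print(q_normed.shape)  # shape is preserved; only the per-head scale changes
```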

It might've been cleaner to make this a qk_norm_mode that can be one of NONE, BEFORE_ROPE, or AFTER_ROPE. I don't know if we have backward-compatibility constraints preventing that.


Yeah, I agree, this would be cleaner. @jackzhxng let's follow up with this change.
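
A sketch of the refactor suggested above (the names `QkNormMode` and `resolve_qk_norm_mode` are hypothetical, not from the PR), mapping the two existing boolean flags onto a single mode so legacy configs keep working:

```python
from enum import Enum

class QkNormMode(Enum):
    NONE = "none"
    BEFORE_ROPE = "before_rope"
    AFTER_ROPE = "after_rope"

def resolve_qk_norm_mode(use_qk_norm: bool, qk_norm_before_rope: bool) -> QkNormMode:
    # Map the legacy boolean pair onto a single mode, preserving behavior.
    if not use_qk_norm:
        return QkNormMode.NONE
    return QkNormMode.BEFORE_ROPE if qk_norm_before_rope else QkNormMode.AFTER_ROPE
```

The forward pass would then branch on one value instead of two flags, and old checkpoint configs could be translated at load time.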

6 participants