
Add Qwen3 0.6B, 1.7B, and 4B #10539


Merged
merged 9 commits into from Apr 29, 2025

Conversation

@jackzhxng (Contributor) commented Apr 29, 2025

Add ExecuTorch support for Qwen3 0.6B, 1.7B, and 4B

Qwen3 0.6B

Export with xnnpack + 8da4w quantization

python -m examples.models.llama.export_llama --model qwen3-0_6b --params examples/models/qwen3/0_6b_config.json -kv --use_sdpa_with_kv_cache -X --xnnpack-extended-ops -d fp32 --output_name="qwen3-0_6b_x_8da4w.pte" --verbose -qmode 8da4w

Run with pybindings

python -m examples.models.llama.runner.native --model qwen3-0_6b --pte qwen3-0_6b_x_8da4w.pte  --tokenizer ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer.json --tokenizer_config ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer_config.json --prompt "Who is the president of the US?" --params examples/models/qwen3/0_6b_config.json --max_len 128 -kv --temperature 0.6

>> Okay, let's see. The user is asking about the president of the US, but they wrote "And who is the president of the US?" and "And who is the president of the US?" So maybe they are using the same question but in a different way. They might be referring to the same president. Let me check. ...

# Some rough stats
Prefill time: 0.24 s
Generation speed: 17.15 tok/s
Memory: 826.68 MB
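
For readers unfamiliar with the `-qmode 8da4w` flag used in the export command above: it selects 8-bit dynamic activation quantization with 4-bit groupwise weight quantization. A rough NumPy illustration of the 4-bit groupwise weight side (the group size, symmetric scheme, and function names here are simplified assumptions, not ExecuTorch's actual kernels):

```python
import numpy as np

def quantize_4bit_groupwise(w, group_size=32):
    # Symmetric 4-bit quantization: one scale per group of `group_size` weights.
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # int4 range is [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original weights.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64)).astype(np.float32)
q, s = quantize_4bit_groupwise(w.flatten())
w_hat = dequantize(q, s).reshape(w.shape)
```

Activations, by contrast, are quantized dynamically to 8 bits at runtime, which is why only the weight side is precomputed like this.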

Qwen3 1.7B

Export with xnnpack + 8da4w quantization

python -m examples.models.llama.export_llama --model qwen3-1_7b --params examples/models/qwen3/1_7b_config.json -kv --use_sdpa_with_kv_cache -X --xnnpack-extended-ops -d fp32 --output_name="qwen3-1_7b_x_8da4w.pte" --verbose -qmode 8da4w

Run with pybindings

python -m examples.models.llama.runner.native --model qwen3-1_7b --pte qwen3-1_7b_x_8da4w.pte  --tokenizer ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer.json --tokenizer_config ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer_config.json --prompt "Who is the president of the US?" --params examples/models/qwen3/1_7b_config.json --max_len 128 -kv --temperature 0.6

# Some rough stats
Prefill time: 0.25 s
Generation speed: 16.87 tok/s
Memory: 1.02 GB

Qwen3 4B

Export with xnnpack + 8da4w quantization

python -m examples.models.llama.export_llama --model qwen3-4b --params examples/models/qwen3/4b_config.json -kv --use_sdpa_with_kv_cache -X --xnnpack-extended-ops -d fp32 --output_name="qwen3-4b_x_8da4w.pte" --verbose -qmode 8da4w

Run with pybindings

python -m examples.models.llama.runner.native --model qwen3-4b --pte qwen3-4b_x_8da4w.pte  --tokenizer ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer.json --tokenizer_config ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer_config.json --prompt "Who is the president of the US?" --params examples/models/qwen3/4b_config.json --max_len 128 -kv --temperature 0.6

# Some rough stats
Prefill time: 0.44 s
Generation speed: 12.12 tok/s
Memory: 2.5 GB

bypass-github-export-checks

@jackzhxng jackzhxng requested a review from lucylq as a code owner April 29, 2025 03:16

pytorch-bot bot commented Apr 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10539

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 21 Pending

As of commit 626a0f0 with merge base 32dffbc:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 29, 2025
@jackzhxng jackzhxng added the release notes: examples Changes to any of our example LLMs integrations, such as Llama3 and Llava label Apr 29, 2025
@jackzhxng jackzhxng changed the title Add Qwen3 0.6B Add Qwen3 0.6B, 1.7B, and 4B Apr 29, 2025

cbilgin commented Apr 29, 2025

Is there a copy-paste issue, or is the response to the prompt somehow wrong? It looks like the prompt is about vibe coding, but the response is about the US president?


@tarun292 tarun292 left a comment

Please check with @madhu-fb about the qk_norm changes before landing.

@tarun292

> Is there a copy-paste issue, or is the response to the prompt somehow wrong? It looks like the prompt is about vibe coding, but the response is about the US president?

Yeah, same question. It seems like you might not have copied over the full response?

@tarun292

Please add a README.md for the export flow. Also, I don't see the config JSONs for 1.7B and 4B.

@jackzhxng jackzhxng force-pushed the jz/add-qwen3 branch 2 times, most recently from 8412c37 to 8aac481 Compare April 29, 2025 06:41
@mergennachin

Nice job

Add this to the top-level README - https://github.com/pytorch/executorch/blob/main/README.md?plain=1#L54

The perf benchmarks on desktop are not really representative. As next steps, it would be good to add instructions for running on iOS and Android phones and to show those benchmarks. We can publicize the actual screencast on mobile phones more.

@facebook-github-bot

@jackzhxng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


@jackzhxng jackzhxng merged commit 7b86bf1 into main Apr 29, 2025
91 of 92 checks passed
@jackzhxng jackzhxng deleted the jz/add-qwen3 branch April 29, 2025 21:27
@@ -243,14 +244,18 @@ def forward(
k = k.view(bsz, seqlen, self.n_local_kv_heads, self.head_dim)
v = v.view(bsz, seqlen, self.n_local_kv_heads, self.head_dim)

if self.use_qk_norm and self.qk_norm_before_rope:
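
For context on the branch above: when both flags are set, an RMS normalization is applied to the query and key heads before the rotary embeddings. A minimal NumPy sketch of per-head RMS normalization over `head_dim` (the shapes, `eps` value, and function name are illustrative, not taken from the PR):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Normalize over the last axis (head_dim), as a per-head qk norm does.
    var = np.mean(x * x, axis=-1, keepdims=True)
    return x / np.sqrt(var + eps) * weight

# Hypothetical shapes: (bsz, seqlen, n_heads, head_dim)
rng = np.random.default_rng(0)
q = rng.standard_normal((1, 4, 2, 8)).astype(np.float32)
q_normed = rms_norm(q, np.ones(8, dtype=np.float32))
print(q_normed.shape)  # shape is preserved; only the per-head scale changes
```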

It might've been cleaner to make this a qk_norm_mode that can be one of NONE, BEFORE_ROPE, or AFTER_ROPE. I don't know if we have backward-compatibility constraints preventing that.


Yeah, I agree, this would be cleaner. @jackzhxng let's follow up with this change.
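
A sketch of the refactor suggested above (the names `QkNormMode` and `resolve_qk_norm_mode` are hypothetical, not from the PR), mapping the two existing boolean flags onto a single mode so legacy configs keep working:

```python
from enum import Enum

class QkNormMode(Enum):
    NONE = "none"
    BEFORE_ROPE = "before_rope"
    AFTER_ROPE = "after_rope"

def resolve_qk_norm_mode(use_qk_norm: bool, qk_norm_before_rope: bool) -> QkNormMode:
    # Map the legacy boolean pair onto a single mode, preserving behavior.
    if not use_qk_norm:
        return QkNormMode.NONE
    return QkNormMode.BEFORE_ROPE if qk_norm_before_rope else QkNormMode.AFTER_ROPE
```

The forward pass would then branch on one value instead of two flags, and old checkpoint configs could be translated at load time.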

6 participants