Qwen3 Mismatch

Something is mildly different between our Qwen3 implementation and the vLLM Qwen3 implementation. The difference is not large enough to make models transferred between the two degenerate completely ~but it's enough to make the model significantly dumber~


## Plan of Attack

- Set up a full Qwen3 round-trip test
- Remove any source of numeric differences between Levanter and HF Qwen3
- Once the difference is identified, we'll need to submit a PR to vLLM to support whatever our architectural difference is, so that people can run inference on Marin 32B with it.
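
One way to approach the second step is to dump intermediate activations from both implementations on the same input and find the first layer where they disagree. The snippet below is only a sketch of that idea: `first_divergent_layer`, the layer names, and the tolerance are hypothetical, and the arrays are random stand-ins rather than real Levanter or HF outputs. It assumes both dumps use the same keys, stored in forward order.

```python
import numpy as np


def first_divergent_layer(acts_a, acts_b, atol=1e-3):
    """Return (layer_name, max_abs_diff) for the first layer whose
    activations differ by more than atol, or None if all layers agree.

    acts_a / acts_b: dicts mapping layer name -> array, in forward order
    (hypothetical dump format, not an existing Levanter or HF API).
    """
    for name in acts_a:  # dicts preserve insertion order
        a = np.asarray(acts_a[name], dtype=np.float32)
        b = np.asarray(acts_b[name], dtype=np.float32)
        diff = float(np.max(np.abs(a - b)))
        if diff > atol:
            return name, diff
    return None


# Toy stand-in data: three "layers" of random activations.
rng = np.random.default_rng(0)
reference = {
    f"layer_{i}": rng.standard_normal((4, 8)).astype(np.float32)
    for i in range(3)
}
candidate = {name: arr.copy() for name, arr in reference.items()}
candidate["layer_1"] += 0.05  # inject an artificial mismatch

result = first_divergent_layer(reference, candidate)
print(result)
```

In a real run, the two dicts would be filled with per-layer outputs captured from each implementation in the same dtype, and the tolerance would need to be chosen with that dtype's precision in mind.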

Qwen3 Mismatch #1220
