Qwen3 Mismatch #1220

@Helw150

Description

Something is subtly different between our Qwen3 implementation and the vLLM Qwen3 implementation. The difference isn't large enough that models transferred between the two degenerate completely, but it's enough to make the model significantly dumber.

Plan of Attack

  • Set up a full Qwen3 roundtrip test
  • Remove every source of numerical difference between the Levanter and HF Qwen3 implementations
  • Once the difference is identified, submit a PR to vLLM to support whatever our architectural difference turns out to be, so that people can run inference on Marin 32B with it.
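
For the first two steps, a rough sketch of the kind of check a roundtrip test could end with: given per-position logits for the same prompt from two implementations, report how far apart they are. This is only an illustration. `compare_logits`, its return keys, and the `atol` default are hypothetical names and values chosen here, not part of Levanter, HF, or vLLM, and the logits are assumed to have already been dumped to plain nested lists.

```python
import math
from typing import Optional, Sequence

# Assumed layout: logits[position][vocab_index], one row per token position.
Logits = Sequence[Sequence[float]]


def log_softmax(row: Sequence[float]) -> list:
    """Numerically stable log-softmax over a single row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def compare_logits(reference: Logits, candidate: Logits, atol: float = 1e-3) -> dict:
    """Summarize how far `candidate` drifts from `reference` (hypothetical helper)."""
    if len(reference) != len(candidate):
        raise ValueError("number of positions differs")

    max_abs_diff = 0.0
    max_kl = 0.0
    first_argmax_mismatch: Optional[int] = None

    for pos, (ref_row, cand_row) in enumerate(zip(reference, candidate)):
        if len(ref_row) != len(cand_row):
            raise ValueError(f"vocab size differs at position {pos}")

        # Largest elementwise gap between the raw logits at this position.
        diff = max(abs(r - c) for r, c in zip(ref_row, cand_row))
        max_abs_diff = max(max_abs_diff, diff)

        # KL(reference || candidate) over the next-token distributions.
        ref_lp = log_softmax(ref_row)
        cand_lp = log_softmax(cand_row)
        kl = sum(math.exp(p) * (p - q) for p, q in zip(ref_lp, cand_lp))
        max_kl = max(max_kl, kl)

        # First position where greedy decoding would pick a different token.
        ref_top = max(range(len(ref_row)), key=lambda i: ref_row[i])
        cand_top = max(range(len(cand_row)), key=lambda i: cand_row[i])
        if first_argmax_mismatch is None and ref_top != cand_top:
            first_argmax_mismatch = pos

    return {
        "max_abs_diff": max_abs_diff,
        "max_kl": max_kl,
        "first_argmax_mismatch": first_argmax_mismatch,
        "within_tolerance": max_abs_diff <= atol,
    }


# Toy data only; real inputs would come from running both implementations.
reference = [[1.0, 2.0, 3.0], [0.5, 0.1, 0.2]]
candidate = [[1.0, 2.0, 3.0], [0.1, 0.5, 0.2]]
report = compare_logits(reference, candidate)
print(report)
```

Reporting the first position where the argmax diverges, in addition to the maximum absolute difference, may help separate ordinary floating-point noise from an actual architectural difference, since the latter would be expected to change which token is chosen.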

Labels

bug: Something isn't working
