
feat: implement gemma3n text model in MLXLLM #346


Open · wants to merge 3 commits into main

Conversation


@xlab commented Jul 2, 2025

Implementation of Gemma 3n model for MLXLLM, text only. Based on the reference implementation in mlx-lm:
ml-explore/mlx-lm#258

This code may also help with building the VLM version in #340.
cc @DePasqualeOrg

Models

The original MLX weights from mlx-vlm are not supported; only weights converted by mlx-lm are supported.

I've made a new collection of text-only MLX models (bf16 and 4-bit quantized) using this new support:
https://huggingface.co/collections/mlx-community/gemma-3n-text-only-lm-6861cf66ddc9a13102996308
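
A minimal usage sketch, assuming the existing LLMModelFactory / MLXLMCommon API in this repo; the prompt and the 240-token cap are illustrative:

```swift
import MLXLLM
import MLXLMCommon

// Load one of the converted text-only models from the collection above.
let container = try await LLMModelFactory.shared.loadContainer(
    configuration: ModelConfiguration(id: "mlx-community/gemma-3n-E2B-it-lm-4bit"))

// Prepare the prompt and generate up to 240 tokens.
let result = try await container.perform { context in
    let input = try await context.processor.prepare(
        input: UserInput(prompt: "Why is the sky blue?"))
    return try MLXLMCommon.generate(
        input: input, parameters: GenerateParameters(), context: context
    ) { tokens in
        tokens.count >= 240 ? .stop : .more
    }
}
print(result.output)
```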

Naive benchmarks

Apple M4 Max

| Model | Peak Memory | Generation Speed | Generation Time |
| --- | --- | --- | --- |
| mlx-community/gemma-3n-E4B-it-lm-bf16 | 13097M | 39.98 tokens/s | 7.50s |
| mlx-community/gemma-3n-E2B-it-lm-bf16 | 8535M | 63.44 tokens/s | 4.73s |
| mlx-community/gemma-3n-E4B-it-lm-4bit | 3684M | 81.05 tokens/s | 3.70s |
| mlx-community/gemma-3n-E2B-it-lm-4bit | 2391M | 113.44 tokens/s | 2.64s |

iPhone 16 Pro

| Model | Generation Speed |
| --- | --- |
| mlx-community/gemma-3n-E4B-it-lm-4bit | 10-15 tokens/s |
| mlx-community/gemma-3n-E2B-it-lm-4bit | 25-30 tokens/s |
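
For context, a rough sketch of how numbers like these can be collected, continuing the loading sketch above; `GPU.snapshot().peakMemory` and the `tokensPerSecond` / `generateTime` fields on `GenerateResult` are my assumptions about the current MLX / MLXLMCommon APIs:

```swift
import MLX
import MLXLMCommon

// Inside container.perform { context in ... } from the sketch above:
// GenerateResult carries the timing, and GPU.snapshot() reports the peak
// of Metal allocations, matching the Peak Memory column.
let result = try MLXLMCommon.generate(
    input: input, parameters: GenerateParameters(), context: context
) { tokens in
    tokens.count >= 256 ? .stop : .more
}
print("Peak memory: \(GPU.snapshot().peakMemory / (1024 * 1024))M")
print("Speed: \(result.tokensPerSecond) tokens/s, time: \(result.generateTime)s")
```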

Notes

  • Some operations (e.g. gelu_topk, logit_softcap) can be compiled to improve performance; see the sketch below
  • RMSNoScale can be improved once MLXFast.rmsNorm is fixed to allow nil weights
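
To make these two notes concrete, a rough sketch of both ideas; the softcap value and eps are illustrative, and the unary `compile` overload is assumed from MLX Swift:

```swift
import MLX

// Logit soft-capping is a small element-wise graph, so wrapping it in
// compile lets MLX fuse it into fewer kernels.
let softcap: Float = 30.0  // illustrative value
let logitSoftcap = compile { (logits: MLXArray) in
    tanh(logits / softcap) * softcap
}

// RMSNoScale as in the second note: RMS normalization without a learned
// weight, usable until MLXFast.rmsNorm accepts nil weights.
func rmsNoScale(_ x: MLXArray, eps: Float = 1e-6) -> MLXArray {
    x * rsqrt(mean(x * x, axis: -1, keepDims: true) + eps)
}
```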

Misc

  • added to LLMModelFactory (see the registration sketch below)
  • added to MLXService
  • added to MLXChatExample
  • added 4 model references from HF
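
A rough sketch of what the LLMModelFactory registration amounts to; `register(configurations:)` on the factory's model registry is my assumption about its public API, and the id is one of the four references added here:

```swift
import MLXLLM
import MLXLMCommon

// The factory keeps a registry of ModelConfiguration values keyed by
// Hugging Face id, so new references can also be registered at runtime.
let gemma3n = ModelConfiguration(
    id: "mlx-community/gemma-3n-E4B-it-lm-4bit",
    defaultPrompt: "Why is the sky blue?")
LLMModelFactory.shared.modelRegistry.register(configurations: [gemma3n])
```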

Demos

ios-15toks (demo video)
macos-60toks (demo video)

xlab and others added 2 commits July 2, 2025 04:01
@DePasqualeOrg (Contributor) commented:

Nice! Did you base this on #340, or did you start from scratch based on the Python implementation?

@xlab (Author) commented Jul 2, 2025:

Hey, good question. This implementation is my third attempt, and it was made from scratch based on the Python source from mlx-lm.

#340 was a great inspiration, since I am new to this, but at times it was misleading. In my initial attempt I also used the mlx-vlm language model, but it wasn't a good reference either. It all worked out once the mlx-lm reference was ready.

The key to a successful transpilation is to prompt piece by piece and verify, and at the end to feed in this guide and re-verify the whole thing: https://swiftpackageindex.com/ml-explore/mlx-swift/main/documentation/mlx/converting-python
