Why is clip encoding so much slower on a M1 mac compared to Jetson Orin Nano? #13633
marcov-dart asked this question in Q&A · Unanswered · 0 replies
I'm using a fresh pull of llama.cpp on both the Mac M1 and the Jetson Orin Nano Super, and in both cases all layers are offloaded to the GPU.
Text-generation speed is fairly close: around 16 t/s on the M1 versus roughly 14 t/s on the Jetson Orin Nano, which is more or less in line with my expectations.
But when I run gemma-3-4b-it-qat-q4_0-gguf through llama-mtmd-cli, there is quite a difference in image-encoding time.
M1:
encoding image or slice...
image/slice encoded in 59080 ms
decoding image batch 1/1, n_tokens_batch = 256
image decoded (batch 1/1) in 930 ms
Jetson Orin Nano Super:
encoding image or slice...
image/slice encoded in 4064 ms
decoding image batch 1/1, n_tokens_batch = 256
image decoded (batch 1/1) in 306 ms
Here the Jetson Orin Nano is nearly 15 times faster (4064 ms vs. 59080 ms). Is there a known reason for this difference?
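For context, the timings above come from an invocation along these lines (the file names, image, and prompt here are illustrative placeholders, not necessarily the exact command used):

```shell
# Hypothetical invocation of llama-mtmd-cli from a llama.cpp build;
# model, mmproj, and image paths are placeholders.
llama-mtmd-cli \
  -m gemma-3-4b-it-qat-q4_0.gguf \
  --mmproj mmproj-model-f16.gguf \
  --image test.jpg \
  -ngl 99 \
  -p "Describe this image."
```

The `--mmproj` file holds the CLIP vision encoder, which is the stage whose timing ("image/slice encoded in ... ms") differs so much between the two machines.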