Why is clip encoding so much slower on a M1 mac compared to Jetson Orin Nano? #13633
marcov-dart asked this question in Q&A · Unanswered · 0 replies
I'm using a fresh pull of llama.cpp on both the Mac M1 and the Jetson Orin Nano Super, and in both cases all layers are offloaded to the GPU.
Text-generation speed is fairly close: around 16 t/s on the M1 versus roughly 14 t/s on the Jetson Orin Nano, which is more or less in line with my expectations.
But when I run gemma-3-4b-it-qat-q4_0-gguf through llama-mtmd-cli, there is quite a difference in image-encoding time.
M1:
encoding image or slice...
image/slice encoded in 59080 ms
decoding image batch 1/1, n_tokens_batch = 256
image decoded (batch 1/1) in 930 ms
Jetson Orin Nano Super:
encoding image or slice...
image/slice encoded in 4064 ms
decoding image batch 1/1, n_tokens_batch = 256
image decoded (batch 1/1) in 306 ms
Here the Jetson Orin Nano is nearly 15 times faster (4064 ms vs. 59080 ms). Is there a known reason for this difference?
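For context, the timings above come from an invocation along these lines (the file names, image, and prompt here are illustrative placeholders, not necessarily the exact command used):

```shell
# Hypothetical invocation of llama-mtmd-cli from a llama.cpp build;
# model, mmproj, and image paths are placeholders.
llama-mtmd-cli \
  -m gemma-3-4b-it-qat-q4_0.gguf \
  --mmproj mmproj-model-f16.gguf \
  --image test.jpg \
  -ngl 99 \
  -p "Describe this image."
```

The `--mmproj` file holds the CLIP vision encoder, which is the stage whose timing ("image/slice encoded in ... ms") differs so much between the two machines.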