Replies: 1 comment
-
It may also be worth looking at the DeepSeek-VL2 models, which share the same vocabulary as DeepSeek-V3. Maybe it could then be offloaded to the GPU?
-
"Based on our evaluation, the acceptance rate of the second token prediction ranges between 85% and 90% across various generation topics, demonstrating consistent reliability. This high acceptance rate enables DeepSeek-V3 to achieve a significantly improved decoding speed, delivering 1.8 times TPS (Tokens Per Second)."
(From the DeepSeek-V3 technical report)
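The quoted numbers are consistent with the standard expectation for speculative decoding: if one extra token is drafted per step and accepted with probability p, each target-model pass emits 1 + p tokens on average. A minimal sketch of that arithmetic, assuming a single drafted token, independent acceptances, and negligible draft cost (all assumptions, not stated in the report):

```python
# Back-of-the-envelope check of the quoted ~1.8x speedup.
# Assumes k drafted tokens, each accepted independently with probability p,
# and negligible drafting overhead - a simplification, not the report's model.

def expected_speedup(acceptance_rate: float, draft_tokens: int = 1) -> float:
    """Expected tokens emitted per target-model forward pass.

    With k drafted tokens, a step emits 1 + p + p^2 + ... + p^k tokens
    in expectation (the geometric run of consecutive acceptances).
    """
    p = acceptance_rate
    return sum(p ** i for i in range(draft_tokens + 1))

# The report's 85-90% acceptance rate for the second token gives:
for p in (0.85, 0.90):
    print(f"p={p}: ~{expected_speedup(p):.2f}x TPS")
```

With p between 0.85 and 0.90 this lands at roughly 1.85–1.90 tokens per pass, in line with the reported 1.8× TPS once drafting overhead is accounted for.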