Google's new model: RecurrentGemma (Griffin architecture) - faster inference and lower memory usage when running inference over long contexts #6605
Closed
joseph777111
started this conversation in
General
Replies: 1 comment
-
Already requested here:
-
@ggerganov Google just dropped a new model architecture: Griffin, which outperforms transformers and offers efficiency advantages, including faster inference and lower memory usage when running inference over longer contexts. google/gemma.cpp has a C++ implementation of the model (for use and reference).
Paper: Google’s Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
recurrentgemma-2b-it
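The efficiency claim comes from Griffin replacing most global attention with gated linear recurrences, whose per-token state is fixed-size, so memory does not grow with context length. A minimal NumPy sketch of such a recurrence is below; it is simplified from the paper's RG-LRU layer, and the gate projections `Wa`, `Wi` and the exact gating form here are illustrative assumptions, not the published parameterization:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_linear_recurrence(x, Wa, Wi):
    """Sketch of a Griffin-style gated linear recurrence (simplified).

    x:  (seq_len, d) input sequence
    Wa: (d, d) projection for the recurrence gate (hypothetical name)
    Wi: (d, d) projection for the input gate (hypothetical name)

    State update: h_t = a_t * h_{t-1} + sqrt(1 - a_t^2) * (i_t * x_t),
    with a_t, i_t in (0, 1). The state h is a single d-vector regardless
    of sequence length, which is why memory stays flat over long contexts.
    """
    seq_len, d = x.shape
    h = np.zeros(d)
    out = np.empty_like(x)
    for t in range(seq_len):
        a = sigmoid(x[t] @ Wa)  # recurrence (forget) gate in (0, 1)
        i = sigmoid(x[t] @ Wi)  # input gate in (0, 1)
        # sqrt(1 - a^2) keeps the update norm-balanced against the decay
        h = a * h + np.sqrt(1.0 - a * a) * (i * x[t])
        out[t] = h
    return out
```

Contrast with attention: a transformer's KV cache grows linearly with context, while the state `h` here is a constant-size vector, which matches the post's "lower memory usage over longer contexts" point.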