Google's new model: RecurrentGemma (Griffin architecture) - faster inference and lower memory usage when running inference over long contexts #6605
Closed
joseph777111
started this conversation in
General
Replies: 1 comment
-
Already requested here:
-
@ggerganov Google just dropped a new model architecture: Griffin, which outperforms transformers and offers efficiency advantages, including faster inference and lower memory usage when running inference over longer contexts. google/gemma.cpp has a C++ implementation of the model (for use and reference).
Paper: Google’s Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
recurrentgemma-2b-it
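The efficiency claim comes from Griffin replacing most global attention with gated linear recurrences, whose per-token state is fixed-size, so memory does not grow with context length. A minimal NumPy sketch of such a recurrence is below; it is simplified from the paper's RG-LRU layer, and the gate projections `Wa`, `Wi` and the exact gating form here are illustrative assumptions, not the published parameterization:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_linear_recurrence(x, Wa, Wi):
    """Sketch of a Griffin-style gated linear recurrence (simplified).

    x:  (seq_len, d) input sequence
    Wa: (d, d) projection for the recurrence gate (hypothetical name)
    Wi: (d, d) projection for the input gate (hypothetical name)

    State update: h_t = a_t * h_{t-1} + sqrt(1 - a_t^2) * (i_t * x_t),
    with a_t, i_t in (0, 1). The state h is a single d-vector regardless
    of sequence length, which is why memory stays flat over long contexts.
    """
    seq_len, d = x.shape
    h = np.zeros(d)
    out = np.empty_like(x)
    for t in range(seq_len):
        a = sigmoid(x[t] @ Wa)  # recurrence (forget) gate in (0, 1)
        i = sigmoid(x[t] @ Wi)  # input gate in (0, 1)
        # sqrt(1 - a^2) keeps the update norm-balanced against the decay
        h = a * h + np.sqrt(1.0 - a * a) * (i * x[t])
        out[t] = h
    return out
```

Contrast with attention: a transformer's KV cache grows linearly with context, while the state `h` here is a constant-size vector, which matches the post's "lower memory usage over longer contexts" point.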