Gated layers[0] have been around for some time and have seen considerable, though not overwhelming, use. llama.cpp doesn't seem to implement gating as an operation, however; at the very least I wasn't able to find it.
There are also gated activation functions, such as ReGLU and GEGLU[1]; the latter is used in the phi-3-small model[2]. If llama.cpp were to support these architectures, some form of gating would probably be necessary.
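To make the gating semantics concrete, here is a minimal sketch (plain C, not ggml API; the function names and the split into two precomputed projection halves `a = xW` and `b = xV` are assumptions for illustration) of the element-wise products that ReGLU and GEGLU compute, following the GLU-variants paper[1]:

```c
/* Illustrative sketch only -- not ggml API.
 *   ReGLU(a, b) = ReLU(a) * b   (element-wise)
 *   GEGLU(a, b) = GELU(a) * b   (element-wise)
 * where a = x*W and b = x*V are the two halves of the up-projection.
 */
#include <math.h>
#include <stddef.h>

static float gelu(float x) {
    /* tanh approximation of GELU, as commonly used in practice */
    return 0.5f * x * (1.0f + tanhf(0.7978845608f * (x + 0.044715f * x * x * x)));
}

void reglu(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = (a[i] > 0.0f ? a[i] : 0.0f) * b[i];
}

void geglu(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = gelu(a[i]) * b[i];
}
```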
The obvious starting point is implementing the Kronecker product[3]. This is relatively straightforward and should be easy to parallelize, although it wouldn't see any use until the aforementioned activation functions, and models using them, are implemented. The code would go into the GGML library. The only problematic aspect would be the backward pass, but that isn't implemented for other ops either.
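As a rough starting point, a reference implementation of the Kronecker product[3] for dense row-major matrices could look like the sketch below (the function name, float32-only layout, and signature are assumptions, not existing ggml API; an actual ggml op would work on `ggml_tensor`s and thread over the outer loops):

```c
/* Reference Kronecker product for dense row-major matrices.
 * C = A (kron) B has shape (ma*mb) x (na*nb), with
 * C[i*mb + k][j*nb + l] = A[i][j] * B[k][l].
 */
#include <stddef.h>

void kron_f32(const float *A, size_t ma, size_t na,
              const float *B, size_t mb, size_t nb,
              float *C /* (ma*mb) x (na*nb), row-major */) {
    const size_t nc = na * nb;
    for (size_t i = 0; i < ma; ++i)
        for (size_t j = 0; j < na; ++j) {
            const float aij = A[i * na + j];
            for (size_t k = 0; k < mb; ++k)
                for (size_t l = 0; l < nb; ++l)
                    C[(i * mb + k) * nc + (j * nb + l)] = aij * B[k * nb + l];
        }
}
```

The outer two loops are independent per (i, j) block, which is what makes the operation easy to parallelize.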
[0] https://arxiv.org/abs/1612.08083
[1] https://arxiv.org/abs/2002.05202v1
[2] https://arxiv.org/abs/2404.14219
[3] https://en.wikipedia.org/wiki/Kronecker_product