ggml_repeat no cuda impl #4841
Replies: 2 comments
It seems that those repositories have not been updated for quite some time and use a rather old version of ggml. Many operations now have GPU acceleration, but taking advantage of it requires those projects to adopt ggml-backend.
Yes, exactly! I'm trying to update it, but ggml_repeat still has no GPU backend. It looks like a slow way of doing broadcasting. I've got the start of a "BERTCPP" equivalent with an updated GGML, but I keep running into roadblocks like this that leave me scratching my head as to how to proceed. Edit: Never mind, I see a CUDA repeat implementation has been added in the latest version. I need to update my llama.cpp.
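For anyone wondering why a repeat-style op reads as a slow way of doing broadcasting, here is a rough sketch in plain Python. It does not use the ggml API; the function names are made up for illustration, and the nested lists just stand in for tensors. The idea is that repeat first materializes a tiled copy of the small tensor at the full shape, while a broadcasting add reads the small tensor directly.

```python
# Illustrative only: plain Python lists stand in for tensors; this is not the ggml API.

def repeat_rows(bias, n_rows):
    """Materialize n_rows copies of a 1-D bias (what a repeat-style op produces)."""
    return [list(bias) for _ in range(n_rows)]

def add_with_repeat(x, bias):
    """Add a bias by first tiling it to the full shape of x, then adding elementwise."""
    tiled = repeat_rows(bias, len(x))  # extra buffer of size rows * cols
    return [[a + b for a, b in zip(row, trow)] for row, trow in zip(x, tiled)]

def add_with_broadcast(x, bias):
    """Add the same bias to every row without building the tiled copy."""
    return [[a + b for a, b in zip(row, bias)] for row in x]

x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
bias = [10.0, 20.0, 30.0]

# Both paths give the same result; only the intermediate buffer differs.
print(add_with_repeat(x, bias))
print(add_with_broadcast(x, bias))
```

Both functions return the same values. The repeat path just pays for an extra full-size buffer and an extra pass over it, which is the cost I was complaining about.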
I'm seeing a number of embedding models based on
https://github.com/skeskinen/bert.cpp
including this one, which implements BGE:
https://github.com/xyzhang626/embeddings.cpp
But none of them are GPU accelerated. It seems that they all rely on ggml_repeat, which doesn't have a CUDA implementation, so even if we put the other layers on the GPU we would be stuck constantly copying data back and forth between the CPU and the GPU. Do I have that right?