ggml_repeat no cuda impl #4841

SpaceCowboy850 · 2024-01-09T15:18:57Z

I'm seeing a number of embedding models based off of

Including this

https://github.com/xyzhang626/embeddings.cpp

Which implements BGE. None of them are GPU accelerated, though. They all seem to rely on ggml_repeat, which doesn't have a CUDA implementation, so if we put the other layers on the GPU we would be stuck constantly copying data back and forth between the CPU and the GPU. Do I have that right?
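To make the concern concrete, here is a toy model of the round trips, not how ggml actually schedules work. The op names, the `GPU_OPS` support set, and the `count_transfers` helper are all hypothetical and only illustrate the idea that every unsupported op in the middle of a graph forces a hop to the CPU and back.

```python
# Hypothetical support set: ops assumed to have a GPU kernel in this toy model.
GPU_OPS = {"mul_mat", "add", "norm", "soft_max"}

def count_transfers(ops, gpu_ops):
    """Count device switches when each op runs on the GPU if supported, else on the CPU."""
    transfers = 0
    device = "cpu"  # assumption: inputs start in host memory
    for op in ops:
        target = "gpu" if op in gpu_ops else "cpu"
        if target != device:
            transfers += 1
            device = target
    return transfers

# A made-up two-layer graph where each layer contains one repeat.
layer = ["mul_mat", "repeat", "add", "norm"]
graph = layer * 2

print(count_transfers(graph, GPU_OPS))               # 5: repeat falls back to the CPU
print(count_transfers(graph, GPU_OPS | {"repeat"}))  # 1: a single upload, then all on GPU
```

In this sketch a single missing kernel turns one upload into five switches for two layers, and the count grows with the number of layers.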

FSSRepo (Collaborator) · 2024-01-09T16:32:26Z

It seems that those repositories have not been updated for quite some time and use a rather old version of ggml. Many operations now have GPU acceleration, but using it requires those projects to adopt ggml-backend.

SpaceCowboy850 (Author) · 2024-01-09T16:39:36Z

Yes, exactly! I'm trying to update it, but ggml_repeat still has no GPU backend. It looks like a slow way of doing broadcasting. I've got the beginnings of a "BERTCPP" equivalent on an updated GGML, but I keep running into roadblocks like this that leave me scratching my head as to how to proceed.

Edit: Never mind - I see a CUDA repeat option has been added to the latest version, I need to update my llama.cpp...
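For readers wondering why repeat reads as "a slow way of doing broadcasting", here is a minimal pure-Python sketch of the difference for a bias add. It does not use ggml, and the function names are made up for illustration. The repeat path first materializes a tiled copy of the bias that matches the input's shape, while the broadcast path indexes the small bias directly.

```python
def repeat_rows(bias, n_rows):
    """Materialize n_rows copies of a 1-D bias, as a repeat/tile op would."""
    return [list(bias) for _ in range(n_rows)]

def add_with_repeat(x, bias):
    # Builds an extra temporary with n_rows * n_cols elements before adding.
    tiled = repeat_rows(bias, len(x))
    return [[a + b for a, b in zip(row, t)] for row, t in zip(x, tiled)]

def add_with_broadcast(x, bias):
    # Reuses the same bias row for every input row; no tiled temporary.
    return [[a + b for a, b in zip(row, bias)] for row in x]

x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
bias = [0.5, 0.5, 0.5]

# Both paths give the same result; only the intermediate storage differs.
assert add_with_repeat(x, bias) == add_with_broadcast(x, bias)
```

Whether a given ggml version lets the add itself broadcast, making the explicit repeat unnecessary, is worth checking against the version in use.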
