Support for intfloat/e5-mistral-7b-instruct #4863
nathanpbell started this conversation in Ideas
In trying to wrap my head around this, I think I've found that llama.cpp would need to support two new features to get this embedding model to work optimally:

- We need a way to probe the values at the last layer, before the LM head (or ideally skip the LM head altogether). Does the current embedding endpoint do exactly that? I couldn't fully follow where it grabs its values from.
- We need a way to pass an attention mask in along with the batch of inputs, or to compute one.

Before I explore this further, am I on the right path here: (a) llama.cpp doesn't currently have these features, (b) they are needed, and (c) they are in theory sufficient (or close to it) to get e5-mistral working as intended?
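For context on why the attention mask matters: e5-mistral's model card describes pooling the final-layer hidden states by taking each sequence's last *real* (non-padded) token, which requires knowing where padding starts. Here is a minimal numpy sketch of that last-token pooling step (shapes and names are illustrative, not llama.cpp API):

```python
import numpy as np

def last_token_pool(hidden_states, attention_mask):
    """Pool a batch by taking each sequence's last non-padded token.

    hidden_states: (batch, seq_len, dim) final-layer outputs, before the LM head
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding
    """
    # Index of the last real token in each sequence
    last_idx = attention_mask.sum(axis=1) - 1  # shape (batch,)
    # Gather one hidden vector per sequence via advanced indexing
    return hidden_states[np.arange(hidden_states.shape[0]), last_idx]

# Toy batch: 2 sequences, max length 4, hidden dim 3
h = np.arange(24, dtype=np.float32).reshape(2, 4, 3)
mask = np.array([[1, 1, 1, 1],
                 [1, 1, 0, 0]])  # second sequence has 2 padding tokens

emb = last_token_pool(h, mask)  # shape (2, 3)
```

Without the mask, a batched implementation would pool the hidden state at the padded end of shorter sequences, which is why feature (2) above is needed for correct batched embeddings.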