Description
Is your feature request related to a problem?
Currently, the embeddings generated by llama.cpp via local-ai are not normalized. For many applications, especially those involving semantic search or vector-similarity calculations with cosine similarity, embeddings must be L2-normalized. This forces developers to perform a normalization step on the client side after receiving the embedding vector from the API.
Describe the solution you'd like
I propose adding a new boolean option to the embeddings model YAML config file, named embd_normalize (equivalent to the llama.cpp argument --embd-normalize), that triggers the normalization. I also think this behavior should be enabled by default for requests to the OpenAI-compatible endpoint /v1/embeddings, but not for /embeddings. This matches the OpenAI models (and endpoint), which return L2-normalized embedding vectors.
When this option is set to true, the llama.cpp server will perform an L2 normalization on the final embedding vector before it is returned in the API response (this is already implemented in recent llama.cpp versions). When the option is false or not present, the server should return the raw, non-normalized embedding on the /embeddings endpoint, but still return the normalized one on /v1/embeddings.
This would allow users to receive ready-to-use, normalized embeddings directly from the API, simplifying client-side logic and improving overall efficiency.
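To make the proposed behavior concrete, here is a minimal Python sketch of the dispatch logic; the function names are hypothetical for illustration and do not correspond to LocalAI's actual code:

    import math

    def l2_normalize(vec):
        # Divide each component by the vector's Euclidean (L2) norm.
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm > 0 else list(vec)

    def finalize_embedding(vec, endpoint, embd_normalize=None):
        # Hypothetical dispatch: /v1/embeddings normalizes by default to
        # match OpenAI; /embeddings returns the raw vector unless the
        # model config sets embd_normalize: true.
        if embd_normalize is None:
            embd_normalize = (endpoint == "/v1/embeddings")
        return l2_normalize(vec) if embd_normalize else list(vec)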
Example model config file:
name: qwen3-embedding-4b
embeddings: true
backend: llama-cpp
context_size: 32768
f16: true
mmap: true
parameters:
  model: Qwen3-Embedding-4B-Q8_0.gguf
embd_normalize: true
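With such a config, a client could verify the behavior roughly like this (a sketch that assumes a local-ai instance listening on localhost:8080 and the model name from the config above):

    import math

    import requests

    resp = requests.post(
        "http://localhost:8080/v1/embeddings",
        json={"model": "qwen3-embedding-4b", "input": "hello world"},
    )
    vec = resp.json()["data"][0]["embedding"]
    # The L2 norm should be ~1.0 once normalization is applied on /v1/embeddings.
    print(math.sqrt(sum(x * x for x in vec)))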
Describe alternatives you've considered
The only alternative at present is to normalize the embedding vectors manually on the client side. This means receiving the raw vector from llama.cpp and then implementing a function that computes the L2 norm and divides each component of the vector by it, as in the sketch below. While functional, this approach is less efficient and requires every client application developer to reimplement the same logic.
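For reference, a minimal sketch of that client-side workaround, assuming a Python client with numpy available:

    import numpy as np

    def normalize_client_side(embedding):
        # Workaround: L2-normalize the raw embedding returned by the API
        # so it can be used directly with cosine/dot-product similarity.
        vec = np.asarray(embedding, dtype=np.float32)
        norm = np.linalg.norm(vec)
        return (vec / norm).tolist() if norm > 0 else list(embedding)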
Additional context
L2 normalization is a standard procedure for preparing embeddings for many machine learning tasks.
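Concretely, L2 normalization rescales a vector to unit length,

$$\hat{v} = \frac{v}{\lVert v \rVert_2}, \qquad \lVert v \rVert_2 = \sqrt{\sum_i v_i^2},$$

so that cosine similarity between normalized vectors reduces to a plain dot product.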