FLAT Index in Milvus Producing Low Similarity Scores and Unrelated Top-K Results #40743
Unanswered
Bhagyashreet20 asked this question in Q&A and General discussion
Replies: 3 comments · 7 replies
-
Thanks for the details you offered. I quickly went through your code; it seems well organized and I didn't find any bugs. To debug:
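One quick sanity check (a sketch, assuming pymilvus, a collection named wiki_chunks, and the field names from the question) is a self-search: fetch a vector that is already stored and search with it. With a COSINE index the top hit should be that same row with a score close to 1.0; if it is not, the problem is more likely in the index or metric configuration than in the embeddings themselves.

```python
from pymilvus import connections, Collection

connections.connect(alias="default", uri="http://localhost:19530")  # assumed address
collection = Collection("wiki_chunks")  # assumed collection name
collection.load()

# Fetch one stored row, including its embedding
# (returning vector fields from query needs Milvus >= 2.3).
row = collection.query(expr="", output_fields=["url_id", "embedding"], limit=1)[0]

# Search with that exact vector; the stored row itself should come back first
# with a COSINE score very close to 1.0.
results = collection.search(
    data=[row["embedding"]],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {}},
    limit=5,
    output_fields=["url_id"],
)
for hit in results[0]:
    print(hit.entity.get("url_id"), hit.distance)
```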
-
Possible reason: the primary field "url_id" has duplicate ids.
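If the ingest-side id list is no longer at hand, one way to look for repeated url_id values is to stream the primary keys back out of the collection and count them client-side. This is only a sketch (collection name and connection address are assumed), and it relies on duplicate rows being returned individually by the query iterator.

```python
from collections import Counter

from pymilvus import connections, Collection

connections.connect(alias="default", uri="http://localhost:19530")  # assumed address
collection = Collection("wiki_chunks")  # assumed collection name
collection.load()

# Stream every primary key out of the collection and count repeats.
counts = Counter()
iterator = collection.query_iterator(batch_size=10000, output_fields=["url_id"])
while True:
    batch = iterator.next()
    if not batch:
        iterator.close()
        break
    counts.update(row["url_id"] for row in batch)

duplicated = {pk: n for pk, n in counts.items() if n > 1}
print(f"rows seen: {sum(counts.values())}, unique url_id: {len(counts)}, duplicated: {len(duplicated)}")
```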
-
Verify the embedding data by the following steps:
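For instance, one way to spot-check the stored vectors (a sketch; collection name and connection address are assumed) is to re-embed a stored chunk's text with the same model and confirm the fresh vector matches the stored embedding with cosine similarity close to 1.0.

```python
import numpy as np
from openai import OpenAI
from pymilvus import connections, Collection

connections.connect(alias="default", uri="http://localhost:19530")  # assumed address
collection = Collection("wiki_chunks")  # assumed collection name
collection.load()

# Pick one stored chunk together with its text and embedding.
row = collection.query(expr="", output_fields=["url_id", "text", "embedding"], limit=1)[0]

# Re-embed the stored text with the same model used at ingestion time.
client = OpenAI()
fresh = client.embeddings.create(
    model="text-embedding-3-small", input=row["text"]
).data[0].embedding

stored = np.asarray(row["embedding"], dtype=np.float32)
fresh = np.asarray(fresh, dtype=np.float32)
cosine = float(stored @ fresh / (np.linalg.norm(stored) * np.linalg.norm(fresh)))
print(f"url_id={row['url_id']}  cosine(stored, re-embedded) = {cosine:.4f}")  # expect ~1.0
```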
-
I'm using Milvus to store 2.4M Wikipedia document chunks with OpenAI embeddings (text-embedding-3-small) to power a Retrieval-Augmented Generation (RAG) system. However, I am encountering unexpected retrieval issues.

Setup Details
Milvus Version: 2.5.4
Embedding Model: text-embedding-3-small
Embedding Dimension: 1536
Chunk Sizes Tried: 256, 1024
Chunk Overlap: 50
Total Documents: 2462190
Milvus Indexes Used: FLAT, HNSW
Metric Types Tried: COSINE and L2
TopK: 5, 10, 20, 100, 500
Database Fields (a schema sketch follows this list):
url_id (Primary Key)
embedding (FLOAT_VECTOR, dim=1536)
text (VARCHAR, max_length=5000)
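For reference, the schema and the FLAT/COSINE index described above can be declared along these lines with pymilvus. This is a sketch only: the collection name wiki_chunks, the VARCHAR type for url_id, and the connection address are assumptions, not taken from the attached scripts.

```python
from pymilvus import (
    Collection, CollectionSchema, DataType, FieldSchema, connections,
)

connections.connect(alias="default", uri="http://localhost:19530")  # assumed address

# Assumption: url_id is a VARCHAR primary key; adjust if your script uses INT64.
fields = [
    FieldSchema(name="url_id", dtype=DataType.VARCHAR, max_length=512, is_primary=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=5000),
]
schema = CollectionSchema(fields, description="Wikipedia chunks with OpenAI embeddings")
collection = Collection(name="wiki_chunks", schema=schema)  # assumed collection name

# FLAT is brute-force search, so the only parameter that matters is the metric type;
# it has to match the metric_type passed at search time (COSINE here).
collection.create_index(
    field_name="embedding",
    index_params={"index_type": "FLAT", "metric_type": "COSINE", "params": {}},
)
collection.load()
```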
Example
I ran multiple queries extracted from stored Wikipedia chunks, expecting at least top_k=5 relevant documents to match the query source. However, the retrieved results are completely unrelated.

Example 1
Query:
"What television show did Rauch regularly contribute to?"
Expected: The top results should reference Melissa Rauch, who contributed to Best Week Ever.
Observed Results: None of the top 10 retrieved documents are relevant, and the similarity scores are very low (0.09 - 0.1). (Full JSON results attached below for reference.)
Attached scripts:
create_doc_store.txt - this file operates in 3 modes
RAG.txt - this file reads the queries, generates embeddings, and retrieves the top-k documents from the database to generate the LLM answer (a rough sketch of this retrieval step follows below)
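The retrieval step is roughly of the following shape (a sketch, not the attached script itself; the collection name and connection address are assumed): embed the query with text-embedding-3-small and search Milvus with the same metric the index was built with.

```python
from openai import OpenAI
from pymilvus import connections, Collection

connections.connect(alias="default", uri="http://localhost:19530")  # assumed address
collection = Collection("wiki_chunks")  # assumed collection name
collection.load()

query = "What television show did Rauch regularly contribute to?"

# Embed the query with the same model used for the stored chunks.
client = OpenAI()
query_vec = client.embeddings.create(
    model="text-embedding-3-small", input=query
).data[0].embedding

results = collection.search(
    data=[query_vec],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {}},  # must match the index's metric
    limit=10,
    output_fields=["url_id", "text"],
)
for hit in results[0]:
    print(f"{hit.distance:.3f}  {hit.entity.get('url_id')}  {hit.entity.get('text')[:80]}")
```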
Questions for the Milvus Team
Why are similarity scores so low (~0.05) even after trying many configuration options? Shouldn't similar vectors have scores close to 1?
Why does FLAT fail to return exact matches? Could this be an indexing/storage issue or some other issue?
Could this behavior be related to the size of the dataset (2.4M documents)?
If a query has approximately 100 related document chunks (extracted from the same source document as the query) within a pool of 2.4M documents, would you expect these 100 related chunks to appear in the top-k results when setting top_k=10 or top_k=20? If not, what factors could be causing their exclusion from the top-k results?
results.json
Any guidance on why this issue is happening and possible solutions would be greatly appreciated!