Clarification on Multi-Vector Embedding Implementation in Milvus #40424
ranjith502 asked this question in Q&A and General discussion
Reply (from the discussion thread): embedding list support is on our roadmap and hopefully it can be released in the next 2 months.
Hi Team,
I've implemented multi-vector embedding in Milvus using concepts from the following resources:
🔹 ColPali with Milvus
🔹 FastEmbed ColBERT
The embeddings I’m using are generated using the ColBERT model, which produces a multi-vector representation (instead of a single vector per document). I wanted to verify whether my approach is correct and whether there are any optimizations or best practices I should follow.
1️⃣ Dataset & Embeddings:
I have 20 descriptions, and each description is processed with ColBERT, which produces 48 token-level embeddings per description, i.e., an array of shape (48, 128). (A sketch of how such embeddings can be generated with FastEmbed follows below.)
Since Milvus does not allow inserting a (48, 128) array into a single FLOAT_VECTOR field, I insert 48 separate 128-dimensional vectors per description.
Final storage: instead of 20 entities, my collection ends up with 960 (48 × 20) vectors.
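For completeness, here is a minimal sketch of how such per-description multi-vector embeddings can be produced with FastEmbed's ColBERT support; the model name and the exact token count are illustrative assumptions, not a fixed part of my setup:

```python
import numpy as np
from fastembed import LateInteractionTextEmbedding  # ColBERT-style late-interaction embeddings

# Illustrative model choice; any FastEmbed late-interaction model behaves the same way
embedding_model = LateInteractionTextEmbedding("colbert-ir/colbertv2.0")

descriptions = ["first product description ...", "second product description ..."]

# embed() yields one (num_tokens, 128) array per description
descriptions_embeddings = [np.array(emb) for emb in embedding_model.embed(descriptions)]
print(descriptions_embeddings[0].shape)  # e.g. (48, 128), depending on tokenization
```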
2️⃣ Schema Definition & Indexing
```python
from pymilvus import MilvusClient, DataType

# `host` and `collection_name` are defined earlier in my script
client = MilvusClient(uri=f"http://{host}:19530")

# Define the schema
schema1 = MilvusClient.create_schema()
schema1.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema1.add_field(field_name="doc_id", datatype=DataType.INT64)
schema1.add_field(field_name="descriptions_embeddings", datatype=DataType.FLOAT_VECTOR, dim=128)
schema1.add_field(field_name="descriptions", datatype=DataType.VARCHAR, max_length=10000, nullable=True)

# Create the collection
client.create_collection(collection_name=collection_name, schema=schema1)

# Insert embeddings (flattening the multi-vector representation)
data_to_insert = [
    {
        "id": i * 1000 + j,                 # Unique ID per vector
        "doc_id": i,                        # Keeps track of the document ID
        "descriptions_embeddings": vector,  # Single 128-dimensional vector
        "descriptions": description,        # Corresponding description text
    }
    for i, (embedding_list, description) in enumerate(zip(descriptions_embeddings, descriptions))
    for j, vector in enumerate(embedding_list)
]
client.insert(collection_name=collection_name, data=data_to_insert)
client.flush(collection_name=collection_name)

# Create an index on the vector field
index_params = client.prepare_index_params()
index_params.add_index(field_name="descriptions_embeddings", index_type="IVF_FLAT", metric_type="L2", params={})
client.create_index(collection_name=collection_name, index_params=index_params)
```
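One detail omitted above: with the MilvusClient API the collection needs to be loaded before search() will serve queries. If search complains that the collection is not loaded, something like the following (run after the index is created) should fix it:

```python
# Load the indexed collection into memory so vector search can be served
client.load_collection(collection_name=collection_name)
```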
3️⃣ Search & Reranking Pipeline
1️⃣ Query embedding: Using ColBERT, a single query produces 32 token embeddings, i.e., an array of shape (32, 128).
2️⃣ Search: Each of the 32 query embeddings retrieves its 50 closest matches, so at most 32 × 50 = 1600 candidate vectors are returned.
3️⃣ Filter unique doc_id: I deduplicate the candidate doc_ids and then fetch all 48 stored embeddings for each candidate document.
4️⃣ Reranking: I apply MaxSim (ColBERT-style scoring) to compute the final relevance scores (a small worked example follows this list).
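To make step 4️⃣ concrete, this is the MaxSim computation on toy data (random values, same shapes as above, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
query_vecs = rng.random((32, 128))  # 32 query token embeddings
doc_vecs = rng.random((48, 128))    # 48 stored token embeddings for one document

# MaxSim: for each query token take its best-matching document token,
# then sum those maxima over all query tokens.
similarity = query_vecs @ doc_vecs.T         # (32, 48) token-to-token dot products
maxsim_score = similarity.max(axis=1).sum()  # single relevance score for this document
print(maxsim_score)
```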
```python
# Step 1: Perform the vector search (one result list per query token embedding)
results = client.search(
    collection_name,
    query_vector,                 # the 32 query embeddings
    limit=50,
    output_fields=["doc_id"],     # retrieve doc_id only
    search_params={"metric_type": "L2", "params": {}},
)
```
```python
# Step 2: Extract the unique document IDs from all hits
doc_ids = set()
for res in results:
    for match in res:
        doc_ids.add(match["entity"]["doc_id"])
```
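For reference, my understanding of the result layout returned by MilvusClient.search (worth double-checking against your pymilvus version) is one hit list per query embedding, where each hit exposes the distance and the requested entity fields:

```python
# results[i]    -> hits for the i-th query embedding
# results[i][j] -> {"id": ..., "distance": ..., "entity": {"doc_id": ...}}
first_hit = results[0][0]
print(first_hit["distance"], first_hit["entity"]["doc_id"])
```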
```python
import numpy as np

# Step 3: Fetch all stored embeddings for one document and compute its MaxSim score
def rerank_single_doc(doc_id, query_vector, client, collection_name):
    doc_colbert_vecs = client.query(
        collection_name=collection_name,
        filter=f"doc_id == {doc_id}",
        output_fields=["descriptions_embeddings"],
        limit=5000,
    )
    # Stack the 48 stored vectors back into a (48, 128) matrix
    doc_vecs = np.vstack([doc_colbert_vecs[i]["descriptions_embeddings"] for i in range(len(doc_colbert_vecs))])
    # MaxSim: best-matching document token per query token, summed over all query tokens
    score = np.dot(query_vector, doc_vecs.T).max(1).sum()
    return (score, doc_id)
```
```python
import concurrent.futures

# Rerank all candidate documents in parallel using a thread pool
with concurrent.futures.ThreadPoolExecutor(max_workers=100) as executor:
    futures = {
        executor.submit(rerank_single_doc, doc_id, query_vector, client, collection_name): doc_id
        for doc_id in doc_ids
    }
    scores = [future.result() for future in concurrent.futures.as_completed(futures)]
```
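A small, untested tweak I am considering: since there are only a handful of candidate documents, the pool size can be bounded by the number of documents instead of a fixed 100:

```python
# Cap the worker count at the number of candidate documents (hypothetical tweak)
max_workers = max(1, min(32, len(doc_ids)))
with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
    ...
```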
```python
# Sort by score and keep the top documents
scores.sort(key=lambda x: x[0], reverse=True)
topk_results = scores[:5]

# Output
print("✅ Top-K Retrieved Documents:")
for rank, (score, doc_id) in enumerate(topk_results, start=1):
    print(f"🏆 Rank {rank}: Document ID {doc_id} with score {score}")
```
Output results:
✅ Query embedding shape: (32, 128)
✅ Unique document IDs retrieved: {0, 1, 2, 3, 4, 5, 6, 18, 19}
📌 Document 0 retrieved 48 vectors.
📌 Document 18 retrieved 48 vectors.
📌 Document 3 retrieved 48 vectors.
📌 Document 19 retrieved 48 vectors.
📌 Document 6 retrieved 48 vectors.
✅ Top-K retrieved documents:
🏆 Rank 1: Document ID 4 with score 12.06
🏆 Rank 2: Document ID 19 with score 10.75
🏆 Rank 3: Document ID 6 with score 10.04
🏆 Rank 4: Document ID 1 with score 9.64
🏆 Rank 5: Document ID 18 with score 7.40
🎉 Search & Reranking Completed!
🔹 Questions for the Milvus Team
1️⃣ Is my approach of inserting embeddings correct? Since Milvus does not allow (48,128) insertion directly, I flattened the embeddings by inserting 48 separate vectors per document. Is this the recommended way?
2️⃣ Are there any optimizations to reduce storage while keeping multi-vector retrieval performance efficient? Right now, I am storing 48× more entities than the original document count.
3️⃣ Is there a more efficient way to retrieve all embeddings belonging to a document (doc_id)? Currently, I use filter=f"doc_id == {doc_id}" during reranking, which retrieves all 48 vectors per document.
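Related to question 3️⃣, one batched alternative I have been sketching but have not benchmarked yet: fetch the vectors for all candidate documents with a single in filter and group them client-side, instead of one query() call per doc_id:

```python
from collections import defaultdict
import numpy as np

# Hypothetical batched fetch: one query for all candidate doc_ids instead of one per document
candidate_ids = list(doc_ids)
rows = client.query(
    collection_name=collection_name,
    filter=f"doc_id in {candidate_ids}",
    output_fields=["doc_id", "descriptions_embeddings"],
    limit=len(candidate_ids) * 48,  # 48 stored vectors per document
)

# Group the flat rows back into one (48, 128) matrix per document, then apply MaxSim
vecs_by_doc = defaultdict(list)
for row in rows:
    vecs_by_doc[row["doc_id"]].append(row["descriptions_embeddings"])

scores = [
    (np.dot(query_vector, np.vstack(vec_list).T).max(1).sum(), doc_id)
    for doc_id, vec_list in vecs_by_doc.items()
]
scores.sort(key=lambda x: x[0], reverse=True)
```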