Clarification on Multi-Vector Embedding Implementation in Milvus #40424
ranjith502 asked this question in Q&A and General discussion
Reply (from the discussion thread): embedding list support is on our roadmap and hopefully it can be released in the next 2 months.
Hi Team,
I've implemented multi-vector embedding in Milvus using concepts from the following resources:
🔹 ColPali with Milvus
🔹 FastEmbed ColBERT
The embeddings I’m using are generated using the ColBERT model, which produces a multi-vector representation (instead of a single vector per document). I wanted to verify whether my approach is correct and whether there are any optimizations or best practices I should follow.
1️⃣ Dataset & Embeddings:
I have 20 descriptions, and each description is processed with ColBERT, which produces 48 token-level embeddings per description, i.e., an array of shape (48, 128). (A sketch of how such embeddings can be generated with FastEmbed follows below.)
Since Milvus does not allow inserting a (48, 128) array into a single FLOAT_VECTOR field, I insert 48 separate 128-dimensional vectors per description.
Final storage: instead of 20 entities, my collection ends up with 960 (48 × 20) vectors.
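For completeness, here is a minimal sketch of how such per-description multi-vector embeddings can be produced with FastEmbed's ColBERT support; the model name and the exact token count are illustrative assumptions, not a fixed part of my setup:

```python
import numpy as np
from fastembed import LateInteractionTextEmbedding  # ColBERT-style late-interaction embeddings

# Illustrative model choice; any FastEmbed late-interaction model behaves the same way
embedding_model = LateInteractionTextEmbedding("colbert-ir/colbertv2.0")

descriptions = ["first product description ...", "second product description ..."]

# embed() yields one (num_tokens, 128) array per description
descriptions_embeddings = [np.array(emb) for emb in embedding_model.embed(descriptions)]
print(descriptions_embeddings[0].shape)  # e.g. (48, 128), depending on tokenization
```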
2️⃣ Schema Definition & Indexing
```python
from pymilvus import MilvusClient, DataType

# `host` and `collection_name` are defined earlier in my script
client = MilvusClient(uri=f"http://{host}:19530")

# Define the schema
schema1 = MilvusClient.create_schema()
schema1.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema1.add_field(field_name="doc_id", datatype=DataType.INT64)
schema1.add_field(field_name="descriptions_embeddings", datatype=DataType.FLOAT_VECTOR, dim=128)
schema1.add_field(field_name="descriptions", datatype=DataType.VARCHAR, max_length=10000, nullable=True)

# Create the collection
client.create_collection(collection_name=collection_name, schema=schema1)

# Insert embeddings (flattening the multi-vector representation)
data_to_insert = [
    {
        "id": i * 1000 + j,                 # Unique ID per vector
        "doc_id": i,                        # Keeps track of the document ID
        "descriptions_embeddings": vector,  # Single 128-dimensional vector
        "descriptions": description,        # Corresponding description text
    }
    for i, (embedding_list, description) in enumerate(zip(descriptions_embeddings, descriptions))
    for j, vector in enumerate(embedding_list)
]
client.insert(collection_name=collection_name, data=data_to_insert)
client.flush(collection_name=collection_name)

# Create an index on the vector field
index_params = client.prepare_index_params()
index_params.add_index(field_name="descriptions_embeddings", index_type="IVF_FLAT", metric_type="L2", params={})
client.create_index(collection_name=collection_name, index_params=index_params)
```
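One detail omitted above: with the MilvusClient API the collection needs to be loaded before search() will serve queries. If search complains that the collection is not loaded, something like the following (run after the index is created) should fix it:

```python
# Load the indexed collection into memory so vector search can be served
client.load_collection(collection_name=collection_name)
```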
3️⃣ Search & Reranking Pipeline
1️⃣ Query embedding: Using ColBERT, a single query produces 32 token embeddings, i.e., an array of shape (32, 128).
2️⃣ Search: Each of the 32 query embeddings retrieves its 50 closest matches, so at most 32 × 50 = 1600 candidate vectors are returned.
3️⃣ Filter unique doc_id: I deduplicate the candidate doc_ids and then fetch all 48 stored embeddings for each candidate document.
4️⃣ Reranking: I apply MaxSim (ColBERT-style scoring) to compute the final relevance scores (a small worked example follows this list).
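To make step 4️⃣ concrete, this is the MaxSim computation on toy data (random values, same shapes as above, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
query_vecs = rng.random((32, 128))  # 32 query token embeddings
doc_vecs = rng.random((48, 128))    # 48 stored token embeddings for one document

# MaxSim: for each query token take its best-matching document token,
# then sum those maxima over all query tokens.
similarity = query_vecs @ doc_vecs.T         # (32, 48) token-to-token dot products
maxsim_score = similarity.max(axis=1).sum()  # single relevance score for this document
print(maxsim_score)
```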
```python
# Step 1: Perform the vector search (one result list per query token embedding)
results = client.search(
    collection_name,
    query_vector,                 # the 32 query embeddings
    limit=50,
    output_fields=["doc_id"],     # retrieve doc_id only
    search_params={"metric_type": "L2", "params": {}},
)
```
```python
# Step 2: Extract the unique document IDs from all hits
doc_ids = set()
for res in results:
    for match in res:
        doc_ids.add(match["entity"]["doc_id"])
```
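For reference, my understanding of the result layout returned by MilvusClient.search (worth double-checking against your pymilvus version) is one hit list per query embedding, where each hit exposes the distance and the requested entity fields:

```python
# results[i]    -> hits for the i-th query embedding
# results[i][j] -> {"id": ..., "distance": ..., "entity": {"doc_id": ...}}
first_hit = results[0][0]
print(first_hit["distance"], first_hit["entity"]["doc_id"])
```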
```python
import numpy as np

# Step 3: Fetch all stored embeddings for one document and compute its MaxSim score
def rerank_single_doc(doc_id, query_vector, client, collection_name):
    doc_colbert_vecs = client.query(
        collection_name=collection_name,
        filter=f"doc_id == {doc_id}",
        output_fields=["descriptions_embeddings"],
        limit=5000,
    )
    # Stack the 48 stored vectors back into a (48, 128) matrix
    doc_vecs = np.vstack([doc_colbert_vecs[i]["descriptions_embeddings"] for i in range(len(doc_colbert_vecs))])
    # MaxSim: best-matching document token per query token, summed over all query tokens
    score = np.dot(query_vector, doc_vecs.T).max(1).sum()
    return (score, doc_id)
```
```python
import concurrent.futures

# Rerank all candidate documents in parallel using a thread pool
with concurrent.futures.ThreadPoolExecutor(max_workers=100) as executor:
    futures = {
        executor.submit(rerank_single_doc, doc_id, query_vector, client, collection_name): doc_id
        for doc_id in doc_ids
    }
    scores = [future.result() for future in concurrent.futures.as_completed(futures)]
```
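A small, untested tweak I am considering: since there are only a handful of candidate documents, the pool size can be bounded by the number of documents instead of a fixed 100:

```python
# Cap the worker count at the number of candidate documents (hypothetical tweak)
max_workers = max(1, min(32, len(doc_ids)))
with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
    ...
```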
```python
# Sort by score and keep the top documents
scores.sort(key=lambda x: x[0], reverse=True)
topk_results = scores[:5]

# Output
print("✅ Top-K Retrieved Documents:")
for rank, (score, doc_id) in enumerate(topk_results, start=1):
    print(f"🏆 Rank {rank}: Document ID {doc_id} with score {score}")
```
Output results:
✅ Query embedding shape: (32, 128)
✅ Unique document IDs retrieved: {0, 1, 2, 3, 4, 5, 6, 18, 19}
📌 Document 0 retrieved 48 vectors.
📌 Document 18 retrieved 48 vectors.
📌 Document 3 retrieved 48 vectors.
📌 Document 19 retrieved 48 vectors.
📌 Document 6 retrieved 48 vectors.
✅ Top-K retrieved documents:
🏆 Rank 1: Document ID 4 with score 12.06
🏆 Rank 2: Document ID 19 with score 10.75
🏆 Rank 3: Document ID 6 with score 10.04
🏆 Rank 4: Document ID 1 with score 9.64
🏆 Rank 5: Document ID 18 with score 7.40
🎉 Search & Reranking Completed!
🔹 Questions for the Milvus Team
1️⃣ Is my approach of inserting embeddings correct? Since Milvus does not allow (48,128) insertion directly, I flattened the embeddings by inserting 48 separate vectors per document. Is this the recommended way?
2️⃣ Are there any optimizations to reduce storage while keeping multi-vector retrieval performance efficient? Right now, I am storing 48× more entities than the original document count.
3️⃣ Is there a more efficient way to retrieve all embeddings belonging to a document (doc_id)? Currently, I use filter=f"doc_id == {doc_id}" during reranking, which retrieves all 48 vectors per document.
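Related to question 3️⃣, one batched alternative I have been sketching but have not benchmarked yet: fetch the vectors for all candidate documents with a single in filter and group them client-side, instead of one query() call per doc_id:

```python
from collections import defaultdict
import numpy as np

# Hypothetical batched fetch: one query for all candidate doc_ids instead of one per document
candidate_ids = list(doc_ids)
rows = client.query(
    collection_name=collection_name,
    filter=f"doc_id in {candidate_ids}",
    output_fields=["doc_id", "descriptions_embeddings"],
    limit=len(candidate_ids) * 48,  # 48 stored vectors per document
)

# Group the flat rows back into one (48, 128) matrix per document, then apply MaxSim
vecs_by_doc = defaultdict(list)
for row in rows:
    vecs_by_doc[row["doc_id"]].append(row["descriptions_embeddings"])

scores = [
    (np.dot(query_vector, np.vstack(vec_list).T).max(1).sum(), doc_id)
    for doc_id, vec_list in vecs_by_doc.items()
]
scores.sort(key=lambda x: x[0], reverse=True)
```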