Getting Error while performing Hybrid search using Milvus #35463
Replies: 10 comments 7 replies
-
I simplified the script and tested it with Milvus v2.4.6; I didn't encounter the error.
-
What do your dense embeddings and your sparse embeddings look like? Check the format of sparse_vector before the `entity = {` line, for example with `print(f"Dense Vector Type: {type(dense_vector)}")` and `print(f"Sparse Vector: {sparse_vector}")`.
From the error log, Milvus believes you are passing a float vector to a sparse vector field.
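The check suggested above can be sketched as a small helper. This is only an illustration: `describe_vector` is a hypothetical name, and it assumes dense embeddings arrive as a list of floats and sparse embeddings as a dict mapping index to weight, which is how the embedding classes in the question appear to return them.

```python
from typing import Any


def describe_vector(vec: Any) -> str:
    """Return a short label describing what an embedding output looks like."""
    if isinstance(vec, dict):
        # Assumed sparse format: {dimension_index: weight}
        if not vec:
            return "sparse (empty)"
        return f"sparse ({len(vec)} entries)"
    if isinstance(vec, (list, tuple)):
        # Assumed dense format: a flat sequence of floats
        return f"dense (dim={len(vec)})"
    return f"unknown ({type(vec).__name__})"


print(describe_vector([0.1, 0.2, 0.3]))
print(describe_vector({5: 0.7, 42: 1.3}))
print(describe_vector({}))
```

Running this on both the document embeddings and the query embedding should show whether an empty or wrongly typed vector ends up paired with the sparse field.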
-
Although you checked the entities and only inserted valid_entities, the query "What does SDA mean?" is converted to an empty sparse embedding by sparse_embedding_func, and that is why Milvus returns this error. A workaround is to extend BM25SparseEmbedding and check its output for an empty result.
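The snippet that accompanied this reply is not preserved here, so what follows is only a sketch of the idea, not the original code. It assumes that replacing an empty query embedding with a single zero-weight entry (`{0: 0.0}`) is acceptable to Milvus, which should be verified against your version. `NonEmptySparseMixin`, `StubSparseEmbedding`, and `SafeSparseEmbedding` are hypothetical names; the stub stands in for `BM25SparseEmbedding` so the example runs without Milvus installed.

```python
from typing import Dict, List

# Assumption: a single zero-weight entry is an acceptable stand-in for an
# empty sparse vector. Verify this against your Milvus version.
PLACEHOLDER: Dict[int, float] = {0: 0.0}


class NonEmptySparseMixin:
    """Hypothetical mixin: never return an empty sparse query embedding."""

    def embed_query(self, text: str) -> Dict[int, float]:
        vec = super().embed_query(text)
        return vec if vec else dict(PLACEHOLDER)


class StubSparseEmbedding:
    """Stand-in for BM25SparseEmbedding (term counts over a tiny vocabulary)."""

    def __init__(self, vocab: List[str]):
        self.vocab = {word: idx for idx, word in enumerate(vocab)}

    def embed_query(self, text: str) -> Dict[int, float]:
        vec: Dict[int, float] = {}
        for word in text.lower().split():
            if word in self.vocab:
                idx = self.vocab[word]
                vec[idx] = vec.get(idx, 0.0) + 1.0
        return vec


class SafeSparseEmbedding(NonEmptySparseMixin, StubSparseEmbedding):
    pass


emb = SafeSparseEmbedding(["milvus", "hybrid", "search"])
print(emb.embed_query("hybrid search"))        # terms found in the vocabulary
print(emb.embed_query("What does SDA mean?"))  # no known terms: placeholder
```

With the real library, the same mixin would presumably be placed in front of `BM25SparseEmbedding` (for example `class SafeBM25(NonEmptySparseMixin, BM25SparseEmbedding): pass`) and passed to the retriever in place of the original `sparse_embedding_func`; that combination has not been tested here.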
-
There may be a problem when the query output of BM25SparseEmbedding is an empty dict {}. I will try to fix it as a workaround for this problem. You can also subclass BM25SparseEmbedding, as yhmo did. Milvus will also fix this issue in the near future.
-
Now I am able to do hybrid search.
-
Hi all, thanks for the clear explanations! I ran into a similar issue and the workaround didn't work:
RPC error: [hybrid_search], <MilvusException: (code=65535, message=fail to search on QueryNode 2: worker(2) query failed: Assert "static_cast(field_meta.get_data_type()) == static_cast(info.type())" => vector type must be the same, field sparse_vector - type VECTOR_SPARSE_FLOAT, search info type VECTOR_FLOAT at /workspace/source/internal/core/src/query/Plan.cpp:48
A possible cause is that I used the BGE M3 model to embed the text as dense and sparse vectors (ref: https://github.com/milvus-io/pymilvus/blob/master/examples/hello_hybrid_sparse_dense.py).
-
Hi all, we observed the same issue with hybrid search. I am using Milvus 2.5.0 with the built-in BM25 function available from pymilvus.
-
Hi @xiaofan-luan, please find attached the search request and logs:
-
I double-checked the client-side code, and it all seems to be good.
-
@xiaofan-luan, I am using Milvus 2.5.0. Hybrid search with full-text search (built-in BM25 sparse embedding generation) gives a ' QueryNode failed to search' error for only one specific collection, while it works for other collections with the same schema. The collection with the issue has semi-structured data (tabular data, special characters like $, plus plain text) being vectorized, while the collections with no issue have unstructured data (pure text). Is the built-in BM25 function tested for semi-structured input?
-
I am trying to perform hybrid search in Milvus using the code below:
```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_milvus.retrievers import MilvusCollectionHybridSearchRetriever
from langchain_milvus.utils.sparse import BM25SparseEmbedding
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
import torch
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from database_operation import initialise_milvus, data_ingest_milvus_db, data_load_from_milvus
import time
from groq import Groq
import os, logging
from langchain_community.document_loaders import UnstructuredPDFLoader
from bs4 import BeautifulSoup
from pathlib import Path
from pymilvus import (
    Collection,
    CollectionSchema,
    DataType,
    FieldSchema,
    WeightedRanker,
    connections,
)

CONNECTION_URI = "http://localhost:19530"
# export OPENAI_API_KEY=<your_api_key>
folder_path = "/home/rndadmin/dev/test/"


def preprocess_document(docs):
    table_fragments = {}
    data_string = ''
    # (rest of the function is not shown in the post)


def get_documents(folder_path):
    try:
        documents = []
        # (rest of the function, including the except clause, is not shown in the post)


def load_and_process_documents():
    documents = get_documents(folder_path)
    print('Files Loaded')
    # (rest of the function is not shown in the post)


texts = load_and_process_documents()
print(type(texts))

dense_embedding_func = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
text_contents = [doc.page_content for doc in texts]  # Extract text content from each Document
dense_dim = len(dense_embedding_func.embed_query(text_contents[1]))

# Note: the corpus is passed as one string (str(texts)), not as a list of texts
corpus = str(texts)
print(type(corpus))
sparse_embedding_func = BM25SparseEmbedding(corpus=corpus)

connections.connect(uri=CONNECTION_URI)

pk_field = "doc_id"
dense_field = "dense_vector"
sparse_field = "sparse_vector"
text_field = "text"
fields = [
    FieldSchema(
        name=pk_field,
        dtype=DataType.VARCHAR,
        is_primary=True,
        auto_id=True,
        max_length=100,
    ),
    FieldSchema(name=dense_field, dtype=DataType.FLOAT_VECTOR, dim=dense_dim),
    FieldSchema(name=sparse_field, dtype=DataType.SPARSE_FLOAT_VECTOR),
    FieldSchema(name=text_field, dtype=DataType.VARCHAR, max_length=65_535),
]
schema = CollectionSchema(fields=fields, enable_dynamic_field=False)
collection = Collection(
    name="Galitt_test_hybrid", schema=schema, consistency_level="Strong"
)

dense_index = {"index_type": "FLAT", "metric_type": "IP"}
collection.create_index("dense_vector", dense_index)
sparse_index = {"index_type": "SPARSE_INVERTED_INDEX", "metric_type": "IP"}
collection.create_index("sparse_vector", sparse_index)
collection.flush()

entities = []
for text in text_contents:
    dense_vector = dense_embedding_func.embed_documents([text])[0]
    sparse_vector = sparse_embedding_func.embed_documents([text])[0]
    # (construction of the entity and the append to entities are not shown in the post)
print(entities)

# Filter out entities with empty sparse vectors
valid_entities = [entity for entity in entities if entity[sparse_field]]
print("Valid Entities after excluding empty sparse vectors")
print(valid_entities)
if valid_entities:
    collection.insert(valid_entities)
    print("Entities inserted in vector DB")
else:
    print("No valid entities to insert.")
collection.load()

sparse_search_params = {"metric_type": "IP"}
dense_search_params = {"metric_type": "IP", "params": {}}
retriever = MilvusCollectionHybridSearchRetriever(
    collection=collection,
    rerank=WeightedRanker(0.5, 0.5),
    anns_fields=[dense_field, sparse_field],
    field_embeddings=[dense_embedding_func, sparse_embedding_func],
    field_search_params=[dense_search_params, sparse_search_params],
    top_k=3,
    text_field=text_field,
)
result = retriever.invoke("What does SDA mean?")
print(result)
```
My retriever is created successfully, but when I send any query to the retriever, I get the error below:
RPC error: [hybrid_search], <MilvusException: (code=65535, message=fail to search on QueryNode 3: worker(3) query failed: Assert "static_cast(field_meta.get_data_type()) == static_cast(info.type())" at /go/src/github.com/milvus-io/milvus/internal/core/src/query/Plan.cpp:48
=> vector type must be the same, field sparse_vector - type VECTOR_SPARSE_FLOAT, search info type VECTOR_FLOAT)>, <Time:{'RPC start': '2024-08-14 04:47:17.316357', 'RPC error': '2024-08-14 04:47:18.708334'}>
Traceback (most recent call last):
File "/home/rndadmin/galitt_dev/./test.py", line 194, in <module>
result=retriever.invoke("What does SDA mean?")
File "/home/rndadmin/galitt_dev/galitt_env/lib/python3.10/site-packages/langchain_core/retrievers.py", line 221, in invoke
raise e
File "/home/rndadmin/galitt_dev/galitt_env/lib/python3.10/site-packages/langchain_core/retrievers.py", line 214, in invoke
result = self._get_relevant_documents(
File "/home/rndadmin/galitt_dev/galitt_env/lib/python3.10/site-packages/langchain_milvus/retrievers/milvus_hybrid_search.py", line 157, in _get_relevant_documents
search_result = self.collection.hybrid_search(
File "/home/rndadmin/galitt_dev/galitt_env/lib/python3.10/site-packages/pymilvus/orm/collection.py", line 943, in hybrid_search
resp = conn.hybrid_search(
File "/home/rndadmin/galitt_dev/galitt_env/lib/python3.10/site-packages/pymilvus/decorators.py", line 147, in handler
raise e from e
File "/home/rndadmin/galitt_dev/galitt_env/lib/python3.10/site-packages/pymilvus/decorators.py", line 143, in handler
return func(*args, **kwargs)
File "/home/rndadmin/galitt_dev/galitt_env/lib/python3.10/site-packages/pymilvus/decorators.py", line 182, in handler
return func(self, *args, **kwargs)
File "/home/rndadmin/galitt_dev/galitt_env/lib/python3.10/site-packages/pymilvus/decorators.py", line 122, in handler
raise e from e
File "/home/rndadmin/galitt_dev/galitt_env/lib/python3.10/site-packages/pymilvus/decorators.py", line 87, in handler
return func(*args, **kwargs)
File "/home/rndadmin/galitt_dev/galitt_env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 850, in hybrid_search
return self._execute_hybrid_search(
File "/home/rndadmin/galitt_dev/galitt_env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 761, in _execute_hybrid_search
raise e from e
File "/home/rndadmin/galitt_dev/galitt_env/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 754, in _execute_hybrid_search
check_status(response.status)
File "/home/rndadmin/galitt_dev/galitt_env/lib/python3.10/site-packages/pymilvus/client/utils.py", line 63, in check_status
raise MilvusException(status.code, status.reason, status.error_code)
pymilvus.exceptions.MilvusException: <MilvusException: (code=65535, message=fail to search on QueryNode 3: worker(3) query failed: Assert "static_cast(field_meta.get_data_type()) == static_cast(info.type())" at /go/src/github.com/milvus-io/milvus/internal/core/src/query/Plan.cpp:48
=> vector type must be the same, field sparse_vector - type VECTOR_SPARSE_FLOAT, search info type VECTOR_FLOAT)>
Can you tell me a workaround for this?