Hybrid search fails with "string field contains invalid UTF-8" when the filter expression string contains an NBSP character #40996
Unanswered
IeohMingChan
asked this question in
Q&A and General discussion
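The error "string field contains invalid UTF-8" looks like it is raised by gRPC, so the expr itself is probably the problem. This hybrid_search request was likely rejected at the gRPC layer before the server ever received it. On the line just before the call to async_hybrid_search_with_score(), try calling encode() on expr and printing the result to see what it contains.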
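A minimal sketch of that check, assuming the filter compares the `source` field against a file name. The helper name `describe_expr` and the sample file name are hypothetical, not taken from the original code. A strict UTF-8 encode raises `UnicodeEncodeError` if the string holds code points that UTF-8 cannot represent (for example lone surrogates), so the call doubles as a quick validity test on the client side.

```python
def describe_expr(expr: str) -> bytes:
    """Encode a filter expression as strict UTF-8 and print both forms."""
    # str.encode raises UnicodeEncodeError when the string contains code
    # points that UTF-8 cannot represent, such as lone surrogates.
    encoded = expr.encode("utf-8")
    print(repr(expr))   # escapes non-printable characters such as NBSP
    print(encoded)      # the raw bytes of the expression in UTF-8
    return encoded


# Hypothetical filter on the `source` field; \u00a0 is the NBSP character.
expr = 'source == "annual\u00a0report.pdf"'
encoded = describe_expr(expr)
```

If the NBSP is intact, the printed bytes contain the two-byte sequence `\xc2\xa0` where the space appears. Any other byte pattern at that position would suggest the string was damaged before it reached the search call.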
Milvus version: v2.5.6 (cluster deployment)

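Problem description: the data has a scalar field, source, that holds a file name, and one of the file names contains an NBSP (non-breaking space). It looks like an ordinary space when displayed normally, but PyCharm renders it with an NBSP marker.
When I run a hybrid search with this file name in the expr filter expression, the search fails with the error: string field contains invalid UTF-8. However, the file name was explicitly converted to UTF-8 in code when it was saved, and NBSP is itself a character that UTF-8 can represent.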
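To narrow down where the file name might go wrong, the sketch below lists every non-ASCII space, control, format, or surrogate character in a string, and checks whether a byte sequence decodes as UTF-8. The helper names and the sample file name are hypothetical. It illustrates only the general encoding facts: NBSP (U+00A0) is the two bytes `C2 A0` in UTF-8, while the single byte `A0` (its Latin-1 form) is not valid UTF-8 on its own. Whether that is what happens in this Milvus request is not established by the post.

```python
import unicodedata
from typing import List, Tuple


def find_unusual_chars(text: str) -> List[Tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each non-ASCII character
    that is a space separator, control, format character, or surrogate."""
    hits = []
    for index, char in enumerate(text):
        category = unicodedata.category(char)
        if ord(char) > 127 and category in {"Zs", "Cc", "Cf", "Cs"}:
            name = unicodedata.name(char, "<unnamed>")
            hits.append((index, f"U+{ord(char):04X}", name))
    return hits


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical file name containing an NBSP between the two words.
file_name = "annual\u00a0report.pdf"
print(find_unusual_chars(file_name))
print(is_valid_utf8(file_name.encode("utf-8")))    # NBSP encoded as C2 A0
print(is_valid_utf8(file_name.encode("latin-1")))  # NBSP encoded as a lone A0
```

Running `find_unusual_chars` on the value actually stored in `source` and on the value used in `expr` shows whether both sides carry the same character.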
The code is as follows:
def async_hybrid_search_with_score(self,
                                   embedding: List[float],
                                   embedding_bm25: List[float],
                                   k: int = 4,
                                   weight_reranker: WeightedRanker = RRFRanker(k=60),
                                   dense_search_params: Optional[dict] = None,
                                   bm25_search_params: Optional[dict] = None,
                                   expr: Optional[str] = None,
                                   timeout: Optional[int] = None,
                                   score_threshold: Optional[float] = None,
                                   **kwargs: Any,
                                   ) -> List[Tuple[Document, float]]:
    async_param = {"_async": True}
    merged_params = async_param.copy()  # work on a copy of async_param
    merged_params.update(kwargs)
    if bm25_search_params is None:
        bm25_search_params = {
            "metric_type": "IP",
            "params": {"nprobe": 10}
        }
    request_bm25 = AnnSearchRequest(
        data=embedding_bm25,
        anns_field=self._vector_field_bm25,
        param=bm25_search_params,
        limit=k,
        expr=expr,
    )
    logger.info(f"BM25 search request created: {request_bm25}")
    if dense_search_params is None:
        dense_search_params = self.default_search_params["HNSW"]
    request_dense = AnnSearchRequest(
        data=[embedding],
        anns_field=self._vector_field,
        param=dense_search_params,
        limit=k,
        expr=expr,
    )
    logger.info(f"dense search request created: {request_dense}")
    output_fields = self._get_output_fields()
    # Run the hybrid search
    future = self.col.hybrid_search([request_bm25, request_dense], weight_reranker,
                                    output_fields=output_fields, limit=k, timeout=timeout, **merged_params)
    result = future.result()
    logger.info(f"result is {result}")
The following error occurs:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 412, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/srdkb-backend/knowledge_base/kb_api.py", line 100, in search_docs
    retrieved_docs = await kb_service.async_do_search(query, user_id=user_id, req_id=req_id,
  File "/srdkb-backend/knowledge_base/milvus_kb_service.py", line 194, in async_do_search
    slice_combined_docs.extend(self.query_subkb(embedding_data, 1, top_k, score_threshold, included_files,
  File "/srdkb-backend/knowledge_base/milvus_kb_service.py", line 301, in query_subkb
    docs_and_distances = search_fun(
  File "/srdkb-backend/knowledge_base/milvus.py", line 465, in async_hybrid_search_with_score
    result = future.result()
  File "/usr/local/lib/python3.10/site-packages/pymilvus/orm/future.py", line 32, in result
    return self.on_response(self._f.result())
  File "/usr/local/lib/python3.10/site-packages/pymilvus/client/asynch.py", line 117, in result
    self._results = self.on_response(self._response)
  File "/usr/local/lib/python3.10/site-packages/pymilvus/client/asynch.py", line 166, in on_response
    check_status(response.status)
  File "/usr/local/lib/python3.10/site-packages/pymilvus/client/utils.py", line 60, in check_status
    raise MilvusException(status.code, status.reason, status.error_code)
pymilvus.exceptions.MilvusException: <MilvusException: (code=65535, message=string field contains invalid UTF-8)>
How can I resolve this problem? Thanks!