bm25 recall case #41014
-
Following the Full Text Search (BM25) guide at https://milvus.io/docs/full-text-search.md#Full-Text-Search-BM25, I created a collection for a list of books. The book names are as below:
I use the standard analyzer to create a sparse_float_vector field for the book name, called "book_name_bm25". The schema is like:
and the search code is:
search_params = {
The result list is empty, even though every book name contains the term "穿越". Is this normal? Is there a way to solve this problem, such as changing the analyzer?
-
Since the text content is Chinese, you need to set the analyzer type to "chinese" instead of using the standard analyzer. See https://milvus.io/docs/analyzer-overview.md
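A minimal sketch of what that change could look like with pymilvus, assuming the schema follows the Full Text Search guide linked in the question. Only "book_name_bm25" comes from this thread; the "id" and "book_name" field names, the function name, and `max_length` are assumptions for illustration, and the original schema was not posted.

```python
# Analyzer settings for the raw text field. "chinese" is the analyzer type
# that the follow-up in this thread reports as working.
CHINESE_ANALYZER = {"type": "chinese"}


def build_schema():
    """Build a schema whose text field is tokenized with the Chinese analyzer.

    Every name except "book_name_bm25" is an assumption, not the poster's schema.
    """
    # Imported lazily so the settings above can be inspected without pymilvus.
    from pymilvus import DataType, Function, FunctionType, MilvusClient

    schema = MilvusClient.create_schema()
    schema.add_field(
        field_name="id", datatype=DataType.INT64, is_primary=True, auto_id=True
    )
    # The analyzer is configured on the VARCHAR field that holds the raw text.
    schema.add_field(
        field_name="book_name",
        datatype=DataType.VARCHAR,
        max_length=512,  # assumed limit
        enable_analyzer=True,
        analyzer_params=CHINESE_ANALYZER,
    )
    # The sparse vector field is filled in by the BM25 function below.
    schema.add_field(
        field_name="book_name_bm25", datatype=DataType.SPARSE_FLOAT_VECTOR
    )
    schema.add_function(
        Function(
            name="book_name_bm25_fn",  # hypothetical function name
            input_field_names=["book_name"],
            output_field_names=["book_name_bm25"],
            function_type=FunctionType.BM25,
        )
    )
    return schema
```

Because the analyzer is part of the field definition, an existing collection built with the standard analyzer would most likely need to be recreated and the data reinserted for the change to take effect.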
-
@yhmo The chinese type works. I then wrote a script to test the BM25 search time cost, with code like:
search_params = {
I find that the first search takes almost 300 ms, but the second search takes less than 3 ms. What causes this difference?
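One way to make such a measurement easier to read is to time each call separately and report the first call apart from the later ones. The sketch below is not the poster's script. `run_search` is a hypothetical zero-argument callable that would wrap the actual search call, and the helper names are made up for illustration.

```python
import time
from typing import Callable, Dict, List, Optional


def time_calls(run_search: Callable[[], object], repeats: int = 5) -> List[float]:
    """Return the wall-clock latency of each call in milliseconds, in call order."""
    latencies_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_search()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms: List[float]) -> Dict[str, Optional[float]]:
    """Split the first call from the average of all later calls."""
    first, rest = latencies_ms[0], latencies_ms[1:]
    later_avg = sum(rest) / len(rest) if rest else None
    return {"first_ms": first, "later_avg_ms": later_avg}
```

For example, `summarize(time_calls(lambda: client.search(...)))` would report the two numbers side by side, where `client.search(...)` stands in for the search call from the question.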
Try this script: