
Format errors are always reported when adding filter conditions to Tencent Cloud VectorDB calls #876

@orcharddd2024

Description


In `.venv/Lib/site-packages/langchain_community/vectorstores/tencentvectordb.py` there is a method called `similarity_search_by_vector`. When it is called externally, it reports an error no matter what format of filter is passed in.

```python
def search(self, query: str, vectors: List[List[float]], limit: int = 5, filters: Optional[Dict] = None):
    """
    Search for similar vectors in LangChain.
    """
    # For each vector, perform a similarity search
    if filters:
        results = self.client.similarity_search_by_vector(embedding=vectors, k=limit, filter=filters)
    else:
        results = self.client.similarity_search_by_vector(embedding=vectors, k=limit)

    final_results = self._parse_output(results)
    return final_results
```
The source code in the open-source framework mem0 is shown above, but calling it raises the following error:

[Screenshot of the error]

I then changed the mem0 source code to the following:

```python
def search(self, query: str, vectors: List[List[float]], limit: int = 5, filters: Optional[Dict] = None):
    """
    Search for similar vectors in LangChain / TencentVectorDB.
    Compatible with TencentVectorDB filter grammar.
    """
    filter_expr = None
    if filters:
        if isinstance(filters, dict):
            # Convert to a format the LangChain/TencentVectorDB DSL can parse
            filter_parts = []
            for k, v in filters.items():
                if v is None:
                    continue
                # Detect the value type automatically and quote strings
                if isinstance(v, str):
                    v = v.replace('"', '\\"')  # escape double quotes
                    filter_parts.append(f'{k} == "{v}"')
                else:
                    filter_parts.append(f'{k} == {v}')
            filter_expr = " and ".join(filter_parts)

        elif isinstance(filters, str):
            # Fault-tolerant conversion: single "=" to "==", single quotes to double quotes
            filter_expr = filters.replace(" = ", " == ").replace("'", '"')

    # (Optional) debug logging
    # print(f"[VectorSearch] filter_expr={filter_expr}")

    if filter_expr:
        results = self.client.similarity_search_by_vector(
            embedding=vectors, k=limit, filter=filter_expr
        )
    else:
        results = self.client.similarity_search_by_vector(
            embedding=vectors, k=limit
        )

    final_results = self._parse_output(results)
    return final_results
```

With this change, the following errors are reported, and the concatenated filter string is still rejected:

[Screenshots of the errors]
1. The first error:
   In TencentVectorDB's `similarity_search_by_vector` call, the `filter` argument passed in is neither a string nor `None` but a dictionary (or another type), so the Lark parser in the underlying `translate_filter()` function raises `TypeError: text must be str or bytes`.
2. The second error:
   The Lark grammar used internally by TencentVectorDB does not accept traditional SQL-style expressions such as `user_id='zz'`.
   In the source of `langchain_community.vectorstores.tencentvectordb` (open it to see the `translate_filter` definition), the filter syntax expected by the LangChain wrapper for Tencent VectorDB seems to be a JSON-style or Python logical expression rather than SQL.
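A minimal sketch of one possible fix, assuming (not confirmed in this issue) that `translate_filter()` parses LangChain's structured-query grammar, the function-call syntax used by the self-query retriever, such as `eq("user_id", "zz")` or `and(eq("user_id", "zz"), eq("agent_id", "a1"))`, rather than infix `==`/`and` expressions. The helpers `dict_to_structured_query` and `_format_value` are hypothetical; they turn mem0's dict filters into that string form, so `filter` is always a `str` (which also avoids the `TypeError` from the first error). The filtered fields probably also need to be declared as indexed metadata fields on the vector store, since the parser appears to reject attribute names it is not given.

```python
from typing import Any, Dict, Optional


def _format_value(value: Any) -> str:
    """Render one value as a literal for the assumed function-call grammar."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Keep the sketch simple: refuse values that would break the quoted literal.
        if '"' in value:
            raise ValueError(f"double quote not supported in filter value: {value!r}")
        return f'"{value}"'
    raise TypeError(f"unsupported filter value type: {type(value).__name__}")


def dict_to_structured_query(filters: Optional[Dict[str, Any]]) -> Optional[str]:
    """Hypothetical helper: {"user_id": "zz"} -> 'eq("user_id", "zz")'.

    Several keys are combined with and(...); None values are skipped,
    mirroring the attempted fix in this issue.
    """
    if not filters:
        return None
    clauses = [
        f'eq("{key}", {_format_value(value)})'
        for key, value in filters.items()
        if value is not None
    ]
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]
    return "and(" + ", ".join(clauses) + ")"


if __name__ == "__main__":
    print(dict_to_structured_query({"user_id": "zz"}))
    # eq("user_id", "zz")
    print(dict_to_structured_query({"user_id": "zz", "run_id": None, "score": 3}))
    # and(eq("user_id", "zz"), eq("score", 3))
```

In mem0's `search`, the result would then be passed as `filter=...` only when it is not `None`, so the unfiltered branch stays unchanged.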

However, even when I write the expression as `user_id=="zz" and ...`, Lark still rejects it.
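An alternative sketch under a stronger assumption: some `langchain_community` versions of `similarity_search_by_vector` also accept an `expr` keyword that, when set, is handed to Tencent VectorDB as a raw filter expression and skips `translate_filter()` (and therefore Lark). Check the installed `tencentvectordb.py` before relying on this. The sketch also assumes Tencent VectorDB's native filter syntax uses a single `=` with double-quoted strings, e.g. `user_id="zz" and page=21`. The helper `dict_to_tcvdb_expr` is hypothetical.

```python
from typing import Any, Dict, Optional


def dict_to_tcvdb_expr(filters: Optional[Dict[str, Any]]) -> Optional[str]:
    """Hypothetical helper: {"user_id": "zz", "page": 21} -> 'user_id="zz" and page=21'.

    Assumes Tencent VectorDB's native filter syntax: single '=', double-quoted
    strings, clauses joined with 'and'. None values are skipped.
    """
    if not filters:
        return None
    clauses = []
    for key, value in filters.items():
        if value is None:
            continue
        if isinstance(value, str):
            if '"' in value:
                raise ValueError(f"double quote not supported in filter value: {value!r}")
            clauses.append(f'{key}="{value}"')
        elif isinstance(value, int) and not isinstance(value, bool):
            # Integers only; other types are rejected to keep the sketch simple.
            clauses.append(f"{key}={value}")
        else:
            raise TypeError(f"unsupported filter value type for {key!r}: {type(value).__name__}")
    # An empty join ("") falls back to None, meaning "no filter".
    return " and ".join(clauses) or None


# Assumed call shape (verify that your installed version has an `expr` parameter):
# results = self.client.similarity_search_by_vector(
#     embedding=vectors, k=limit, expr=dict_to_tcvdb_expr(filters)
# )

if __name__ == "__main__":
    print(dict_to_tcvdb_expr({"user_id": "zz", "agent_id": None, "page": 21}))
    # user_id="zz" and page=21
```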

So no matter how I try to fix it, it always reports an error. Is there a bug here, or is there something I missed? How should I modify it?

The relevant source code in `.venv/Lib/site-packages/langchain_community/vectorstores/tencentvectordb.py` is as follows:

[Screenshots of the relevant source code]
