[Bug] Suggestion for handling embedding model batch_size exceeding the API limit #669

@zmalqp189

Description

Search before asking

  • I had searched in the issues and found no similar issues.

Operating system information

Linux

What happened

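I am using Alibaba's text-embedding-v3 embedding model, whose API limits the batch_size of a single request to at most 10. As a result, when knowledge-base retrieval returns too many items, the following error is raised: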
2025-08-01 16:15:02 - ERROR - __main__ - Error: Error code: 400 - {'error': {'code': 'InvalidParameter', 'param': None, 'message': '<400> InternalError.Algo.InvalidParameter: Value error, batch size is invalid, it should not be larger than 10.: input.contents', 'type': 'InvalidParameter'}, 'id': '31e5f835-8113-9318-ac76-d049c267', 'request_id': '31e5f5-8113-9318-ac76-d0b3d947'}

How to reproduce

Here is my proposed fix.
Modify line 63 of ./KAG-master/kag/common/text_sim_by_vector.py by adding the following code:

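                if len(need_call_emb_text) > 0:
                    if len(need_call_emb_text) <= 10:
                        # Within the API limit: a single request is enough.
                        emb_res = self.vectorize_model.vectorize(need_call_emb_text)
                    else:
                        # Over the limit: send chunks of at most 10 texts and
                        # concatenate the results in the original order.
                        emb_res = []
                        for i in range((len(need_call_emb_text) + 9) // 10):
                            emb_res_part = self.vectorize_model.vectorize(need_call_emb_text[i*10:(i+1)*10])
                            emb_res += emb_res_part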

This is only my own tentative suggestion.
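The same chunking idea can also be written as a small standalone helper, so the limit is not hard-coded at the call site. This is only a rough sketch, not KAG's actual API: `vectorize_in_batches`, its `batch_size` parameter, and the `fake_vectorize` stand-in are hypothetical names introduced for illustration, and it assumes the real `vectorize` call takes a list of strings and returns a list of embeddings in the same order.

```python
from typing import Callable, List, Sequence


def vectorize_in_batches(
    vectorize: Callable[[List[str]], List[List[float]]],
    texts: Sequence[str],
    batch_size: int = 10,
) -> List[List[float]]:
    """Embed texts in chunks of at most batch_size, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings: List[List[float]] = []
    # Step through the input in strides of batch_size; the last chunk may be shorter.
    for start in range(0, len(texts), batch_size):
        chunk = list(texts[start:start + batch_size])
        embeddings.extend(vectorize(chunk))
    return embeddings


# Hypothetical stand-in for an embedding API that rejects more than 10 inputs per call.
def fake_vectorize(batch: List[str]) -> List[List[float]]:
    if len(batch) > 10:
        raise ValueError("batch size is invalid, it should not be larger than 10")
    # Dummy one-dimensional "embedding": the length of each text.
    return [[float(len(text))] for text in batch]


if __name__ == "__main__":
    texts = [f"doc {i}" for i in range(23)]
    vectors = vectorize_in_batches(fake_vectorize, texts)
    print(len(vectors))
```

In the snippet above, the call site might then become something like `emb_res = vectorize_in_batches(self.vectorize_model.vectorize, need_call_emb_text)`, assuming the return type matches what the surrounding code expects. The limit could also be read from configuration, since other embedding providers may allow different batch sizes.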

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
