Dynamic field setting in create_collection has no effect when using a custom schema #41316
Unanswered
IeohMingChan asked this question in Q&A and General discussion
Replies: 1 comment, 1 reply
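The 2.4 documentation describes two ways to create a collection: Quick setup and Customized setup.
You used the Customized setup, so the enable_dynamic_field argument of create_collection() does not take effect. I think that when enable_dynamic_field in create_collection() disagrees with enable_dynamic_field in the CollectionSchema, pymilvus should raise an error; otherwise it is indeed easy to be misled. A collection created without enable_dynamic_field cannot be changed afterwards, so the only option is to create a new collection with enable_dynamic_field turned on and import the data from the old collection into it.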
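The "create a new collection and import the old data" step could be sketched roughly as below. This is only an illustration under stated assumptions: `strip_fields` and `copy_rows` are hypothetical helpers, not pymilvus functions, and the default fields to drop (`pk`, `vector_bm25`) are taken from the schema in the question, on the assumption that an auto-generated primary key and a BM25 function output field should not be supplied on insert. The demonstration uses in-memory lists in place of real collections so it runs without a Milvus server.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def strip_fields(rows: Iterable[Row], drop: Iterable[str]) -> List[Row]:
    """Return copies of rows without the fields the new collection fills in itself."""
    drop_set = set(drop)
    return [{k: v for k, v in row.items() if k not in drop_set} for row in rows]


def copy_rows(
    batches: Iterable[List[Row]],
    insert_batch: Callable[[List[Row]], Any],
    drop: Iterable[str] = ("pk", "vector_bm25"),  # assumption: auto_id key and BM25 output
) -> int:
    """Pass every non-empty batch to insert_batch and return the number of rows copied."""
    drop = tuple(drop)
    copied = 0
    for batch in batches:
        if not batch:
            continue  # skip empty batches
        insert_batch(strip_fields(batch, drop))
        copied += len(batch)
    return copied


# In-memory stand-ins for the old collection (batches) and the new one (a list).
old_batches = [
    [{"pk": 1, "text": "a", "vector": [0.1], "vector_bm25": {}}],
    [],
    [{"pk": 2, "text": "b", "vector": [0.2], "vector_bm25": {}}],
]
new_rows: List[Row] = []
copied = copy_rows(old_batches, new_rows.extend)
print(copied, new_rows)
```

Against a real deployment, the batches would presumably come from paging through the old collection (for example with a query iterator) and `insert_batch` would wrap an insert into the new collection whose schema was built with create_schema(enable_dynamic_field=True); both of those wirings are assumptions and should be checked against the pymilvus version in use.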
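Background:
I created a collection from a custom schema and enabled the dynamic field when creating it: client.create_collection(new_collection_name, schema=schema, enable_dynamic_field=True), then wrote some data. Later, inserting rows that contain undefined fields into this collection failed with: Error fetching documents: <DataNotMatchException: (code=1, message=Attempt to insert an unexpected field dynamic_fields_1 to collection without enabling dynamic field)>
Testing showed that with a custom schema the dynamic field has to be enabled in client.create_schema, i.e. client.create_schema(enable_dynamic_field=True), and that the value passed to create_collection is ignored: unless the dynamic field is enabled in create_schema, it stays disabled no matter what value create_collection receives.
Question: I have already created the collection and written a very large amount of data. At the time I followed the official documentation and enabled the dynamic field through create_collection rather than create_schema. I now want to write undefined fields into this collection. Is there a way to enable the dynamic field on the existing collection so that I do not have to rewrite the data, or must I recreate the collection and enable the dynamic field through create_schema?
The code I used is below. I tried every combination of the dynamic field values in create_schema and create_collection to confirm the conclusion above. Thanks.
Related code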
import concurrent.futures
import re
import time
from loguru import logger
from pymilvus import connections, Collection, utility, DataType, MilvusClient, FunctionType, Function
num_entities = []
insert_counts = []
kb_ids = []
host = "xxx"
port = xxx
uri = f"http://{host}:{port}"
old_db = "knowledge"
new_db = "knowledge_base"
new_collection_name = "knowledge_xiaobu_3"
is_create_collection = False
connections.connect(alias='source', host=host, port=port, db_name=old_db, user="root", password="Milvus")
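def generate_binary_string(input_list):
    # Initialize a list of 1024 characters, all set to '0'
    result = ['0'] * 1024
    ...  # rest of the body omitted in the original post


def get_kb_id(source):
    ...  # body omitted in the original post


def fetch_documents_from_collection(collection: Collection, new_collection: str, client: MilvusClient, partition_name: str, doc_type: str):
    try:
        ...  # rest of the body, including the except clause, omitted in the original post

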
def create_schema(client):
    # enable_dynamic_field was not set here when the collection was first created,
    # so later inserts containing extra fields failed.
    schema = client.create_schema(enable_dynamic_field=True)
    tokenizer_params = {
        "tokenizer": "jieba"
    }
    # Field definitions
    schema.add_field("pk", DataType.INT64, is_primary=True, auto_id=True)
    schema.add_field("text", DataType.VARCHAR, max_length=65535, analyzer_params=tokenizer_params,
                     enable_analyzer=True)
    schema.add_field("vector", DataType.FLOAT_VECTOR, dim=1792, mmap_enable=True)
    schema.add_field("vector_bm25", datatype=DataType.SPARSE_FLOAT_VECTOR)
    bm25_function = Function(
        name="text_bm25_emb",  # Function name
        input_field_names=["text"],  # Name of the VARCHAR field containing raw text data
        output_field_names=["vector_bm25"],
        # Name of the SPARSE_FLOAT_VECTOR field reserved to store generated embeddings
        function_type=FunctionType.BM25,
    )
    ...  # rest of the body omitted in the original post


def process_collection(old_collection_name, client):
    # Load the existing collection
    old_collection = Collection(name=old_collection_name, using="source")
    ...  # rest of the body omitted in the original post


def main():
    ...  # body omitted in the original post


if __name__ == "__main__":
    # from pymilvus import db
    #
    # connections.connect(host=host, port=port, user=user, password=password)
    # db.create_database(target_db)
    main()
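Since the schema-level flag is the one that takes effect with a custom schema, a small pre-flight check before calling create_collection could catch the mismatch described in this thread. The sketch below is a hypothetical helper, not part of pymilvus; it assumes only that the schema object exposes an `enable_dynamic_field` attribute, and it is exercised here with stand-in objects so it runs without a Milvus server.

```python
from types import SimpleNamespace
from typing import Any, Optional


def check_dynamic_field(schema: Any, requested: Optional[bool]) -> bool:
    """Return the flag the schema carries, or raise if the requested value disagrees."""
    # Assumption: the schema object has an enable_dynamic_field attribute.
    schema_flag = bool(getattr(schema, "enable_dynamic_field", False))
    if requested is not None and bool(requested) != schema_flag:
        raise ValueError(
            f"enable_dynamic_field={requested} was requested, but the schema has "
            f"enable_dynamic_field={schema_flag}; set it when building the schema."
        )
    return schema_flag


# Stand-in schema objects for demonstration only.
ok = check_dynamic_field(SimpleNamespace(enable_dynamic_field=True), True)
try:
    check_dynamic_field(SimpleNamespace(enable_dynamic_field=False), True)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print(ok, mismatch_detected)
```

Calling such a check right before create_collection would turn the silent mismatch into an immediate error, which is the behavior the reply suggests pymilvus itself should have.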