Some questions about RemoteBulkInsert #41263
Replies: 1 comment 1 reply
-
RemoteBulkWriter is a handy tool in pymilvus: it generates data files that can be imported into Milvus through the bulk_import() interface.
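For illustration, here is a minimal sketch of how the writer is typically used; the endpoint, credentials, bucket name, and schema fields below are placeholder assumptions, not values from this thread:

```python
from pymilvus import CollectionSchema, FieldSchema, DataType
from pymilvus.bulk_writer import RemoteBulkWriter, BulkFileType

# Assumed example schema: an INT64 primary key plus a 768-dim float vector.
schema = CollectionSchema(fields=[
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
    FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=768),
])

# Connection parameters for the MinIO/S3 bucket that Milvus can read from.
conn = RemoteBulkWriter.S3ConnectParam(
    endpoint="localhost:9000",   # placeholder MinIO endpoint
    access_key="minioadmin",     # placeholder credentials
    secret_key="minioadmin",
    bucket_name="a-bucket",      # placeholder bucket
)

writer = RemoteBulkWriter(
    schema=schema,
    remote_path="bulk_data",     # prefix inside the bucket
    connect_param=conn,
    file_type=BulkFileType.PARQUET,
)

# Append rows; the writer flushes them into one or more remote data files.
for i in range(10000):
    writer.append_row({"id": i, "vector": [0.0] * 768})
writer.commit()

# Each element of batch_files is a list of file paths forming one import task.
print(writer.batch_files)
```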
bulk_import() is a wrapper around this RESTful API: https://milvus.io/api-reference/restful/v2.5.x/v2/Import%20(v2)/Create.md

Note: after the entire import process is done, the data files generated by RemoteBulkWriter are not deleted; you can delete them manually yourself.

Once bulk_import() is called, each data file becomes an import task, and Milvus automatically assigns each new import task to an idle data node. More data nodes can process more import tasks in parallel. If you have only one data file to import, one data node is enough. If you have 1000 data files to import, should you create 1000 data nodes? I don't think so; it is a trade-off. Once all data files are imported, the data nodes become idle and waste your resources.
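A hedged sketch of submitting an import job and checking its progress, assuming the cluster is reachable at the URL below and the collection already exists (URL, collection name, and file paths are placeholders):

```python
from pymilvus.bulk_writer import bulk_import, get_import_progress

# Submit one import job; 'files' is a list of file lists, one import task per
# inner list (e.g. the batch_files produced by RemoteBulkWriter above).
resp = bulk_import(
    url="http://localhost:19530",
    collection_name="my_collection",
    files=[["bulk_data/1.parquet"]],
)
job_id = resp.json()["data"]["jobId"]

# Poll the job state; note the source data files in the bucket are NOT
# deleted automatically after the job completes.
progress = get_import_progress(url="http://localhost:19530", job_id=job_id)
print(progress.json())
```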
-
Hi Team! I've deployed a 4-node Milvus 2.5.6 cluster. I want to insert a 200 GB dataset and have some questions about remote bulk import.
1. How will the data be distributed among the 4 MinIO nodes while the bulk writer is appending rows and committing (using the MinIO service started with Milvus)?
2. What is the recommended bulk writer segment_size for inserting 200 GB of data? Does this matter?
3. What does bulk_import do? Will it create a replica of the data already in the MinIO bucket?
4. Should I adjust the sizing-tool config, which includes only one data node (8 CPU + 32 GB)?
Any help or explanation would be much appreciated, thanks! :)