I've gone through the documentation but still need help understanding the data flow correctly. Below is my process flow - can someone explain how data moves between different components?
My Current Process Flow
Connected to Milvus server (running on VM using Docker Compose)
Created schema
Created collection
Inserted data
Flush
Create index
Load the collection
Query
Release the collection
What I Need to Understand
I need to understand how data flows between components at each step:
Connection: PyMilvus connects to Milvus server using host
Schema Creation: Where is schema information stored?
Collection Creation: Where is this collection data stored? Is it visible?
Data Insertion: Where is this data stored? Which component is responsible?
Flush: I understand it seals the growing segment, but what exactly happens?
Index Creation: Index node is responsible, but where is the index stored?
Load Collection: From where to where does it load?
I need a simple and clear explanation of this flow to better understand how Milvus works internally: where the data is stored and which components are responsible (query nodes, index nodes).
Collection Creation: Where is this collection data stored? Is it visible?
Once you have inserted data into the collection, that data is stored in S3/MinIO. The bucket name and root path are defined in milvus.yaml (the minio.bucketName and minio.rootPath settings).
Data Insertion: Where is this data stored? Which component is responsible?
The inserted data is stored in S3/MinIO. The bucket name and root path are defined in milvus.yaml.
Insert requests are received by the proxy node, which passes the data into Pulsar/Kafka. The data node then consumes the data from Pulsar/Kafka and accumulates it in a buffer (each growing segment has its own buffer). Once a buffer's size exceeds a threshold, the data node persists the buffer to S3/MinIO as a sealed segment.
Flush: I understand it seals the growing segment, but what exactly happens?
A flush forces the data node to persist all of its buffers to S3/MinIO as sealed segments, no matter how small or large the buffers are. As a result, frequent flushes can generate lots of tiny sealed segments in S3/MinIO.
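The buffering and flush behaviour described above can be pictured with a toy model. This is only an illustrative sketch, not Milvus code: the class ToyDataNode, the row-count threshold, and the in-memory lists standing in for Pulsar/Kafka and S3/MinIO are all assumptions made for the example (the real threshold is based on buffer size, not a row count).

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyDataNode:
    """Hypothetical stand-in for a data node with one growing-segment buffer."""
    seal_threshold: int                                # assumed row-count threshold
    buffer: list = field(default_factory=list)         # the growing segment's buffer
    object_store: list = field(default_factory=list)   # stands in for S3/MinIO

    def consume(self, row):
        # Accumulate a row pulled from the message queue.
        self.buffer.append(row)
        # Persist automatically once the buffer reaches the threshold.
        if len(self.buffer) >= self.seal_threshold:
            self._persist()

    def flush(self):
        # Persist whatever is buffered, however small it is.
        if self.buffer:
            self._persist()

    def _persist(self):
        # Write the buffer out as one sealed segment and start a new buffer.
        self.object_store.append(list(self.buffer))
        self.buffer.clear()


def run_demo():
    queue = deque()                      # stands in for Pulsar/Kafka
    node = ToyDataNode(seal_threshold=4)

    # "Proxy" side: publish ten inserted rows to the queue.
    for row in range(10):
        queue.append(row)

    # "Data node" side: consume the rows from the queue.
    while queue:
        node.consume(queue.popleft())

    sizes_before = [len(seg) for seg in node.object_store]
    still_buffered = len(node.buffer)

    node.flush()                         # forces the small remainder out
    sizes_after = [len(seg) for seg in node.object_store]
    return sizes_before, still_buffered, sizes_after
```

With ten rows and a threshold of four, run_demo() returns ([4, 4], 2, [4, 4, 2]): two segments are sealed automatically, two rows stay buffered, and the flush writes them out as a third, smaller segment. This is why flushing after every small insert tends to leave many tiny segments behind.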
Index Creation: Index node is responsible, but where is the index stored?
The index node builds indexes for sealed segments. Each sealed segment has its own independent index. All index files are stored in S3/MinIO, under [bucket_name]/[root_path]/index_files.
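To see for yourself what ends up in the bucket, one option is to list the objects under the root path and count them per top-level directory. The sketch below assumes the third-party minio Python package is installed; the function names are made up for this example, and the endpoint, credentials, bucket, and root path must come from your own MinIO setup and milvus.yaml.

```python
from collections import Counter


def summarize_by_subdir(object_names, root_path):
    """Count objects per first-level directory under root_path."""
    prefix = root_path.strip("/") + "/"
    counts = Counter()
    for name in object_names:
        if name.startswith(prefix):
            # Take the first path component after the root path.
            first_dir = name[len(prefix):].split("/", 1)[0]
            counts[first_dir] += 1
    return dict(counts)


def list_milvus_objects(endpoint, access_key, secret_key, bucket, root_path):
    """Return all object names under root_path in the given bucket."""
    # Imported lazily so summarize_by_subdir works without the minio package.
    from minio import Minio

    # secure=False assumes a plain-HTTP MinIO endpoint, as is common in a
    # local Docker Compose setup; adjust this for your deployment.
    client = Minio(endpoint, access_key=access_key,
                   secret_key=secret_key, secure=False)
    objects = client.list_objects(
        bucket, prefix=root_path.strip("/") + "/", recursive=True
    )
    return [obj.object_name for obj in objects]
```

A typical use would be summarize_by_subdir(list_milvus_objects(...), root_path), with the arguments filled in from your configuration. After an index has been built, the index_files directory mentioned above should appear among the keys of the result.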
Load Collection: From where to where does it load?
After you call load_collection(), the query nodes read the index data from S3/MinIO and load it into their memory.
Search and query requests are processed mainly by the query nodes.
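Putting it together, the process flow from the question can be sketched with PyMilvus, with a comment on each step noting which component the explanation above says is involved. This is a minimal sketch, not a reference implementation: the function name run_flow, the collection name, the field names, the vector dimension, and the index parameters are all assumptions for illustration, and it needs a reachable Milvus server at the given host and port.

```python
def run_flow(host="localhost", port="19530", name="demo_flow", dim=8):
    """Connect, create, insert, flush, index, load, query, release."""
    import random
    from pymilvus import (
        Collection, CollectionSchema, DataType, FieldSchema, connections,
    )

    # Connect to the Milvus server (host and port are assumptions).
    connections.connect(alias="default", host=host, port=port)

    # Create the schema and the collection (illustrative field names).
    fields = [
        FieldSchema(name="id", dtype=DataType.INT64,
                    is_primary=True, auto_id=False),
        FieldSchema(name="vec", dtype=DataType.FLOAT_VECTOR, dim=dim),
    ]
    collection = Collection(name=name, schema=CollectionSchema(fields))

    # Insert: proxy -> Pulsar/Kafka -> data node buffer (growing segment).
    ids = list(range(100))
    vectors = [[random.random() for _ in range(dim)] for _ in ids]
    collection.insert([ids, vectors])

    # Flush: the data node persists its buffers to S3/MinIO as sealed segments.
    collection.flush()

    # Create index: the index node builds one index per sealed segment and
    # stores the index files in S3/MinIO.
    collection.create_index(
        field_name="vec",
        index_params={
            "index_type": "IVF_FLAT",
            "metric_type": "L2",
            "params": {"nlist": 16},
        },
    )

    # Load: query nodes read the index data from S3/MinIO into memory.
    collection.load()

    # Query: served by the query nodes.
    rows = collection.query(expr="id < 5", output_fields=["id"])

    # Release: frees the collection from query node memory.
    collection.release()
    return rows
```

Calling run_flow(host="<your VM address>") against a Docker Compose deployment should return the rows whose id is below 5. The imports sit inside the function so that defining it does not require pymilvus to be installed.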