Understand milvus end to end process #40528

ranjith502 · 2025-03-10T18:24:24Z

Hi team
Understanding Milvus Data Flow

I've gone through the documentation but still need help understanding the data flow correctly. Below is my process flow. Can someone explain how data moves between the different components?

My Current Process Flow
1. Connected to the Milvus server (running on a VM using Docker Compose)
2. Created a schema
3. Created a collection
4. Inserted data
5. Flushed
6. Created an index
7. Loaded the collection
8. Queried
9. Released the collection

What I Need to Understand
I need to understand how data flows between components at each step:

Connection: PyMilvus connects to Milvus server using host
Schema Creation: Where is schema information stored?
Collection Creation: Where is this collection data stored? Is it visible?
Data Insertion: Where is this data stored? Which component is responsible?
Flush: I understand it seals the growing segment, but what exactly happens?
Index Creation: Index node is responsible, but where is the index stored?
Load Collection: From where to where does it load?
I need a simple and clear explanation of this flow to better understand how Milvus works internally: where the data is stored, and which components (query nodes, index nodes) are responsible at each step.

yhmo · 2025-03-11T03:56:03Z · Collaborator

Connection: PyMilvus connects to Milvus server using host
There is no data flow at this step; only a gRPC connection between the client and the proxy node is created.
Schema Creation: Where is schema information stored?
The collection schema is stored in etcd. The root prefix of its keys is set by rootPath in milvus/configs/milvus.yaml (line 28 at commit d9fe8f0):

rootPath: by-dev
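To make the split between metadata and data concrete, here is a toy model (not Milvus code) in which a plain dict stands in for etcd. The function name create_collection, the field format, and the <root_path>/<name> key layout are hypothetical; only the by-dev root path comes from milvus.yaml. The point it illustrates: creating a collection writes schema metadata only, and no data files.

```python
import json


def create_collection(meta_store: dict, root_path: str, name: str, fields: list) -> str:
    """Toy model of collection creation: only schema metadata is written.

    meta_store stands in for etcd; no row data or object-storage files are created.
    The key layout <root_path>/<name> is hypothetical; Milvus's real etcd keys differ.
    """
    key = f"{root_path}/{name}"
    if key in meta_store:
        raise ValueError(f"collection {name!r} already exists")
    # Serialize the schema so it can be stored as a plain string value, as in a KV store.
    meta_store[key] = json.dumps({"name": name, "fields": fields})
    return key


etcd_stand_in = {}
key = create_collection(etcd_stand_in, "by-dev", "docs", [{"name": "id", "type": "INT64"}])
```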
Collection Creation: Where is this collection data stored? Is it visible?
If you have inserted data into the collection, the data is stored in S3/MinIO. The bucket name and root path are defined in milvus/configs/milvus.yaml; line 130 at commit d9fe8f0 sets the root path:

rootPath: files
Data Insertion: Where is this data stored? Which component is responsible?
The inserted data is stored in S3/MinIO. The bucket name and root path are defined in milvus.yaml.
Insert requests are received by the proxy node, which passes the data into Pulsar/Kafka; the data node then consumes the data from Pulsar/Kafka. The data node accumulates data in buffers (each growing segment has its own buffer). Once a buffer's size exceeds a threshold, the data node persists that buffer into S3/MinIO as a sealed segment.
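The insert path described above can be sketched as a toy simulation (not Milvus code): a deque stands in for Pulsar/Kafka, a list stands in for S3/MinIO, and rows are bytes whose length counts as their size. The class and function names and the threshold value are hypothetical.

```python
from collections import defaultdict, deque


class ToyDataNode:
    """Conceptual model of a data node (not Milvus code).

    Rows are buffered per growing segment; once a buffer's size exceeds the
    threshold, the buffer is persisted to object storage as a sealed segment.
    """

    def __init__(self, object_store: list, threshold: int):
        self.object_store = object_store      # list standing in for S3/MinIO
        self.threshold = threshold            # hypothetical size limit in bytes
        self.buffers = defaultdict(list)      # segment_id -> buffered rows
        self.sizes = defaultdict(int)         # segment_id -> buffered bytes

    def consume(self, segment_id: str, row: bytes) -> None:
        self.buffers[segment_id].append(row)
        self.sizes[segment_id] += len(row)
        if self.sizes[segment_id] > self.threshold:
            # Buffer is full: persist it as a sealed segment and drop it from memory.
            self.object_store.append((segment_id, self.buffers.pop(segment_id)))
            del self.sizes[segment_id]


def proxy_insert(queue: deque, segment_id: str, rows: list) -> None:
    """Toy proxy: only forwards rows to the message queue (Pulsar/Kafka stand-in)."""
    for row in rows:
        queue.append((segment_id, row))


def data_node_consume(queue: deque, node: ToyDataNode) -> None:
    """Toy data node loop: drains the message queue into the node's buffers."""
    while queue:
        segment_id, row = queue.popleft()
        node.consume(segment_id, row)
```

In this model the proxy never touches object storage; persistence happens only in the data node, and a buffer that stays under the threshold simply remains in memory.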
Flush: I understand it seals the growing segment, but what exactly happens?
A flush forces the data node to persist all of its buffers into S3/MinIO as sealed segments, no matter how large the buffers are. This can produce many tiny sealed segments in S3/MinIO.
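A minimal sketch of that behavior, assuming the data node's buffers are a dict of segment id to rows and a list stands in for S3/MinIO; the flush function here is a hypothetical toy, not the Milvus or PyMilvus API.

```python
def flush(buffers: dict, object_store: list) -> int:
    """Toy flush (not Milvus code): persist every non-empty buffer as a sealed
    segment regardless of its size, then clear the buffers.

    buffers maps segment_id -> list of rows; object_store is a list standing in
    for S3/MinIO. Returns the number of sealed segments written.
    """
    written = 0
    for segment_id in list(buffers):          # copy keys: buffers is mutated in the loop
        rows = buffers.pop(segment_id)
        if rows:                              # an empty buffer has nothing to persist
            object_store.append((segment_id, rows))
            written += 1
    return written


buffers = {"seg-1": [b"a"], "seg-2": [b"bc"], "seg-3": []}
store = []
count = flush(buffers, store)
```

Both sealed segments written here hold only a byte or two, which illustrates how frequent flushes can leave many tiny segments in object storage.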
Index Creation: Index node is responsible, but where is the index stored?
The index node builds an index for each sealed segment; each sealed segment has its own independent index. All index files are stored in S3/MinIO under [bucket_name]/[root_path]/index_files.
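The per-segment indexing and the storage prefix can be sketched as follows. This is a toy (not Milvus code): a dict stands in for S3/MinIO, the "index" is just a sorted copy of each segment's values, the bucket name my-bucket is hypothetical, and the <prefix>/<segment_id> key below index_files is a hypothetical simplification of the real layout.

```python
def index_file_prefix(bucket_name: str, root_path: str) -> str:
    """Prefix for index files, following [bucket_name]/[root_path]/index_files."""
    return f"{bucket_name.strip('/')}/{root_path.strip('/')}/index_files"


def build_and_store_indexes(sealed_segments: dict, object_store: dict,
                            bucket_name: str, root_path: str) -> None:
    """Toy index node (not Milvus code): build one independent index per sealed
    segment and write it to object storage under the index_files prefix.

    The 'index' here is just a sorted copy of the segment's values, and the
    <prefix>/<segment_id> key layout is hypothetical.
    """
    prefix = index_file_prefix(bucket_name, root_path)
    for segment_id, values in sealed_segments.items():
        object_store[f"{prefix}/{segment_id}"] = sorted(values)


object_store = {}
build_and_store_indexes({"s1": [3, 1, 2], "s2": [5, 4]}, object_store, "my-bucket", "files")
```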
Load Collection: From where to where does it load?
After you call load_collection(), the query nodes read the index data from S3/MinIO and load it into their memory.
Search and query requests are then processed mainly by the query nodes.
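As a rough mental model of load, search, and release, here is a toy query node (not Milvus code). A dict stands in for S3/MinIO, keys under a hypothetical my-bucket/files/index_files prefix stand in for index files, and searching means a simple membership test; the class and method names are hypothetical.

```python
class ToyQueryNode:
    """Conceptual query node (not Milvus code): load copies per-segment indexes
    from object storage into memory, and searches are served from memory only."""

    def __init__(self):
        self.memory = {}       # segment_id -> in-memory copy of the segment's index
        self.loaded = False

    def load(self, object_store: dict, prefix: str) -> None:
        # Read every index file under the prefix (stand-in for S3/MinIO reads).
        for key, index in object_store.items():
            if key.startswith(prefix + "/"):
                self.memory[key[len(prefix) + 1:]] = list(index)
        self.loaded = True

    def release(self) -> None:
        # Drop the in-memory copies; the files in object storage are untouched.
        self.memory.clear()
        self.loaded = False

    def search(self, target) -> list:
        """Return ids of segments whose in-memory index contains target."""
        if not self.loaded:
            raise RuntimeError("collection not loaded")
        return sorted(seg for seg, index in self.memory.items() if target in index)


store = {
    "my-bucket/files/index_files/s1": [1, 2, 3],
    "my-bucket/files/index_files/s2": [2, 9],
}
node = ToyQueryNode()
node.load(store, "my-bucket/files/index_files")
```

The design choice to mirror here is that release only frees query-node memory; the sealed segments and index files stay in object storage, so a later load can read them again.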
