channel not available: can't search after docker down and up #41651
-
Please provide the full logs for investigation. You can also release and load the collection again.
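A minimal pymilvus sketch of the release/load step, assuming the MilvusClient API, a local standalone endpoint, and a hypothetical collection name:

```python
from pymilvus import MilvusClient

# Hypothetical endpoint and collection name.
client = MilvusClient(uri="http://localhost:19530")

client.release_collection("my_collection")  # drop it from query-node memory
client.load_collection("my_collection")     # load it back so it can be searched
```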
-
I have a single collection. I'm not able to release and load it because I get another error related to OOM. I'm now trying to create the index on disk, but that is not working either. Index creation is using a lot of memory, and I suspect it's because of a sparse vector index that is using almost all my RAM.
-
Your Milvus is a standalone deployment: all components (queryNode, dataNode, indexNode) run in the same process, and all tasks (compaction tasks, index-build tasks, loading tasks) share the same RAM. The memory usage peak can be very high and can easily lead to OOM if the physical RAM is insufficient. Disk index is only for dense vectors. It sounds like your collection has multiple vector fields: one dense vector field and one sparse vector field. How many rows are in the collection? There is a command-line tool named "birdwatcher" that can force-release a collection.
-
@yhmo thanks. I started everything from scratch, this time indexing dense vectors only. I'm limiting the RAM to 24 GB, using float16 and DiskANN, and even so I'm still getting OOM when I reach about 6M vectors (1024 dims each).
-
I think 6M 1024-dim vectors should fit into a 24 GB machine. Can you share your logs so we can investigate?
-
Different client-side operation patterns can have very different memory usage: (1) create the collection with index_params so that it is loaded immediately and inserting, indexing, and loading all run concurrently; (2) create the collection without index_params, insert all the data first, and only then build the index and load. The memory usage peak of (1) is much higher than that of (2), because index tasks, compaction tasks, and loading tasks are executed at the same time in one process. An example of (1):
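A minimal pymilvus sketch of pattern (1), assuming the MilvusClient API, a hypothetical collection named "demo", and a 1024-dim float vector field (illustrative, not the original example):

```python
import random
from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="http://localhost:19530")

schema = client.create_schema(auto_id=True)
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("vector", DataType.FLOAT_VECTOR, dim=1024)

index_params = client.prepare_index_params()
index_params.add_index(field_name="vector", index_type="HNSW",
                       metric_type="L2", params={"M": 30, "efConstruction": 200})

# Passing index_params here makes create_collection() also load the collection,
# so insertion, compaction, index building, and loading all overlap.
client.create_collection("demo", schema=schema, index_params=index_params)

for _ in range(60):  # e.g. 60 batches x 10,000 rows = 600K vectors
    rows = [{"vector": [random.random() for _ in range(1024)]} for _ in range(10_000)]
    client.insert("demo", rows)
```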
An example of (2):
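A matching sketch of pattern (2), continuing from the same hypothetical setup:

```python
# No index_params at creation time, so the collection is not loaded while inserting.
client.create_collection("demo", schema=schema)

for _ in range(60):
    rows = [{"vector": [random.random() for _ in range(1024)]} for _ in range(10_000)]
    client.insert("demo", rows)

# Build the index and load only after all data has been inserted.
client.create_index("demo", index_params=index_params)
client.load_collection("demo")
```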
I compared the two cases on my machine with 600K vectors: the memory usage peak of (1) was 8.5 GB, and of (2) was 4.5 GB.
-
Also, did you tune any of the parameters, like segment size?
-
@xiaofan-luan is there a method to get the logs, or are you referring to the Docker logs (they are big...)? I haven't tuned anything; I used the default values. @yhmo those values are quite a bit larger than expected, no? For example, the HNSW memory estimate is (d * 4 + M * 2 * 4) bytes per vector, which gives about 2.6 GB for 600K vectors of dim 1024 with M=30 (I'm assuming that's the default value), and that is for an in-memory index. I was expecting DiskANN to use much less memory. I just ran a test indexing 10M random vectors (fp16) and still hit the same OOM. You can replicate it with the following script:
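A minimal pymilvus sketch of that pattern (hypothetical collection name, batch size, and loop count; not the exact original script), inserting random fp16 1024-dim vectors into a collection created with a DiskANN index:

```python
import numpy as np
from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="http://localhost:19530")

schema = client.create_schema(auto_id=True)
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("vector", DataType.FLOAT16_VECTOR, dim=1024)

index_params = client.prepare_index_params()
index_params.add_index(field_name="vector", index_type="DISKANN", metric_type="L2")

# Index declared at creation time, so the collection is loaded while inserting.
client.create_collection("fp16_test", schema=schema, index_params=index_params)

for _ in range(2000):  # hypothetical: 2000 batches x 5,000 rows = 10M vectors
    batch = np.random.rand(5000, 1024).astype(np.float16)
    client.insert("fp16_test", [{"vector": row} for row in batch])
```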
The docker-compose.yml and the milvus.yaml I used:
-
@miguelwon I tested your script on my machine (64 GB RAM). First of all, if you call create_collection() with "index_params", create_collection() will call load_collection() automatically (the source code is here).
With continual insertion, data accumulates in a cache on the datanode. Once the size of the cache exceeds 100+ MB, the datanode flushes it to storage as a sealed segment. In Milvus, each segment is expected to grow to about 1 GB, so at the same time the datanode triggers compaction tasks to merge small segments into larger ones: with continual insertion, each compaction task might merge 2~3 small segments (100+ MB) into a 300+ MB segment, and as more and more 300+ MB segments are generated, compaction tasks merge 2~3 of them into 1000+ MB segments. Since standalone has only one datanode, there might be hundreds of compaction tasks executed one by one over a very long time (maybe several hours).

At the same time, once a sealed segment is generated, the indexnode triggers a task to build an index for it, so there might also be hundreds of index tasks executed one by one over a very long time (maybe several hours). For segments that have no index yet, since the collection is loaded, the querynode loads their raw data (the original vector data) into memory, and you will see memory usage increase continually. The index tasks are much slower than insertion and compaction, so most segments are loaded with original vector data instead of index data, and the memory usage is far more than the index size. Each compaction task also occupies memory, maybe 1 GB. The size of 5000 * 1000 float16 vectors is 1024 * 2 bytes * 5000 * 1000 = 10 GB.

To avoid this, you can call create_collection() without "index_params" and call create_index() after all data is inserted, as sketched below.
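A minimal pymilvus sketch of that order of operations, under the same hypothetical names and sizes as above (not the original attached script):

```python
# Create the collection without index_params, so nothing is loaded yet.
client.create_collection("fp16_test", schema=schema)

for _ in range(2000):
    batch = np.random.rand(5000, 1024).astype(np.float16)
    client.insert("fp16_test", [{"vector": row} for row in batch])

# Only now build the DiskANN index and load the collection for searching.
client.create_index("fp16_test", index_params=index_params)
client.load_collection("fp16_test")
```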
With this approach, the memory usage stays at a low level (< 1 GB) during the entire insertion.
-
Can you give us some detailed numbers on how much memory HNSW and DiskANN consume on the 10M fp16 data? Here is a likely explanation: we expect HNSW to take about 30 GB of memory for the load alone, and DiskANN to take about 12 GB on the querynode, but the indexnode and datanode consume extra memory on top of that, and especially on a standalone node the index build might not be able to catch up if you write too fast, so it's highly likely the data is still unindexed in memory and being served from memory.

Suggestions (see the sketch after this list):

- Try at least 48 GB of memory and see if it works. If you are using DiskANN (make sure you are on an NVMe SSD), it should work.
- Maybe reduce the ingestion speed and see if the indexnode can catch up with index building.
- If memory is critical and performance is not important, you can try HNSW + mmap to reduce the memory cost.
- If accuracy is not critical, try HNSW-SQ8 or IVF_SQ8 instead.
- If you have scalar data, don't forget to mmap it as well.
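A minimal sketch of one of the lower-memory options above, building an IVF_SQ8 index (8-bit scalar quantization) in place of HNSW/DiskANN, reusing the hypothetical names from the earlier sketches; nlist is illustrative:

```python
# Assumes the MilvusClient and "fp16_test" collection from the earlier sketches.
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_type="IVF_SQ8",        # 8-bit scalar quantization: smaller in-memory footprint
    metric_type="L2",
    params={"nlist": 1024},      # illustrative value
)
client.create_index("fp16_test", index_params=index_params)
client.load_collection("fp16_test")
```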
-
@yhmo and @xiaofan-luan, thanks for the support. It's working now with @yhmo's proposal of inserting first and only then creating the index. It uses about 11 GB of RAM.
-
DISKANN is slower than an in-memory index (such as HNSW), and disk performance also affects DISKANN performance, so an NVMe disk is recommended.
-
I'm working on a single machine with Docker Compose. I set it up, created the database, schema, collection, and indexes, and populated the collection. Everything was working fine. I then took the Docker containers down (with docker-compose down) and brought them back up. The service runs, and I can retrieve the collection list. However, when performing a simple search, I get a "channel not available" error.
This is the docker-compose.yml I used:
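For reference, a minimal sketch of a typical Milvus standalone Compose layout (image tags, credentials, and volume paths are illustrative, not the poster's actual file):

```yaml
services:
  etcd:
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
    command: etcd -listen-client-urls http://0.0.0.0:2379 -advertise-client-urls http://127.0.0.1:2379 --data-dir /etcd
    volumes:
      - ./volumes/etcd:/etcd

  minio:
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin      # illustrative credentials
      MINIO_SECRET_KEY: minioadmin
    command: minio server /minio_data --console-address ":9001"
    volumes:
      - ./volumes/minio:/minio_data

  standalone:
    image: milvusdb/milvus:v2.4.15      # illustrative tag
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ./volumes/milvus:/var/lib/milvus
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - etcd
      - minio
```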