Embeddings not showing up in Milvus distance searches right after insertion. How do you deal with this? #42847
-
Thanks for raising this. It's a very relevant topic, and we've seen similar needs from other users recently. Could you briefly share more context on your use case and why deduplication at the vector level is essential? We've received several requests around this, but we're still building a complete picture of the underlying motivations. We're actively evaluating ways to support this natively in Milvus.

Using a strong consistency setup and performing a search before insert can help reduce duplicates. That said, with ANN-based search, deduplication isn't guaranteed to be perfect due to the approximate nature of the algorithm.

You don't need to flush after each batch: newly inserted data is immediately visible for search. In fact, we have introduced growing indexing, which enables IVF-based indexes to support real-time inserts without waiting for a segment flush. This significantly reduces latency for high-throughput pipelines and improves search consistency.

Let us know if you're open to discussing further. We'd love to better understand your requirements and explore how we can help. @czs007
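For anyone landing here, this is a minimal sketch of the "search before insert" pattern with a strong consistency level, not an official recipe. It assumes `client` behaves like a `pymilvus.MilvusClient`, that the collection uses the COSINE metric with an auto-generated primary key, and that the vector field is named `vector`. The helper names `cosine_similarity` and `insert_novel` and the 0.95 threshold are hypothetical. Vectors within one batch cannot see each other in Milvus yet, so the sketch also deduplicates inside the batch before searching.

```python
import math


def cosine_similarity(a, b):
    """Plain-Python cosine similarity, used only for the in-batch check."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def insert_novel(client, collection_name, vectors, threshold=0.95,
                 vector_field="vector"):
    """Insert only vectors whose nearest neighbour is below `threshold`.

    `client` is assumed to expose `search` and `insert` like
    pymilvus.MilvusClient; the field name and threshold are illustrative.
    Returns the number of vectors actually inserted.
    """
    # Step 1: drop near-duplicates inside the batch itself, since these
    # vectors are not in Milvus yet and a search cannot find them.
    kept = []
    for vec in vectors:
        if all(cosine_similarity(vec, k) < threshold for k in kept):
            kept.append(vec)
    if not kept:
        return 0

    # Step 2: look up the nearest stored neighbour of each candidate,
    # asking for Strong consistency so earlier inserts are visible.
    results = client.search(
        collection_name=collection_name,
        data=kept,
        limit=1,
        search_params={"metric_type": "COSINE"},
        consistency_level="Strong",
    )

    # Step 3: keep candidates with no hit, or whose best hit is not close
    # enough. With COSINE, a larger "distance" means more similar.
    novel = [
        vec for vec, hits in zip(kept, results)
        if not hits or hits[0]["distance"] < threshold
    ]
    if novel:
        client.insert(
            collection_name=collection_name,
            data=[{vector_field: vec} for vec in novel],
        )
    return len(novel)
```

Because the search is approximate, a close neighbour can still be missed occasionally, so this reduces duplicates rather than eliminating them. Concurrent writers can also race between the search and the insert; a single writer per collection, or a periodic cleanup pass, is one way to bound that.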
-
After fixing primary key dedup, vector dedup seems like a very interesting topic to work on.
-
I'm running a high-throughput pipeline that inserts hundreds of embeddings per second into Milvus. I use a "search before insert" strategy to keep out duplicates and near-identical embeddings, as they are not useful for my use case. However, I'm noticing that many recently inserted embeddings aren't searchable immediately, which lets duplicate entries get in.
I understand Milvus has an eventual consistency model and recently inserted data may not be visible until segments are flushed/sealed, but I was wondering:
Any insight would be appreciated.