Compaction scheduling bottleneck (Best Practices to Handle 50k Partitions Per Cluster (compaction/indexing can't keep up): continued) #38997
xiaobingxia-at started this conversation in Ideas & Feature requests
Replies: 2 comments, 3 replies
- CompactionTaskQueueCapacity is too large.
- loopSchedule -> the 3 s interval is too long; as soon as an executor finishes a task, the next one should be scheduled immediately. Merely shortening the interval does not solve the problem and drives CPU usage too high.
- CompactionMaxParallelTasks -> is not derived from the number of datanodes; it should be defined in terms of the datanodes' slots.
- The Prioritize strategy is too simplistic; larger tasks could be given priority.
- The rule that each compaction can only have one task executing should not exist: only L0/Clustering and Mix are mutually exclusive; Mix and Mix are not. Partition-level L0 could be introduced so that the lock-exclusion granularity becomes finer.
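The first two points above can be sketched as an event-driven, slot-aware scheduler: instead of waking on a fixed 3 s ticker, the loop wakes whenever a task completes, and caps parallelism by the total slots the datanodes report. This is a hypothetical sketch, not Milvus's actual code; the `task` type and function names here are assumptions.

```go
package main

import "fmt"

// Hypothetical sketch: dispatch compaction tasks as completion events
// arrive, bounded by total datanode slots rather than a static
// CompactionMaxParallelTasks value.

type task struct{ id int }

// totalSlots sums the slot counts reported by each datanode.
func totalSlots(datanodeSlots []int) int {
	sum := 0
	for _, s := range datanodeSlots {
		sum += s
	}
	return sum
}

// schedule runs all queued tasks, refilling a slot the moment a task
// finishes instead of waiting for the next fixed-interval tick.
func schedule(queue []task, datanodeSlots []int) []int {
	slots := totalSlots(datanodeSlots)
	done := make(chan int) // task completion events
	running := 0
	var finished []int

	submit := func(t task) {
		running++
		go func() { done <- t.id }() // simulate the executor finishing
	}

	// Fill every available slot up front.
	for len(queue) > 0 && running < slots {
		submit(queue[0])
		queue = queue[1:]
	}
	// Event-driven loop: the next task is dispatched as soon as one completes.
	for running > 0 {
		id := <-done
		running--
		finished = append(finished, id)
		if len(queue) > 0 {
			submit(queue[0])
			queue = queue[1:]
		}
	}
	return finished
}

func main() {
	tasks := []task{{1}, {2}, {3}, {4}, {5}}
	finished := schedule(tasks, []int{2, 2}) // two datanodes, 2 slots each
	fmt.Println(len(finished))               // prints 5
}
```

The key design point is that scheduling latency becomes bounded by task completion rather than by the ticker period, so no CPU is burned on polling when nothing has changed.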
(Continued from #38996:)
Suppose I insert data into 2,500 partitions every hour; because of the various compaction settings, this generates 10,000 compaction tasks.
I have 100 collections (or channels); my ingestion script imports data into 2,500 partitions every hour, spread across 8 of those collections.
According to the logic of Milvus compaction:
Based on the above analysis, a simple calculation of how many compactions this cluster can complete per hour:
3600 s / 9 s = 400 scheduling cycles
400 cycles * 8 compaction tasks (because I import data into 8 collections) = 3,200 compaction tasks
So 3,200 tasks is what the cluster can complete per hour. But within that same hour, 10,000 compaction tasks are pushed into the queue, so tasks accumulate faster than they can be drained and the backlog never clears; eventually the task queue overflows.
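The arithmetic above can be double-checked with a small calculation. The 9 s effective cycle and 8 tasks per cycle are the figures from the post; the backlog-growth line is a straightforward extrapolation from them.

```go
package main

import "fmt"

func main() {
	const (
		secondsPerHour = 3600
		cycleSeconds   = 9     // effective scheduling cycle (from the post)
		tasksPerCycle  = 8     // one task per collection being ingested
		tasksSubmitted = 10000 // compaction tasks generated per hour
	)
	cycles := secondsPerHour / cycleSeconds     // 400 cycles/hour
	completed := cycles * tasksPerCycle         // 3200 tasks/hour completed
	backlogGrowth := tasksSubmitted - completed // 6800 tasks/hour of backlog
	fmt.Println(cycles, completed, backlogGrowth)
}
```

At roughly 6,800 unprocessed tasks added per hour, any bounded queue fills within a few hours, which matches the "queue will overflow" conclusion.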
There are three bottlenecks here:
Questions and feature requests:
Thank you.