Compaction scheduling bottleneck (Best Practices to Handle 50k Partitions Per Cluster (compaction/indexing can't keep up): continued) #38997
xiaobingxia-at started this conversation in Ideas & Feature requests
Replies: 2 comments, 3 replies
- CompactionTaskQueueCapacity is too large.
- loopSchedule -> the 3 s interval is too long; as soon as an executor finishes a task, the next one should be scheduled immediately. Merely shortening the interval does not solve the problem and drives CPU usage too high.
- CompactionMaxParallelTasks -> is not derived from the number of datanodes; it should be defined in terms of the datanodes' slots.
- The Prioritize strategy is too simplistic; larger tasks could be given priority.
- The rule that each compaction can only have one task executing should not exist: only L0/Clustering and Mix are mutually exclusive; Mix and Mix are not. Partition-level L0 could be introduced so that the lock-exclusion granularity becomes finer.
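The first two points above can be sketched as an event-driven, slot-aware scheduler: instead of waking on a fixed 3 s ticker, the loop wakes whenever a task completes, and caps parallelism by the total slots the datanodes report. This is a hypothetical sketch, not Milvus's actual code; the `task` type and function names here are assumptions.

```go
package main

import "fmt"

// Hypothetical sketch: dispatch compaction tasks as completion events
// arrive, bounded by total datanode slots rather than a static
// CompactionMaxParallelTasks value.

type task struct{ id int }

// totalSlots sums the slot counts reported by each datanode.
func totalSlots(datanodeSlots []int) int {
	sum := 0
	for _, s := range datanodeSlots {
		sum += s
	}
	return sum
}

// schedule runs all queued tasks, refilling a slot the moment a task
// finishes instead of waiting for the next fixed-interval tick.
func schedule(queue []task, datanodeSlots []int) []int {
	slots := totalSlots(datanodeSlots)
	done := make(chan int) // task completion events
	running := 0
	var finished []int

	submit := func(t task) {
		running++
		go func() { done <- t.id }() // simulate the executor finishing
	}

	// Fill every available slot up front.
	for len(queue) > 0 && running < slots {
		submit(queue[0])
		queue = queue[1:]
	}
	// Event-driven loop: the next task is dispatched as soon as one completes.
	for running > 0 {
		id := <-done
		running--
		finished = append(finished, id)
		if len(queue) > 0 {
			submit(queue[0])
			queue = queue[1:]
		}
	}
	return finished
}

func main() {
	tasks := []task{{1}, {2}, {3}, {4}, {5}}
	finished := schedule(tasks, []int{2, 2}) // two datanodes, 2 slots each
	fmt.Println(len(finished))               // prints 5
}
```

The key design point is that scheduling latency becomes bounded by task completion rather than by the ticker period, so no CPU is burned on polling when nothing has changed.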
(Continued from #38996:)
Suppose I insert data into 2,500 partitions every hour; because of the various compaction settings, this generates 10,000 compaction tasks.
I have 100 collections (or channels); my ingestion script imports data into 2,500 partitions every hour, spread across 8 of those collections.
According to the logic of Milvus compaction:
Based on the above analysis, a simple calculation of how many compactions this cluster can complete per hour:
3600 s / 9 s = 400 scheduling cycles
400 cycles * 8 compaction tasks (because I import data into 8 collections) = 3,200 compaction tasks
So 3,200 tasks is what the cluster can complete per hour. But within that same hour, 10,000 compaction tasks are pushed into the queue, so tasks accumulate faster than they can be drained and the backlog never clears; eventually the task queue overflows.
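The arithmetic above can be double-checked with a small calculation. The 9 s effective cycle and 8 tasks per cycle are the figures from the post; the backlog-growth line is a straightforward extrapolation from them.

```go
package main

import "fmt"

func main() {
	const (
		secondsPerHour = 3600
		cycleSeconds   = 9     // effective scheduling cycle (from the post)
		tasksPerCycle  = 8     // one task per collection being ingested
		tasksSubmitted = 10000 // compaction tasks generated per hour
	)
	cycles := secondsPerHour / cycleSeconds     // 400 cycles/hour
	completed := cycles * tasksPerCycle         // 3200 tasks/hour completed
	backlogGrowth := tasksSubmitted - completed // 6800 tasks/hour of backlog
	fmt.Println(cycles, completed, backlogGrowth)
}
```

At roughly 6,800 unprocessed tasks added per hour, any bounded queue fills within a few hours, which matches the "queue will overflow" conclusion.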
There are three bottlenecks here:
Questions and feature requests:
Thank you.