Reduce sorting in TopDocs #2646

stuhood · 2025-06-07T22:54:16Z

Reduce sorting in TopDocs by not sorting the results of individual segments: they will be merged and truncated to the top n anyway.

full
top_docs_small_shallow    Memory: 97.3 KB (-0.58%)     Avg: 4.5728ms (-2.23%)      Median: 4.5707ms (-2.05%)      [4.5540ms .. 4.6014ms]
top_docs_small_deep       Memory: 6.2 MB (-0.01%)      Avg: 14.2786ms (-33.59%)    Median: 14.2741ms (-33.70%)    [14.1030ms .. 14.5956ms]
top_docs_large_shallow    Memory: 698.7 KB (-0.05%)    Avg: 6.8426ms (-8.48%)      Median: 6.8434ms (-8.26%)      [6.7612ms .. 6.9138ms]
top_docs_large_deep       Memory: 6.8 MB (-0.01%)      Avg: 14.2691ms (-37.26%)    Median: 14.2846ms (-37.28%)    [14.0335ms .. 14.3867ms]
dense
top_docs_small_shallow    Memory: 92.9 KB (-0.35%)    Avg: 4.6792ms (-2.13%)      Median: 4.6785ms (-2.01%)      [4.6559ms .. 4.7226ms]
top_docs_small_deep       Memory: 5.9 MB              Avg: 15.0375ms (-29.50%)    Median: 15.0420ms (-29.46%)    [14.8413ms .. 15.1579ms]
top_docs_large_shallow    Memory: 662.9 KB            Avg: 7.0038ms (-6.82%)      Median: 6.9954ms (-6.79%)      [6.9232ms .. 7.0746ms]
top_docs_large_deep       Memory: 6.4 MB              Avg: 15.2383ms (-32.39%)    Median: 15.2466ms (-32.32%)    [15.1046ms .. 15.3693ms]
sparse
top_docs_small_shallow    Memory: 40.7 KB (+0.59%)    Avg: 17.1796ms (+1.15%)    Median: 17.1670ms (+1.01%)    [17.1109ms .. 17.3223ms]
top_docs_small_deep       Memory: 2.9 MB              Avg: 22.8191ms (-6.98%)    Median: 22.8086ms (-6.93%)    [22.6527ms .. 23.3086ms]
top_docs_large_shallow    Memory: 324.8 KB            Avg: 18.6777ms (-0.57%)    Median: 18.6723ms (-0.64%)    [18.5654ms .. 18.8087ms]
top_docs_large_deep       Memory: 3.2 MB              Avg: 22.7570ms (-7.25%)    Median: 22.6764ms (-7.45%)    [22.5315ms .. 23.3794ms]
multivalue
top_docs_small_shallow    Memory: 93.7 KB (-4.49%)     Avg: 5.0297ms (-1.87%)      Median: 5.0248ms (-1.96%)      [4.9936ms .. 5.1738ms]
top_docs_small_deep       Memory: 5.9 MB (-5.25%)      Avg: 15.2124ms (-30.98%)    Median: 15.2013ms (-31.04%)    [15.0788ms .. 15.3836ms]
top_docs_large_shallow    Memory: 662.9 KB (-5.13%)    Avg: 7.2877ms (-8.40%)      Median: 7.2846ms (-8.60%)      [7.1833ms .. 7.3893ms]
top_docs_large_deep       Memory: 6.4 MB (-5.25%)      Avg: 15.3104ms (-34.34%)    Median: 15.2782ms (-34.49%)    [15.0508ms .. 15.6572ms]
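
A minimal sketch of the idea described above, using only the standard library rather than tantivy's actual collector types. Each segment returns its top n hits in arbitrary order, and only the merged result is sorted. The `Hit` alias and both function names are hypothetical, chosen for illustration; this is not the code in this PR.

```rust
/// Hypothetical hit representation: (score, doc id). Not a tantivy type.
type Hit = (u64, u32);

/// Keep the `n` highest-scoring hits of one segment WITHOUT sorting them.
/// `select_nth_unstable_by` only partitions the slice (average O(len)),
/// so the returned hits are in arbitrary order.
fn segment_top_n_unsorted(hits: &[Hit], n: usize) -> Vec<Hit> {
    let mut hits = hits.to_vec();
    if hits.len() > n {
        // Descending comparator: the n largest hits end up in hits[..n].
        hits.select_nth_unstable_by(n, |a, b| b.cmp(a));
        hits.truncate(n);
    }
    hits
}

/// Merge the per-segment candidates, keep the global top `n`,
/// and sort exactly once at the end.
fn merge_top_n(segments: Vec<Vec<Hit>>, n: usize) -> Vec<Hit> {
    let mut all: Vec<Hit> = segments.into_iter().flatten().collect();
    if all.len() > n {
        all.select_nth_unstable_by(n, |a, b| b.cmp(a));
        all.truncate(n);
    }
    // The only full sort, over at most `n` hits.
    all.sort_unstable_by(|a, b| b.cmp(a));
    all
}
```

Under these assumptions the per-segment work is a partition rather than a sort, and the single sort happens only on the final, already truncated result.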

src/collector/top_score_collector.rs

stuhood · 2025-06-15T00:34:35Z

full
top_docs_small_shallow    Memory: 97.3 KB (-0.58%)     Avg: 4.5728ms (-2.23%)      Median: 4.5707ms (-2.05%)      [4.5540ms .. 4.6014ms]
top_docs_small_deep       Memory: 6.2 MB (-0.01%)      Avg: 14.2786ms (-33.59%)    Median: 14.2741ms (-33.70%)    [14.1030ms .. 14.5956ms]
top_docs_large_shallow    Memory: 698.7 KB (-0.05%)    Avg: 6.8426ms (-8.48%)      Median: 6.8434ms (-8.26%)      [6.7612ms .. 6.9138ms]
top_docs_large_deep       Memory: 6.8 MB (-0.01%)      Avg: 14.2691ms (-37.26%)    Median: 14.2846ms (-37.28%)    [14.0335ms .. 14.3867ms]
dense
top_docs_small_shallow    Memory: 92.9 KB (-0.35%)    Avg: 4.6792ms (-2.13%)      Median: 4.6785ms (-2.01%)      [4.6559ms .. 4.7226ms]
top_docs_small_deep       Memory: 5.9 MB              Avg: 15.0375ms (-29.50%)    Median: 15.0420ms (-29.46%)    [14.8413ms .. 15.1579ms]
top_docs_large_shallow    Memory: 662.9 KB            Avg: 7.0038ms (-6.82%)      Median: 6.9954ms (-6.79%)      [6.9232ms .. 7.0746ms]
top_docs_large_deep       Memory: 6.4 MB              Avg: 15.2383ms (-32.39%)    Median: 15.2466ms (-32.32%)    [15.1046ms .. 15.3693ms]
sparse
top_docs_small_shallow    Memory: 40.7 KB (+0.59%)    Avg: 17.1796ms (+1.15%)    Median: 17.1670ms (+1.01%)    [17.1109ms .. 17.3223ms]
top_docs_small_deep       Memory: 2.9 MB              Avg: 22.8191ms (-6.98%)    Median: 22.8086ms (-6.93%)    [22.6527ms .. 23.3086ms]
top_docs_large_shallow    Memory: 324.8 KB            Avg: 18.6777ms (-0.57%)    Median: 18.6723ms (-0.64%)    [18.5654ms .. 18.8087ms]
top_docs_large_deep       Memory: 3.2 MB              Avg: 22.7570ms (-7.25%)    Median: 22.6764ms (-7.45%)    [22.5315ms .. 23.3794ms]
multivalue
top_docs_small_shallow    Memory: 93.7 KB (-4.49%)     Avg: 5.0297ms (-1.87%)      Median: 5.0248ms (-1.96%)      [4.9936ms .. 5.1738ms]
top_docs_small_deep       Memory: 5.9 MB (-5.25%)      Avg: 15.2124ms (-30.98%)    Median: 15.2013ms (-31.04%)    [15.0788ms .. 15.3836ms]
top_docs_large_shallow    Memory: 662.9 KB (-5.13%)    Avg: 7.2877ms (-8.40%)      Median: 7.2846ms (-8.60%)      [7.1833ms .. 7.3893ms]
top_docs_large_deep       Memory: 6.4 MB (-5.25%)      Avg: 15.3104ms (-34.34%)    Median: 15.2782ms (-34.49%)    [15.0508ms .. 15.6572ms]

benches/agg_bench.rs

src/collector/top_collector.rs

ChillFish8 · 2025-07-13T23:45:08Z

benches/collector_bench.rs

+    black_box(searcher.search(&AllQuery, &collector).unwrap());
+}
+fn top_docs_small_deep(index: &Index) {
+    execute_top_docs::<u64>(index, "score", Order::Asc, 10000, 10);


Nit: as far as I can see, the benches all test ascending order only, so this function could either be removed or should add some runs in descending order as well.

stuhood · 2025-07-14

Do you mean that you would prefer that the order: Order argument be removed? I'm fine with that, but it can also be useful to expose assumptions like that in the caller.

PSeitz · 2025-07-14T04:38:02Z

benches/agg_bench.rs

@@ -402,7 +402,7 @@ fn get_test_index_bench(cardinality: Cardinality) -> tantivy::Result<Index> {
        .collect::<Vec<_>>();
    {
        let mut rng = StdRng::from_seed([1u8; 32]);
-        let mut index_writer = index.writer_with_num_threads(1, 200_000_000)?;
+        let mut index_writer = index.writer_with_num_threads(8, 200_000_000)?;


This changes the test to have 8 segments instead of 1.

stuhood · 2025-07-14

Yes, that's intentional. I believe that it should change to something other than 1, as it's definitely not realistic to have 1 segment in production.

You would very likely want the segments to be produced in a more deterministic order, though, so if you'd rather I revert this here, that's totally fine.

PSeitz · 2025-07-14T04:39:50Z

benches/collector_bench.rs

+fn top_docs_small_deep(index: &Index) {
+    execute_top_docs::<u64>(index, "score", Order::Asc, 10000, 10);
+}
+fn top_docs_small_shallow(index: &Index) {


Suggested change

-fn top_docs_small_shallow(index: &Index) {
+fn top_docs_top_10(index: &Index) {

PSeitz · 2025-07-14T04:41:34Z

benches/collector_bench.rs

+fn top_docs_small_shallow(index: &Index) {
+    execute_top_docs::<u64>(index, "score", Order::Asc, 0, 10);
+}
+fn top_docs_large_deep(index: &Index) {


Suggested change

-fn top_docs_large_deep(index: &Index) {
+fn top_docs_top_1_000_skip_10_000(index: &Index) {
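
The benchmark names encode two parameters. Judging from the calls quoted in this thread, "shallow" versus "deep" appears to be the offset (0 versus 10000) and the trailing 10 the limit. The sketch below illustrates, under that assumption, why a deep page is more expensive: a collector has to retain offset + limit candidates before it can skip the first offset of them. `retained` and `page` are hypothetical helpers, not tantivy functions.

```rust
/// Hypothetical helper: how many candidates must be retained to serve
/// a page that starts at `offset` and contains `limit` hits.
fn retained(offset: usize, limit: usize) -> usize {
    offset + limit
}

/// Return the page `[offset, offset + limit)` of `hits` in ascending order
/// (matching `Order::Asc` in the benchmarks), without sorting more than
/// the retained candidates.
fn page<T: Ord + Clone>(hits: &[T], offset: usize, limit: usize) -> Vec<T> {
    let keep = retained(offset, limit).min(hits.len());
    let mut candidates = hits.to_vec();
    if keep < candidates.len() {
        // Partition so that the `keep` smallest values come first.
        candidates.select_nth_unstable(keep);
        candidates.truncate(keep);
    }
    candidates.sort_unstable();
    // Drop the skipped prefix; at most `limit` hits remain.
    candidates.into_iter().skip(offset).collect()
}
```

With an offset of 10000 and a limit of 10, this sketch retains and sorts 10010 candidates to return 10 hits, which is consistent with the deep benchmarks being the ones most sensitive to sorting cost.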

PSeitz · 2025-07-14T04:43:38Z

benches/collector_bench.rs

+    OptionalSparse = 3,
+}
+
+fn get_test_index_bench(cardinality: Cardinality) -> tantivy::Result<Index> {


Most fields are unused.

stuhood · 2025-07-14

That's a function of having cloned this from agg_bench, which @fulmicoton suggested.

I'll be honest: I think that we would be better off renaming agg_bench to collector_bench, and then putting the TopDocs benchmark functions in there as well. Aggregations are also "just" collectors, and both sets of benchmarks want to consume similar datasets.

If we want to keep them in two benchmark files, then how would you feel about adding an unpublished benchmarks-support crate containing this function and other support code?

PSeitz · 2025-07-14T04:44:57Z

benches/collector_bench.rs

+                text_field_few_terms => "cool",
+                text_field_few_terms => "cool",
+                score_field => 1u64,
+                score_field => 1u64,


The value distribution of the score field is quite different in the multivalue case.

stuhood · 2025-07-14

See above: this is copypasta from agg_bench.

stuhood · 2025-07-22T20:44:49Z

Ping on this one. It seems like the primary thing to figure out is where the benchmarks should live, and if they should be separated, then whether support code should be broken out into another crate.

@PSeitz, @fulmicoton: What do you prefer?

ChillFish8 reviewed Jun 7, 2025

src/collector/top_score_collector.rs (outdated, resolved)

stuhood force-pushed the stuhood.reduce-top-n-sorting branch from 4526fc0 to 267b41b Compare June 15, 2025 00:26

fulmicoton reviewed Jun 20, 2025

benches/agg_bench.rs (outdated, resolved)

fulmicoton reviewed Jun 20, 2025

src/collector/top_collector.rs (resolved)

fulmicoton reviewed Jun 20, 2025

src/collector/top_collector.rs (resolved)

stuhood force-pushed the stuhood.reduce-top-n-sorting branch from 267b41b to f0f92aa Compare June 27, 2025 04:52

stuhood mentioned this pull request Jul 1, 2025

Fixes for TopDocs::order_by_string_fast_field and TopNComputer #2651

Closed

stuhood added 5 commits July 13, 2025 16:20

Add a microbenchmark for TopDocs.

4ba298d

Increase the agg bench segment count.

5aff31d

Reduce sorting in TopDocs by removing per-segment sorting.

55c3b5e

Review feedback.

8d4d5ed

Review feedback.

36e1713

stuhood force-pushed the stuhood.reduce-top-n-sorting branch from f0f92aa to 36e1713 Compare July 13, 2025 23:34

stuhood requested review from fulmicoton, ChillFish8 and PSeitz July 13, 2025 23:34

ChillFish8 reviewed Jul 13, 2025

PSeitz reviewed Jul 14, 2025
