Restore topk filtering tests #16501

alamb · 2025-06-22T13:06:38Z

Which issue does this PR close?

Closes SortQueryFuzzer found a failing case on main #16452

Rationale for this change

@AdamGS removed some of the code in Temporarily fix bug in dynamic top-k optimization #16465.
However, the test kept failing.
@adriangb fixed what we think is the real issue in re-enable sort_query_fuzzer_runner #16491.

What changes are included in this PR?

Let's restore the code that was removed in Temporarily fix bug in dynamic top-k optimization #16465.

Are these changes tested?

Are there any user-facing changes?


adriangb · 2025-06-22T13:42:01Z

We have the test in https://github.com/apache/datafusion/pull/16465/files#diff-f38cac7a9ac55c93d71632c96d6d2afa219cfb07351125a349099c86df859446, which seems to be passing. I'm running 1200 iterations locally to confirm.

adriangb · 2025-06-22T13:51:12Z

Sadly the 1200 run still reports failures 😭

I feel like @AdamGS's original intuition that it's something about sort stability with nulls is correct. I'll see if I can find a fix...

alamb · 2025-06-22T14:03:15Z

Thanks @adriangb !

adriangb · 2025-06-22T17:17:12Z

So, from my investigation, what I think is happening is that #15770 fundamentally changed the TopK operation from being isolated per partition to having shared state via the dynamic filter. This introduces some non-determinism into test runs, since partitions can interact. I think this doesn't cause actual issues with queries, but the tests are picking it up; I'm not 100% sure about that, though. @Dandandan and I were already talking about having a shared TopK heap between partitions, and I think that would resolve the issue. Otherwise, more investigation is needed.

FWIW, the TopK dynamic filters still work without this code; what doesn't work is using the filter to filter rows inside the TopK operator itself.
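To make the interleaving argument concrete, here is a toy model in plain Rust. It is a hypothetical sketch, not DataFusion's TopK or dynamic filter code: `run_topk`, `Row`, and the single shared threshold are assumptions for illustration. Each partition keeps its k smallest rows in a max-heap, publishes its k-th key to a shared bound once full, and (optionally) drops incoming rows whose key is not strictly below that bound. The event list fixes one interleaving, so the same input can be replayed in different orders.

```rust
use std::collections::BinaryHeap;

/// (sort key, row id). Only the key is in the ORDER BY; the id stands in for
/// the remaining columns so we can see which tied row survives.
type Row = (i32, u32);

/// Toy model (hypothetical, not DataFusion's code): one TopK heap per
/// partition plus a single shared bound playing the role of the dynamic filter.
/// `events` fixes the interleaving as (partition index, row) pairs.
fn run_topk(k: usize, partitions: usize, events: &[(usize, Row)], filter_in_topk: bool) -> Vec<Row> {
    if k == 0 {
        return Vec::new();
    }
    // Each partition keeps its k smallest rows in a max-heap (largest on top).
    let mut heaps: Vec<BinaryHeap<Row>> = vec![BinaryHeap::new(); partitions];
    // Shared bound: the smallest k-th key published by any full partition.
    let mut threshold: Option<i32> = None;

    for &(p, row) in events {
        // Filtering inside the TopK operator: skip rows that cannot sort
        // strictly before the shared bound.
        if filter_in_topk && threshold.map_or(false, |t| row.0 >= t) {
            continue;
        }
        let heap = &mut heaps[p];
        if heap.len() < k {
            heap.push(row);
        } else if row.0 < heap.peek().unwrap().0 {
            heap.pop();
            heap.push(row);
        }
        if heap.len() == k {
            // Publish this partition's k-th key; the shared bound only tightens.
            let kth = heap.peek().unwrap().0;
            threshold = Some(threshold.map_or(kth, |t| t.min(kth)));
        }
    }

    // Final merge: concatenate partitions in index order, stable-sort by key
    // so that ties keep partition order, and keep the first k rows.
    let mut merged: Vec<Row> = heaps.into_iter().flat_map(|h| h.into_sorted_vec()).collect();
    merged.sort_by_key(|r| r.0);
    merged.truncate(k);
    merged
}

fn main() {
    // k = 2; partition 0 holds (5, 10), partition 1 holds (1, 20) and (5, 21).
    let p0_first: [(usize, Row); 3] = [(0, (5, 10)), (1, (1, 20)), (1, (5, 21))];
    let p1_first: [(usize, Row); 3] = [(1, (1, 20)), (1, (5, 21)), (0, (5, 10))];
    for filter_in_topk in [true, false] {
        println!(
            "filter_in_topk={filter_in_topk}: {:?} vs {:?}",
            run_topk(2, 2, &p0_first, filter_in_topk),
            run_topk(2, 2, &p1_first, filter_in_topk),
        );
    }
}
```

With `filter_in_topk` enabled, the two interleavings in this toy model return `[(1, 20), (5, 10)]` and `[(1, 20), (5, 21)]`: the same keys, both valid for the ORDER BY, but a different row for the tied key 5, because whichever partition fills first drops the other partition's tied row. With it disabled, both return `[(1, 20), (5, 10)]`. This is only a model of the suspected mechanism, not a reproduction of the fuzzer failure.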

This is all I had time for today. I think more work is needed before we can merge this PR in the current state.

alamb · 2025-06-23T15:00:31Z

"This causes some non-determinism with test runs since partitions can interact. I think this doesn't cause actual issues with queries, but the tests are picking it up."

This sounds like we need to make the tests deterministic somehow (or ignore results that are not deterministic).

AdamGS · 2025-06-23T15:24:42Z

Would love to give a hand with that, I have some thoughts I can try and put into a preliminary PR.
It also seems like DataFusion is going to have more of this shared state that's sensitive to how events interleave, and it might be worth making a larger effort to enable (more) deterministic simulation.

adriangb · 2025-06-23T15:31:30Z

Thank you @AdamGS! It would be super helpful if we could first determine whether the test is being overly sensitive to non-determinism (query results are exactly the same across runs and correct, but the test still fails) or whether the issue actually reflects incorrect or non-deterministic query results (e.g. the result is correct according to the sort order, but the actual order of rows differs across runs).
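One way to make that first triage step mechanical is sketched below in plain Rust, under stated assumptions: rows are modeled as `(Option<i32>, &str)` tuples rather than Arrow record batches, `None` stands for SQL NULL, the order is ASC NULLS LAST, and `classify`, `Verdict`, and `key_rank` are hypothetical names, not part of DataFusion or its fuzzer. The idea is to separate "same sort keys, different tie order" (which the ORDER BY does not constrain) from "different or out-of-order sort keys" (a wrong result).

```rust
/// (sort key, other columns). `None` models SQL NULL; the order is ASC NULLS LAST.
type Row = (Option<i32>, &'static str);

#[derive(Debug, PartialEq)]
enum Verdict {
    /// Same rows in the same order.
    Identical,
    /// Same sequence of sort keys, but tied rows differ or are ordered differently.
    TieOrderDiffers,
    /// The sort keys themselves differ (or are out of order): a wrong result.
    WrongResult,
}

/// Comparable rank for ASC NULLS LAST: non-null values first, then NULLs.
fn key_rank(key: &Option<i32>) -> (bool, i32) {
    match key {
        Some(v) => (false, *v),
        None => (true, 0),
    }
}

/// Hypothetical helper: classify a run's output against a reference result.
fn classify(expected: &[Row], actual: &[Row]) -> Verdict {
    if expected == actual {
        return Verdict::Identical;
    }
    let keys = |rows: &[Row]| rows.iter().map(|r| r.0).collect::<Vec<_>>();
    // The actual output must itself respect the sort order.
    let sorted = actual
        .windows(2)
        .all(|w| key_rank(&w[0].0) <= key_rank(&w[1].0));
    if sorted && keys(expected) == keys(actual) {
        Verdict::TieOrderDiffers
    } else {
        Verdict::WrongResult
    }
}

fn main() {
    let expected: Vec<Row> = vec![(Some(1), "a"), (None, "b"), (None, "c")];
    // Same keys, but the two NULL-keyed rows come back in the other order.
    let actual: Vec<Row> = vec![(Some(1), "a"), (None, "c"), (None, "b")];
    println!("{:?}", classify(&expected, &actual)); // prints TieOrderDiffers
}
```

Runs that only ever land in `TieOrderDiffers` would point at an overly strict test; any `WrongResult` would point at a real correctness bug. A stricter variant could also check that every returned row actually exists in the input; this sketch only compares key sequences.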

alamb added 2 commits June 22, 2025 09:03

Revert "Temporarily fix bug in dynamic top-k optimization (apache#16465…)" (9e28c17)
This reverts commit 5ca4ff0.

restore (02b6fad)

github-actions bot added the physical-plan Changes to the physical-plan crate label Jun 22, 2025

alamb changed the title from Alamb/revert fix to Restore topk filtering tests Jun 22, 2025

This was referenced Jun 22, 2025:

Temporarily fix bug in dynamic top-k optimization #16465 (Merged)
SortQueryFuzzer found a failing case on main #16452 (Open)
