
Support query filter on all benchmarks #16477


Merged 1 commit on Jun 24, 2025


pepijnve
Contributor

Which issue does this PR close?

None

Rationale for this change

bench.sh currently supports query filtering only for tpch. Most of the other multi-query benchmarks support it too, but not via bench.sh. It would make sense to expose this capability for every benchmark that supports it.

What changes are included in this PR?

Promote the tpch-specific query filter logic to the top level of bench.sh.
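
As a rough illustration of the idea (not the actual bench.sh code), an optional query number can be parsed once and then forwarded to whichever benchmark is being run. The helper names `build_query_arg` and `run_benchmark` are hypothetical, and the `--query` flag name is an assumption inferred from the `query` field that dfbench prints in its `RunOpt` output. The runner only echoes the command it would execute.

```shell
#!/usr/bin/env bash

# Hypothetical helper: turn an optional query number into a dfbench argument.
# Assumption: the flag is named `--query`; adjust if the real CLI differs.
build_query_arg() {
    local query="${1:-}"
    if [ -n "$query" ]; then
        echo "--query ${query}"
    fi
}

# Hypothetical runner: prints the command instead of executing it.
# $1 is the dfbench subcommand, $2 is the optional query filter.
run_benchmark() {
    local benchmark="$1"
    local query_arg
    query_arg="$(build_query_arg "${2:-}")"
    # query_arg is left unquoted on purpose so an empty filter adds no argument
    echo dfbench "$benchmark" --iterations 5 $query_arg
}
```

Calling `run_benchmark clickbench 1` prints a command that ends in `--query 1`, while `run_benchmark clickbench` prints the command with no filter, so every query would run.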

Are these changes tested?

Manually tested

Are there any user-facing changes?

No

alamb
alamb previously approved these changes Jun 21, 2025
Contributor

@alamb alamb left a comment


I ran the following command:

(venv) andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ ./benchmarks/bench.sh run clickbench_partitioned

And it passes on main but fails on this branch:

     Running `/Users/andrewlamb/Software/datafusion/target/release/dfbench clickbench --iterations 5 --path /Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned --queries-path /Users/andrewlamb/Software/datafusion/benchmarks/queries/clickbench/queries -o /Users/andrewlamb/Software/datafusion/benchmarks/results/bench_query_filter/clickbench_partitioned.json`
Running benchmarks with the following options: RunOpt { query: None, common: CommonOpt { iterations: 5, partitions: None, batch_size: None, mem_pool_type: "fair", memory_limit: None, sort_spill_reservation_bytes: None, debug: false }, path: "/Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned", queries_path: "/Users/andrewlamb/Software/datafusion/benchmarks/queries/clickbench/queries", output_path: Some("/Users/andrewlamb/Software/datafusion/benchmarks/results/bench_query_filter/clickbench_partitioned.json") }
Error: Execution("Could not open \"/Users/andrewlamb/Software/datafusion/benchmarks/queries/clickbench/queries\": No such file or directory (os error 2)")

🤔

@alamb alamb dismissed their stale review June 21, 2025 11:29

Clicked wrong button

@pepijnve
Contributor Author

Oh now I see what happened. I got some bench.sh changes intended for PR #16476 mixed up with this branch 🤦‍♂️.

@pepijnve pepijnve force-pushed the bench_query_filter branch from 8b9f844 to a5854fa Compare June 21, 2025 13:38
Contributor

@alamb alamb left a comment


Thank you @pepijnve !

I tried

./benchmarks/bench.sh run clickbench_partitioned

and also

./benchmarks/bench.sh run clickbench_partitioned 1

And they both worked great!

@pepijnve
Contributor Author

pepijnve commented Jun 24, 2025

I can't seem to make sense of the doc error. Doesn't seem to be related to the changes in this PR.

Just saw this was fixed on main.

@pepijnve pepijnve force-pushed the bench_query_filter branch from 94685af to c8a9614 Compare June 24, 2025 06:32
@alamb
Contributor

alamb commented Jun 24, 2025

I also double checked that the commands from #16477 (review) still work.

🚀

@alamb alamb merged commit ec92ed3 into apache:main Jun 24, 2025
27 checks passed
@alamb
Contributor

alamb commented Jun 24, 2025

Thanks again @pepijnve
