DRAFT: Update arrow/parquet to 56.0.0 #16690
base: main
Branch updated: 6698376 to 1747605
Branch updated: 1747605 to fc5bd79
I see some failures in the row_group_pruning tests
Here is an example failure
Branch updated: 1c82d05 to 76bc1b2
ok, I now have a clean run!
Thank you @alamb. I am curious how the benchmark results compare with the main branch, because this PR will include apache/arrow-rs#7850. We have already ported part of that improvement to DataFusion (the merge comparison), but we should also benefit from the dependency changes on the arrow side, such as the sort phase and compare. Could we trigger the benchmarks for this PR? Thanks!
Done!
🤖: Benchmark completed
Thank you @alamb. It seems we have some improvement on clickbench, though not much, because most of our gain is in sorting string views, which sort_tpch exercises and clickbench does not.
I will start those as well
🤖: Benchmark completed
Thank you @alamb. It seems the sort dependencies bring no improvement (which suggests the merge phase takes most of the time), so most of our gain comes from the PRs we already ported to the merge phase.
🤖: Benchmark completed
Thank you @alamb @Dandandan. We could also try the sort_tpch10 benchmark, but it may not show much improvement either; the ported PR is already 1.4x faster on sort_tpch Q11 (inlined string view sort).
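For readers unfamiliar with why inlined string views can sort faster, here is a minimal sketch. It assumes the Arrow view layout for short strings (a 4-byte length followed by up to 12 bytes of zero-padded inline data) and uses hypothetical helper names (`make_inline_view`, `inline_key`); it is not the code from the ported PR or from arrow-rs. The idea is that a short string can be turned into a single integer whose ordering matches byte-wise ordering, so each comparison becomes one `u128` compare instead of a buffer lookup.

```rust
// Hypothetical helpers, std only. Assumes the Arrow view layout for strings
// of at most 12 bytes: [len: u32 little-endian][12 bytes of inline data].

/// Build a 16-byte view (as a little-endian u128) for a short string.
fn make_inline_view(s: &[u8]) -> u128 {
    assert!(s.len() <= 12, "only short strings are stored inline");
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(s.len() as u32).to_le_bytes());
    bytes[4..4 + s.len()].copy_from_slice(s);
    u128::from_le_bytes(bytes)
}

/// Map an inline view to an integer key whose ordering matches the
/// byte-wise lexicographic ordering of the original string.
fn inline_key(view: u128) -> u128 {
    let bytes = view.to_le_bytes();
    let len = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let mut key = [0u8; 16];
    // Data bytes go first so the first character is the most significant.
    key[..12].copy_from_slice(&bytes[4..16]);
    // The length breaks ties caused by zero padding, e.g. "a" vs "a\0".
    key[12..].copy_from_slice(&len.to_be_bytes());
    u128::from_be_bytes(key)
}
```

Sorting a set of short strings by `inline_key(make_inline_view(..))` should give the same order as sorting the strings directly. Longer strings do not fit inline and would still need a comparison against the data buffer.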
I wonder if we should focus on more parallelisation in … I think there might be two areas:
Thank you @Dandandan, interesting idea. I agree: if we can improve this, it will help not only StringView but also other types, so it will benefit more cases. I will investigate!
I submitted a PR based on this one to see whether fast_gc can also help the sort_tpch or sort_tpch10 benchmarks. Maybe we can run sort_tpch for that PR to see the result, thanks!
Here is a related idea:
This is an interesting idea and I think it may make a significant difference -- the core merge is a single-threaded operation, so any reduction in the time that thread spends waiting (for example, to fetch the next cursor of rows) contributes directly to the overall speed of the query
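To make the point about the merge thread waiting concrete, here is a minimal sketch of the general technique: each sorted input is produced on its own thread and buffered in a bounded channel, so the next batch is usually ready by the time the single-threaded merge asks for it. This uses only the Rust standard library; the names `prefetching_input`, `merge_sorted`, and `next_non_empty` are hypothetical, the batches are plain `Vec<i64>`, and none of this is DataFusion's actual merge implementation.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Spawn a producer thread that pushes the batches of one sorted input into
/// a bounded channel holding up to `depth` batches ahead of the consumer.
fn prefetching_input(batches: Vec<Vec<i64>>, depth: usize) -> Receiver<Vec<i64>> {
    let (tx, rx) = sync_channel(depth);
    thread::spawn(move || {
        for batch in batches {
            // In a real system, decoding or cursor creation would happen here.
            if tx.send(batch).is_err() {
                break; // the merge side has gone away
            }
        }
    });
    rx
}

/// Receive the next non-empty batch, or None once the input is exhausted.
fn next_non_empty(rx: &Receiver<Vec<i64>>) -> Option<Vec<i64>> {
    loop {
        match rx.recv() {
            Ok(b) if b.is_empty() => continue,
            Ok(b) => return Some(b),
            Err(_) => return None,
        }
    }
}

/// Single-threaded k-way merge over prefetched inputs. Each input must
/// yield values in ascending order across its batches.
fn merge_sorted(inputs: Vec<Receiver<Vec<i64>>>) -> Vec<i64> {
    // Current batch and read position for each input.
    let mut current: Vec<(Vec<i64>, usize)> = Vec::new();
    // Min-heap of (next value, input index).
    let mut heap = BinaryHeap::new();
    for (i, rx) in inputs.iter().enumerate() {
        let batch = next_non_empty(rx).unwrap_or_default();
        if let Some(&v) = batch.first() {
            heap.push(Reverse((v, i)));
        }
        current.push((batch, 0));
    }
    let mut out = Vec::new();
    while let Some(Reverse((v, i))) = heap.pop() {
        out.push(v);
        let (batch, pos) = &mut current[i];
        *pos += 1;
        if *pos == batch.len() {
            // This is the only place the merge thread can block on input.
            match next_non_empty(&inputs[i]) {
                Some(next) => {
                    *batch = next;
                    *pos = 0;
                }
                None => continue, // input i is exhausted
            }
        }
        heap.push(Reverse((batch[*pos], i)));
    }
    out
}
```

With a `depth` of 1 or more, the producer can prepare the next batch while the merge is still consuming the current one, which is the kind of overlap described above. Whether this helps in practice depends on how expensive producing a batch is compared with the merge itself.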
Branch updated: 76bc1b2 to aa4b8cb
🤖: Benchmark completed
Which issue does this PR close?
56.0.0 (July 2025) arrow-rs#7395
Rationale for this change
There are several non-trivial changes in arrow 56, so I want to start testing soon
Also, I would like a stable base to test new parquet pushdown code from @XiangpengHao
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?