Forward-merge branch-25.06 into branch-25.08 #18646
Conversation
…13 upgrade (#18593)

xref rapidsai/kvikio#702
xref rapidsai/build-planning#120
xref rapidsai/build-planning#171

This PR enables us to update `cudf` to use Python 3.13. We were blocked on upgrading because there were no `nvcomp` wheels for Python 3.13, but we've now vendored `nvcomp` into `libkvikio`, so we should be able to upgrade that way. I've added a new CMake option so that we look for `libnvcomp.so.4` in the right place; that can be reverted when we switch back to using `nvcomp` wheels.

Authors:
- Gil Forsyth (https://github.com/gforsyth)

Approvers:
- James Lamb (https://github.com/jameslamb)
- Bradley Dice (https://github.com/bdice)

URL: #18593
FAILURE - Unable to forward-merge due to an error; manual merge is necessary. IMPORTANT: When merging this PR, do not use the auto-merger.
By completely removing `GPU_ARCHS` and replacing its occurrences with `CMAKE_CUDA_ARCHITECTURES`, we avoid mistakes where `CMAKE_CUDA_ARCHITECTURES` is wrongly used in place of `GPU_ARCHS`, and we reduce maintenance. So far, this build variable is only used in the Java JNI build.

Authors:
- Nghia Truong (https://github.com/ttnghia)
- James Lamb (https://github.com/jameslamb)

Approvers:
- James Lamb (https://github.com/jameslamb)
- MithunR (https://github.com/mythrocks)

URL: #18506
Contributes to #18533. Addresses performance hotspots outlined in #16025.

This PR introduces a sort-based approach for inner joins on low-cardinality, high-multiplicity tables, i.e. tables that have few unique keys, each of which is repeated many times.

Sort-merge join implementation:
1. Sort the left and right tables using their respective keys.
2. Iterate through the larger of the two tables and compute upper and lower bounds for each key in the smaller table.
3. For the left indices, compute the number of elements $n$ in the bounds range for each key, and insert the key $n$ times into the array.
4. For the right indices, insert the positions between the lower and upper bound using the sorted ordering of the smaller table.

### Progress
1. Benchmarking results for join on int64 keys for input tables of varying key multiplicity: [Performance comparison plot](#18318 (comment))
2. Benchmarking results after optimizing right indices construction: [Profiles and updated benchmarks](#18318 (comment))

TODO:
- [x] Inner join on nested columns
- [x] Inner join on nullable keys with null equality set to false
- [ ] Remaining join types (left, semi, full, ...)
- [x] Merge join for sorted left and right keys

Authors:
- Shruti Shivakumar (https://github.com/shrshi)
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- David Wendt (https://github.com/davidwendt)
- Yunsong Wang (https://github.com/PointKernel)
- Bradley Dice (https://github.com/bdice)
- Nghia Truong (https://github.com/ttnghia)

URL: #18318
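The steps above can be sketched on the host in Python, with `bisect` standing in for the device-side bound kernels (names here are illustrative, not the libcudf implementation):

```python
from bisect import bisect_left, bisect_right

def sort_merge_inner_join(left_keys, right_keys):
    # Step 1: sort both sides, keeping original row positions for gather indices.
    left_order = sorted(range(len(left_keys)), key=lambda i: left_keys[i])
    right_order = sorted(range(len(right_keys)), key=lambda i: right_keys[i])
    right_sorted = [right_keys[i] for i in right_order]

    left_idx, right_idx = [], []
    # Step 2: walk the (assumed larger) left table; for each key, find the
    # matching run in the sorted smaller table via lower/upper bound.
    for i in left_order:
        lo = bisect_left(right_sorted, left_keys[i])
        hi = bisect_right(right_sorted, left_keys[i])
        n = hi - lo                           # matches for this key
        left_idx.extend([i] * n)              # step 3: repeat the left row n times
        right_idx.extend(right_order[lo:hi])  # step 4: positions via sorted order
    return left_idx, right_idx
```

For example, joining `[1, 2, 2]` with `[2, 2, 3]` yields left indices `[1, 1, 2, 2]` paired with right indices `[0, 1, 0, 1]`.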
Proposes a simple utility to print the physical plan for each query in `experimental/benchmarks/pdsh.py`. For example, add the `--explain` flag when calling `pdsh.py`:

```
$ python pdsh.py 13 --path $DATASET_PATH --explain
...
Query 13 - Physical plan
SORT ('custdist', 'c_count') [1]
SELECT ('c_count', 'custdist') [1]
SELECT ('c_count', 'len') [1]
SELECT ('c_count', '_______0') [1]
GROUPBY ('c_count',) [1]
REPARTITION [1]
GROUPBY ('c_count',) [4]
SELECT ('c_custkey', 'c_count') [4]
SELECT ('c_custkey', '_________0') [4]
GROUPBY ('c_custkey',) [4]
SHUFFLE [4]
GROUPBY ('c_custkey',) [91]
PROJECTION ('c_custkey', 'o_orderkey') [91]
JOIN Left ('c_custkey',) ('o_custkey',) [91]
SHUFFLE [91]
UNION [1 x SCAN ('c_custkey',) customer/customer_00164338-2a8d-4392-9bab-d5259dd47133.parquet ...]
SHUFFLE [91]
UNION [91 x SCAN ('o_custkey', 'o_orderkey', 'o_comment') orders/orders_00426f3f-530b-4cdd-ae6c-8c9cb046b522.parquet ...]
...
```

We may want a **proper** utility like this outside of `pdsh.py`, but it also seems fine to keep it in the benchmarking file for now.

Authors:
- Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
- Benjamin Zaitlen (https://github.com/quasiben)
- Matthew Murray (https://github.com/Matt711)

URL: #18635
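A minimal version of such a plan printer (a hypothetical node structure, not the cudf-polars IR) is just a pre-order walk that emits one line per node, indented by depth:

```python
def explain(node, depth=0):
    # One line per node: indented operator name plus its column list,
    # followed recursively by the node's children.
    pad = "  " * depth
    lines = [f"{pad}{node['op']} {node.get('cols', ())}"]
    for child in node.get("children", []):
        lines.extend(explain(child, depth + 1))
    return lines

# Tiny hypothetical plan for illustration:
plan = {
    "op": "SORT", "cols": ("custdist", "c_count"),
    "children": [{"op": "GROUPBY", "cols": ("c_count",), "children": []}],
}
print("\n".join(explain(plan)))
```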
Closes #18525. This PR removes the duplicated `TypeKind` enum representing Parquet physical types and replaces its uses with the `Type` enum defined in `parquet_schema.hpp`.

Authors:
- Muhammad Haseeb (https://github.com/mhaseeb123)

Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- Vukasin Milovanovic (https://github.com/vuule)

URL: #18526
This PR makes the cuDF cold-cache benchmark more rigorous by eliminating the impact of dirty pages before the page cache is dropped.

Authors:
- Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
- Muhammad Haseeb (https://github.com/mhaseeb123)
- Vukasin Milovanovic (https://github.com/vuule)
- MithunR (https://github.com/mythrocks)
- Mads R. B. Kristensen (https://github.com/madsbk)

URL: #18626
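The usual Linux recipe for this is to flush dirty pages with `sync` before writing to `/proc/sys/vm/drop_caches`; a minimal sketch of that general technique (not the benchmark's actual code):

```python
import os

def drop_page_cache():
    # Flush dirty pages to disk first; otherwise pending writeback can
    # skew or repopulate the cache right after it is dropped.
    os.sync()
    try:
        # Writing "3" drops the page cache plus dentries/inodes; requires root.
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")
    except (PermissionError, FileNotFoundError):
        pass  # not root (or not Linux): leave the cache alone
```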
Part of refactoring common split/tokenize code and related improvements. This improves performance of the vocabulary tokenizer, mostly for smaller strings (<=128 bytes), by using an intermediate buffer to hold the token counts before computing offsets.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- Shruti Shivakumar (https://github.com/shrshi)

URL: #18522
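The count-then-scan pattern this relies on, sketched on the host (libcudf does the equivalent with device buffers and an exclusive scan): per-row token counts become row offsets into one flat token array.

```python
from itertools import accumulate

def counts_to_offsets(token_counts):
    # Exclusive prefix sum: offsets[i] is where row i's tokens start in the
    # flat output, and offsets[-1] is the total number of tokens.
    return [0] + list(accumulate(token_counts))
```

For example, `counts_to_offsets([3, 0, 2])` returns `[0, 3, 3, 5]`.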
Adds a missing error check to `nvtext::wordpiece_tokenize` for an invalid argument value. Also fixes the doxygen and removes some commented-out code.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Bradley Dice (https://github.com/bdice)

URL: #18621
Follow-on to #18607. Found more places using `cudf::size_of()` where `sizeof()` would be faster and more correct.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Paul Mattione (https://github.com/pmattione-nvidia)

URL: #18628
(#18644)

Fixed an issue in the Parquet writer where all compression would fail when device compression with internal kernels is used. The root cause is that the compressed chunk size was not being set, as it is not required for the nvCOMP implementation. Expanded a few tests to use internal kernels for compression; more extensive test changes to improve coverage of the compression/decompression internal kernels are planned for a separate PR.

Authors:
- Vukasin Milovanovic (https://github.com/vuule)

Approvers:
- Shruti Shivakumar (https://github.com/shrshi)
- Nghia Truong (https://github.com/ttnghia)
- Muhammad Haseeb (https://github.com/mhaseeb123)

URL: #18644
…ars (#18638)

`Filter` and `Projection` nodes currently "refresh" partitioning information. This isn't necessary, and it may result in unnecessary `Shuffle` operations.

Authors:
- Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
- Tom Augspurger (https://github.com/TomAugspurger)

URL: #18638
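The idea can be sketched as follows (hypothetical names; the real cudf-polars IR differs): elementwise nodes pass their child's partitioning through unchanged rather than recomputing it, so no `Shuffle` is inserted on their account.

```python
ELEMENTWISE = {"filter", "projection"}

def output_partitioning(op, child_partitioning):
    # Filter/Projection do not move rows between partitions, so the child's
    # partitioning (e.g. which keys the data is shuffled on) survives intact.
    if op in ELEMENTWISE:
        return child_partitioning
    return None  # unknown here; a downstream node may need a Shuffle
```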
This fixes a deprecation warning in Java JNI code. Closes #18652.

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Renjie Liu (https://github.com/liurenjie1024)

URL: #18660
This PR disables `arm64` tests to unblock CI.

Authors:
- GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
- James Lamb (https://github.com/jameslamb)
- Bradley Dice (https://github.com/bdice)

URL: #18662
… function for cudf-polars Column and DataFrame (#18602)

This is used by RAPIDSMPF's [`approx_spillable_amount`](https://github.com/rapidsai/rapidsmpf/blob/b77ed56a56a357d4e1f1bcb6208098a8078fb740/python/rapidsmpf/rapidsmpf/integrations/dask/spilling.py#L109). To simplify, all of the Dask registration (including the serializers) was moved into `dask_registers.py`.

Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
- Lawrence Mitchell (https://github.com/wence-)

URL: #18602
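A sketch of what such a memory-usage hook can look like (illustrative classes and names, not the actual cudf-polars types or API): each container sums the device bytes of its buffers so a spill manager can estimate what spilling it would free.

```python
from functools import singledispatch

class Column:
    def __init__(self, nbytes: int):
        self.nbytes = nbytes  # stands in for the device buffer size

class DataFrame:
    def __init__(self, columns):
        self.columns = columns

@singledispatch
def device_memory_usage(obj) -> int:
    # Unknown objects hold no device memory as far as the spiller knows.
    return 0

@device_memory_usage.register
def _(col: Column) -> int:
    return col.nbytes

@device_memory_usage.register
def _(df: DataFrame) -> int:
    # A frame's footprint is the sum of its columns' device buffers.
    return sum(device_memory_usage(c) for c in df.columns)
```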
Discovered in CCCL CI: https://github.com/NVIDIA/cccl/actions/runs/14806960648/job/41576694188#step:6:2863

Authors:
- Allison Piper (https://github.com/alliepiper)
- Michael Schellenberger Costa (https://github.com/miscco)
- Bradley Dice (https://github.com/bdice)

Approvers:
- David Wendt (https://github.com/davidwendt)
- Yunsong Wang (https://github.com/PointKernel)
- Bradley Dice (https://github.com/bdice)

URL: #18649
Contributes to #17896. Part of #18011. This PR implements row group pruning with statistics in the experimental Parquet reader optimized for hybrid scan queries.

Authors:
- Muhammad Haseeb (https://github.com/mhaseeb123)
- Bradley Dice (https://github.com/bdice)

Approvers:
- David Wendt (https://github.com/davidwendt)
- Vukasin Milovanovic (https://github.com/vuule)
- Bradley Dice (https://github.com/bdice)

URL: #18543
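Stats-based pruning boils down to an interval-overlap test per row group; a host-side sketch for a `lo <= col <= hi` predicate (illustrative, not the reader's actual API):

```python
def prune_row_groups(stats, lo, hi):
    # stats: per-row-group (min, max) for the predicate column, or None
    # when the file carries no statistics for that row group.
    # Keep a row group only if [min, max] can overlap [lo, hi]; missing
    # stats force the row group to be kept (pruning must be conservative).
    keep = []
    for i, s in enumerate(stats):
        if s is None:
            keep.append(i)
            continue
        mn, mx = s
        if not (mx < lo or mn > hi):
            keep.append(i)
    return keep
```

For example, with stats `[(0, 5), (10, 20), (6, 9), None]` and the predicate range `[7, 12]`, only the first row group can be skipped.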
Wrap `cudf_polars.DataFrame` in a container that enables spilling using rapidsmp.

Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)
- Richard (Rick) Zamora (https://github.com/rjzamora)
- Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
- Peter Andreas Entschev (https://github.com/pentschev)
- Richard (Rick) Zamora (https://github.com/rjzamora)
- Gil Forsyth (https://github.com/gforsyth)

URL: #18461
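The shape of such a wrapper, as an illustrative sketch (not the rapidsmp API; `pickle` stands in for device-to-host serialization): the container holds the frame either "on device" or as spilled host bytes, unspilling on access.

```python
import pickle

class SpillableDataFrame:
    def __init__(self, frame):
        self._frame = frame   # stands in for device-resident data
        self._spilled = None  # serialized host copy when spilled

    def spill(self):
        # Move the frame off "device" by serializing it to host bytes.
        if self._frame is not None:
            self._spilled = pickle.dumps(self._frame)
            self._frame = None

    def unspill(self):
        # Restore the frame on access if it was spilled.
        if self._frame is None:
            self._frame = pickle.loads(self._spilled)
            self._spilled = None
        return self._frame
```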
Fixes an invalid dereference of a `cuda::std::optional` with no value. The new CCCL version has assert checks for this, which show up in a Debug build. The error appears like this when running `gtests/CLAMP_TEST --gtest_filter=ClampTestNumeric/0.InputNull`:

```
/cudf/cpp/build/_deps/cccl-src/lib/cmake/libcudacxx/../../../libcudacxx/include/cuda/std/detail/libcxx/include/optional:867: operator*: block: [0,0,0], thread: [0,0,0] Assertion `optional operator* called on a disengaged value` failed.
```

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Muhammad Haseeb (https://github.com/mhaseeb123)

URL: #18655
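The same class of bug in Python terms, as an illustrative analog (not the libcudf code): a clamp must check for an empty value before using it, mirroring the `has_value()` test required before dereferencing a `cuda::std::optional`.

```python
from typing import Optional

def clamp(value: Optional[int], lo: int, hi: int) -> Optional[int]:
    # Check for a null input before using it; dereferencing an empty
    # optional is exactly the invalid access the PR fixes.
    if value is None:
        return None
    return max(lo, min(value, hi))
```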
Forward-merge triggered by push to branch-25.06 that creates a PR to keep branch-25.08 up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge. See forward-merger docs for more info.