Forward-merge branch-25.06 into branch-25.08 #18646
Conversation
…13 upgrade (#18593)

xref rapidsai/kvikio#702
xref rapidsai/build-planning#120
xref rapidsai/build-planning#171

This PR enables us to update `cudf` to use Python 3.13. We were blocked on upgrading because there were no `nvcomp` wheels for Python 3.13, but we've now vendored `nvcomp` into `libkvikio`, so we should be able to upgrade that way. I've added a new CMake option so that we look for `libnvcomp.so.4` in the right place; that can be reverted when we switch back to using `nvcomp` wheels.

Authors:
- Gil Forsyth (https://github.com/gforsyth)

Approvers:
- James Lamb (https://github.com/jameslamb)
- Bradley Dice (https://github.com/bdice)

URL: #18593
FAILURE - Unable to forward-merge due to an error; manual merge is necessary. IMPORTANT: When merging this PR, do not use the auto-merger.
By completely removing `GPU_ARCHS` and replacing its occurrences with `CMAKE_CUDA_ARCHITECTURES`, we avoid mistakes where `CMAKE_CUDA_ARCHITECTURES` is wrongly used in place of `GPU_ARCHS`, and we reduce maintenance. So far, this build variable is only used in the Java JNI build.

Authors:
- Nghia Truong (https://github.com/ttnghia)
- James Lamb (https://github.com/jameslamb)

Approvers:
- James Lamb (https://github.com/jameslamb)
- MithunR (https://github.com/mythrocks)

URL: #18506
Contributes to #18533. Addresses performance hotspots outlined in #16025.

This PR introduces a sort-based approach for inner joins on low-cardinality, high-multiplicity tables, i.e. tables that have few unique keys, each of which is repeated many times.

Sort-merge join implementation:
1. Sort the left and right tables using their respective keys.
2. Iterate through the larger of the two tables and compute upper and lower bounds for each key in the smaller table.
3. For the left indices, compute the number of elements $n$ in the bounds range for each key, and insert the key $n$ times into the array.
4. For the right indices, insert the positions between the lower and upper bound using the sorted ordering of the smaller table.

### Progress
1. Benchmarking results for join on int64 keys for input tables of varying key multiplicity: [Performance comparison plot](#18318 (comment))
2. Benchmarking results after optimizing right indices construction: [Profiles and updated benchmarks](#18318 (comment))

TODO:
- [x] Inner join on nested columns
- [x] Inner join on nullable keys with null equality set to false
- [ ] Remaining join types (left, semi, full, ...)
- [x] Merge join for sorted left and right keys

Authors:
- Shruti Shivakumar (https://github.com/shrshi)
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- David Wendt (https://github.com/davidwendt)
- Yunsong Wang (https://github.com/PointKernel)
- Bradley Dice (https://github.com/bdice)
- Nghia Truong (https://github.com/ttnghia)

URL: #18318
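The steps above can be sketched on the host in Python, with `bisect` standing in for the device-side bound kernels (names here are illustrative, not the libcudf implementation):

```python
from bisect import bisect_left, bisect_right

def sort_merge_inner_join(left_keys, right_keys):
    # Step 1: sort both sides, keeping original row positions for gather indices.
    left_order = sorted(range(len(left_keys)), key=lambda i: left_keys[i])
    right_order = sorted(range(len(right_keys)), key=lambda i: right_keys[i])
    right_sorted = [right_keys[i] for i in right_order]

    left_idx, right_idx = [], []
    # Step 2: walk the (assumed larger) left table; for each key, find the
    # matching run in the sorted smaller table via lower/upper bound.
    for i in left_order:
        lo = bisect_left(right_sorted, left_keys[i])
        hi = bisect_right(right_sorted, left_keys[i])
        n = hi - lo                           # matches for this key
        left_idx.extend([i] * n)              # step 3: repeat the left row n times
        right_idx.extend(right_order[lo:hi])  # step 4: positions via sorted order
    return left_idx, right_idx
```

For example, joining `[1, 2, 2]` with `[2, 2, 3]` yields left indices `[1, 1, 2, 2]` paired with right indices `[0, 1, 0, 1]`.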
Proposes a simple utility to print the physical plan for each query in `experimental/benchmarks/pdsh.py`. For example, add the `--explain` flag when calling `pdsh.py`:

```
$ python pdsh.py 13 --path $DATASET_PATH --explain
...
Query 13 - Physical plan
SORT ('custdist', 'c_count') [1]
SELECT ('c_count', 'custdist') [1]
SELECT ('c_count', 'len') [1]
SELECT ('c_count', '_______0') [1]
GROUPBY ('c_count',) [1]
REPARTITION [1]
GROUPBY ('c_count',) [4]
SELECT ('c_custkey', 'c_count') [4]
SELECT ('c_custkey', '_________0') [4]
GROUPBY ('c_custkey',) [4]
SHUFFLE [4]
GROUPBY ('c_custkey',) [91]
PROJECTION ('c_custkey', 'o_orderkey') [91]
JOIN Left ('c_custkey',) ('o_custkey',) [91]
SHUFFLE [91]
UNION [1 x SCAN ('c_custkey',) customer/customer_00164338-2a8d-4392-9bab-d5259dd47133.parquet ...]
SHUFFLE [91]
UNION [91 x SCAN ('o_custkey', 'o_orderkey', 'o_comment') orders/orders_00426f3f-530b-4cdd-ae6c-8c9cb046b522.parquet ...]
...
```

We may want a **proper** utility like this outside of `pdsh.py`, but it also seems fine to keep it in the benchmarking file for now.

Authors:
- Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
- Benjamin Zaitlen (https://github.com/quasiben)
- Matthew Murray (https://github.com/Matt711)

URL: #18635
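A minimal version of such a plan printer (a hypothetical node structure, not the cudf-polars IR) is just a pre-order walk that emits one line per node, indented by depth:

```python
def explain(node, depth=0):
    # One line per node: indented operator name plus its column list,
    # followed recursively by the node's children.
    pad = "  " * depth
    lines = [f"{pad}{node['op']} {node.get('cols', ())}"]
    for child in node.get("children", []):
        lines.extend(explain(child, depth + 1))
    return lines

# Tiny hypothetical plan for illustration:
plan = {
    "op": "SORT", "cols": ("custdist", "c_count"),
    "children": [{"op": "GROUPBY", "cols": ("c_count",), "children": []}],
}
print("\n".join(explain(plan)))
```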
Closes #18525. This PR removes the duplicated `TypeKind` enum representing Parquet physical types and replaces its uses with the `Type` enum defined in `parquet_schema.hpp`.

Authors:
- Muhammad Haseeb (https://github.com/mhaseeb123)

Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- Vukasin Milovanovic (https://github.com/vuule)

URL: #18526
This PR makes the cuDF cold-cache benchmark more rigorous by eliminating the impact of dirty pages before the page cache is dropped.

Authors:
- Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
- Muhammad Haseeb (https://github.com/mhaseeb123)
- Vukasin Milovanovic (https://github.com/vuule)
- MithunR (https://github.com/mythrocks)
- Mads R. B. Kristensen (https://github.com/madsbk)

URL: #18626
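The usual Linux recipe for this is to flush dirty pages with `sync` before writing to `/proc/sys/vm/drop_caches`; a minimal sketch of that general technique (not the benchmark's actual code):

```python
import os

def drop_page_cache():
    # Flush dirty pages to disk first; otherwise pending writeback can
    # skew or repopulate the cache right after it is dropped.
    os.sync()
    try:
        # Writing "3" drops the page cache plus dentries/inodes; requires root.
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")
    except (PermissionError, FileNotFoundError):
        pass  # not root (or not Linux): leave the cache alone
```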
Part of refactoring common split/tokenize code and related improvements. This improves performance of the vocabulary tokenizer, mostly for smaller strings (<=128 bytes), by using an intermediate buffer to hold the token counts before computing offsets.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- Shruti Shivakumar (https://github.com/shrshi)

URL: #18522
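The count-then-scan pattern this relies on, sketched on the host (libcudf does the equivalent with device buffers and an exclusive scan): per-row token counts become row offsets into one flat token array.

```python
from itertools import accumulate

def counts_to_offsets(token_counts):
    # Exclusive prefix sum: offsets[i] is where row i's tokens start in the
    # flat output, and offsets[-1] is the total number of tokens.
    return [0] + list(accumulate(token_counts))
```

For example, `counts_to_offsets([3, 0, 2])` returns `[0, 3, 3, 5]`.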
Adds a missing error check to `nvtext::wordpiece_tokenize` for an invalid argument value. Also fixes the doxygen and removes some commented-out code.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Bradley Dice (https://github.com/bdice)

URL: #18621
Follow-on to #18607. Found more places using `cudf::size_of()` where `sizeof()` would be faster and more correct.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Paul Mattione (https://github.com/pmattione-nvidia)

URL: #18628
(#18644)

Fixed an issue in the Parquet writer where all compression would fail when device compression with internal kernels is used. The root cause is that the compressed chunk size was not being set, as it is not required for the nvCOMP implementation. Expanded a few tests to use internal kernels for compression; more extensive test changes to improve coverage of the compression/decompression internal kernels are planned for a separate PR.

Authors:
- Vukasin Milovanovic (https://github.com/vuule)

Approvers:
- Shruti Shivakumar (https://github.com/shrshi)
- Nghia Truong (https://github.com/ttnghia)
- Muhammad Haseeb (https://github.com/mhaseeb123)

URL: #18644
…ars (#18638)

`Filter` and `Projection` nodes currently "refresh" partitioning information. This isn't necessary, and it may result in unnecessary `Shuffle` operations.

Authors:
- Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
- Tom Augspurger (https://github.com/TomAugspurger)

URL: #18638
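The idea can be sketched as follows (hypothetical names; the real cudf-polars IR differs): elementwise nodes pass their child's partitioning through unchanged rather than recomputing it, so no `Shuffle` is inserted on their account.

```python
ELEMENTWISE = {"filter", "projection"}

def output_partitioning(op, child_partitioning):
    # Filter/Projection do not move rows between partitions, so the child's
    # partitioning (e.g. which keys the data is shuffled on) survives intact.
    if op in ELEMENTWISE:
        return child_partitioning
    return None  # unknown here; a downstream node may need a Shuffle
```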
This fixes a deprecation warning in Java JNI code. Closes #18652.

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Renjie Liu (https://github.com/liurenjie1024)

URL: #18660
This PR disables `arm64` tests to unblock CI.

Authors:
- GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
- James Lamb (https://github.com/jameslamb)
- Bradley Dice (https://github.com/bdice)

URL: #18662
… function for cudf-polars Column and DataFrame (#18602)

This is used by RAPIDSMPF's [`approx_spillable_amount`](https://github.com/rapidsai/rapidsmpf/blob/b77ed56a56a357d4e1f1bcb6208098a8078fb740/python/rapidsmpf/rapidsmpf/integrations/dask/spilling.py#L109). To simplify, all of the Dask registration (including the serializers) was moved into `dask_registers.py`.

Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
- Lawrence Mitchell (https://github.com/wence-)

URL: #18602
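A sketch of what such a memory-usage hook can look like (illustrative classes and names, not the actual cudf-polars types or API): each container sums the device bytes of its buffers so a spill manager can estimate what spilling it would free.

```python
from functools import singledispatch

class Column:
    def __init__(self, nbytes: int):
        self.nbytes = nbytes  # stands in for the device buffer size

class DataFrame:
    def __init__(self, columns):
        self.columns = columns

@singledispatch
def device_memory_usage(obj) -> int:
    # Unknown objects hold no device memory as far as the spiller knows.
    return 0

@device_memory_usage.register
def _(col: Column) -> int:
    return col.nbytes

@device_memory_usage.register
def _(df: DataFrame) -> int:
    # A frame's footprint is the sum of its columns' device buffers.
    return sum(device_memory_usage(c) for c in df.columns)
```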
Discovered in CCCL CI: https://github.com/NVIDIA/cccl/actions/runs/14806960648/job/41576694188#step:6:2863

Authors:
- Allison Piper (https://github.com/alliepiper)
- Michael Schellenberger Costa (https://github.com/miscco)
- Bradley Dice (https://github.com/bdice)

Approvers:
- David Wendt (https://github.com/davidwendt)
- Yunsong Wang (https://github.com/PointKernel)
- Bradley Dice (https://github.com/bdice)

URL: #18649
Contributes to #17896. Part of #18011. This PR implements row group pruning with statistics in the experimental Parquet reader optimized for hybrid scan queries.

Authors:
- Muhammad Haseeb (https://github.com/mhaseeb123)
- Bradley Dice (https://github.com/bdice)

Approvers:
- David Wendt (https://github.com/davidwendt)
- Vukasin Milovanovic (https://github.com/vuule)
- Bradley Dice (https://github.com/bdice)

URL: #18543
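Stats-based pruning boils down to an interval-overlap test per row group; a host-side sketch for a `lo <= col <= hi` predicate (illustrative, not the reader's actual API):

```python
def prune_row_groups(stats, lo, hi):
    # stats: per-row-group (min, max) for the predicate column, or None
    # when the file carries no statistics for that row group.
    # Keep a row group only if [min, max] can overlap [lo, hi]; missing
    # stats force the row group to be kept (pruning must be conservative).
    keep = []
    for i, s in enumerate(stats):
        if s is None:
            keep.append(i)
            continue
        mn, mx = s
        if not (mx < lo or mn > hi):
            keep.append(i)
    return keep
```

For example, with stats `[(0, 5), (10, 20), (6, 9), None]` and the predicate range `[7, 12]`, only the first row group can be skipped.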
Wrap `cudf_polars.DataFrame` in a container that enables spilling using rapidsmp.

Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)
- Richard (Rick) Zamora (https://github.com/rjzamora)
- Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
- Peter Andreas Entschev (https://github.com/pentschev)
- Richard (Rick) Zamora (https://github.com/rjzamora)
- Gil Forsyth (https://github.com/gforsyth)

URL: #18461
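The shape of such a wrapper, as an illustrative sketch (not the rapidsmp API; `pickle` stands in for device-to-host serialization): the container holds the frame either "on device" or as spilled host bytes, unspilling on access.

```python
import pickle

class SpillableDataFrame:
    def __init__(self, frame):
        self._frame = frame   # stands in for device-resident data
        self._spilled = None  # serialized host copy when spilled

    def spill(self):
        # Move the frame off "device" by serializing it to host bytes.
        if self._frame is not None:
            self._spilled = pickle.dumps(self._frame)
            self._frame = None

    def unspill(self):
        # Restore the frame on access if it was spilled.
        if self._frame is None:
            self._frame = pickle.loads(self._spilled)
            self._spilled = None
        return self._frame
```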
Fixes an invalid dereference of a `cuda::std::optional` with no value. The new CCCL version has assert checks for this, which show up in a Debug build. The error appears like this when running `gtests/CLAMP_TEST --gtest_filter=ClampTestNumeric/0.InputNull`:

```
/cudf/cpp/build/_deps/cccl-src/lib/cmake/libcudacxx/../../../libcudacxx/include/cuda/std/detail/libcxx/include/optional:867: operator*: block: [0,0,0], thread: [0,0,0] Assertion `optional operator* called on a disengaged value` failed.
```

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Muhammad Haseeb (https://github.com/mhaseeb123)

URL: #18655
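The same class of bug in Python terms, as an illustrative analog (not the libcudf code): a clamp must check for an empty value before using it, mirroring the `has_value()` test required before dereferencing a `cuda::std::optional`.

```python
from typing import Optional

def clamp(value: Optional[int], lo: int, hi: int) -> Optional[int]:
    # Check for a null input before using it; dereferencing an empty
    # optional is exactly the invalid access the PR fixes.
    if value is None:
        return None
    return max(lo, min(value, hi))
```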
Forward-merge triggered by push to branch-25.06 that creates a PR to keep branch-25.08 up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge. See forward-merger docs for more info.