Releases · rapidsai/cudf

05 Jun 17:31

raydouglass

v25.06.00

84c4350

v25.06.00 Latest

🚨 Breaking Changes

Remove cudf.BaseIndex (#18751) @mroeschke
Implement BIT_COUNT unary operation (#18589) @ttnghia
Expose column chunk metadata in read_parquet_metadata() (#18579) @mhaseeb123
Fix overflow for MERGE_M2 groupby aggregation (#18546) @ttnghia
Deduplicate parquet physical type enums (#18526) @mhaseeb123
Implemented String Output & User-data Support for Transforms (#18490) @lamarrr
Promote Parquet type enums to enum classes (#18441) @mhaseeb123
Move parquet schema types and structs to public headers (#18424) @mhaseeb123
Start removal of vector factories with _sync suffix by deprecating them and adding versions without the suffix (#18414) @vuule
Skip decoding of pages marked as pruned in PQ reader (#18347) @mhaseeb123
Deprecate nvtext subword tokenizer (#18334) @davidwendt
Add standard data ingestion pipelines to pylibcudf for ndarrays (#18311) @Matt711
Remove extraneous modules from top-level cudf namespace (#18287) @mroeschke
Add Keep Option Parameter to Distinct (#18237) @warrickhe
Update to CCCL 2.8.x with no CCCL patches (#18235) @bdice

🐛 Bug Fixes

Disable pytest benchmark for Narwhals CI job (#19074) @Matt711
Avoid undefined behaviour in rolling_store_output_functor (#19069) @wence-
Filter out pkg_resources UserWarning to make nightly CI pass (#19058) @Matt711
Pin deltalake to <1.0.0 (#19017) @Matt711
[BUG] Incorrectly getting the caller's frame when searching for locals and globals in cudf.pandas (#18979) @Matt711
Ensure gc fixture is used in custreamz test (#18915) @TomAugspurger
Fix a potential segfault in PQ reader's number of rows per source calculation (#18906) @mhaseeb123
Fix DataFrame getitem when MultiIndex columns exist (#18880) @galipremsagar
Ensure eq/ne between Columns in public objects don't return bool (#18875) @mroeschke
Fix fencepost error in Repartition task generation (#18854) @wence-
Fix cudf_polars pl.col(...).len() always excluding null values (#18849) @mroeschke
Throw a descriptive exception in Parquet reader when trying to read files with more than two billion rows (#18835) @mhaseeb123
Skip a decompression test (#18825) @vuule
Update strings benchmarks to use alloc_size column/table function (#18822) @davidwendt
Fix host decompression of empty DEFLATE data (#18805) @vuule
Avoid going OOM in test_row_limit_exceed_raises by using dummy array (#18802) @Matt711
Fix host decompression of empty Snappy data (#18800) @vuule
Skip test that fails due to polars issue (#18787) @wence-
Ensure scalar dtype is always set in from_py (#18780) @vyasr
Fix reading of Snappy compressed Avro files (#18774) @vuule
Fix missing semicolon in label_bins.cu (#18765) @evanramos-nvidia
Fix noexcept annotations on strings_column_view (#18763) @wence-
Fix integer overflows in pylibcudf from_column_view_of_arbitrary (#18758) @wence-
Fix overflow case and clean up some logic (#18734) @vyasr
Link to nvtx3::nvtx3-cpp instead of nvToolsExt (#18730) @jakirkham
Revise DaskIntegration protocol to align with rapidsmpf (#18720) @rjzamora
Fix skip_compression option in the Parquet writer with host compression (#18714) @vuule
Add missing header (#18671) @vyasr
Revert "Set flag to always use unsafe atomic storage" (#18657) @PointKernel
Fix optional operator* called on a disengaged value in clamp.cu (#18655) @davidwendt
Add missing header to host_memory.cpp (#18649) @alliepiper
Fix device compression when writing Parquet files without using nvCOMP (#18644) @vuule
Add CUDA_ARCHITECTURES setting to cpp-linters script (#18637) @davidwendt
Pin to cython<3.1 (#18617) @wence-
Fix DataFrame.memory_usage output order (#18595) @mroeschke
Set flag to always use unsafe atomic storage (#18590) @PointKernel
Update KvikIO S3 endpoint usage (#18565) @kingcrimsontianyu
Skip cuml third-party integration tests that may segfault (#18561) @Matt711
Allow .iloc with cuDF objects as column indexers (#18558) @mroeschke
Fix overflow for MERGE_M2 groupby aggregation (#18546) @ttnghia
Add back cudf root (#18544) @vyasr
Change default memory resource for 'distributed' cudf-polars (#18531) @rjzamora
Fix copy-on-write buffer separation and cleanup (#18530) @galipremsagar
Fix cpp examples cmake to use the rapids_config.cmake (#18501) @davidwendt
Rename rapidsmp to rapidsmpf (#18493) @rjzamora
Fix compilation with the C++20 standard (#18486) @vuule
Fix an error when reading some compressed Parquet V2 files (#18478) @vuule
Support title-case characters in strings capitalize() and title() APIs (#18457) @davidwendt
Ensure DataFrame column label operations reset label_dtype (#18452) @mroeschke
Fix a segfault when reading a Parquet file with unsupported compression type (#18451) @vuule
Fix logger macros (#18444) @vyasr
Fix auto-detection of compression type in host-side decompression (#18440) @shrshi
Use delete not free to release data allocated with new (#18412) @wence-
Fix synchronization issues in host compression and decompression (#18395) @vuule
Update Dask array-conversion handling (#18382) @rjzamora
Fixed indexing on empty DataFrame with no columns (#18381) @TomAugspurger
Deterministic hashing for DataFrameScan nodes in cudf-polars multi-partition executor (#18351) @TomAugspurger
Fix index of right table in unary operators in AST, in Joins (#18333) @karthikeyann
Add offsetalator to contiguous-split (#18312) @davidwendt
Support large strings in nvtext vocabulary-tokenizer (#18283) @davidwendt
Handle empty aggregations in multi-partition cudf.polars group_by (#18277) @TomAugspurger

📖 Documentation

Docs for streaming executor options (#18934) @quasiben
Fix some duplicate toctree issues and improve groupby docs (#18580) @vyasr
[DOC] Running libcudf benchmarks and comparing output results (#18548) @Matt711
Fix doxygen usage of the contraction for it is (#18517) @davidwendt
Clarify @brief tag as description/title on documentation guide (#18515) @davidwendt
[DOC] Improve clarity of Parquet APIs set_row_groups and set_columns (#18466) @Matt711
Add a usage page to cudf-polars documentation (#18460) @Matt711
[DOC] Fix typo in CONTRIBUTING.md on build type tests (#18456) @JigaoLuo
improve docs related to documentation contribution (#18418) @ncclementi
Add restart kernel note in cudf pandas docs (#18374) @ncclementi

🚀 New Features

Add CLI argument to enable RMM async memory resource in PDS-H (#18899) @pentschev
Scan a headerless CSV file with column names provided (#18816) @Matt711
Add fast paths for DataFrame.to_cupy (#18801) @Matt711
Require numba-cuda>=0.11.0 (#18770) @brandon-b-miller
Create a pylibcudf Column from a python iterable (#18768) @Matt711
Support ConditionalJoin via broadcasting in cudf-polars streaming engine (#18723) @rjzamora
Experimental PQ reader utility to calculate total rows in input row groups (#18716) @mhaseeb123
Extend explain_query to support printing the logical plan (pre lowered plan) (#18708) @Matt711
Reuse libcudf dependencies for Java JNI build when they are available (#18682) @ttnghia
Add alloc_size member function to cudf::column and cudf::table (#18639) @davidwendt
Print the physical cudf-polars plan in pdsh.py (#18635) @rjzamora
String Transform Examples (#18616) @lamarrr
Add streaming support for group_by -> n_unique to cudf-polars (#18606) @rjzamora
Export cudf compiler flags and definitions (#18604) @ttnghia
Implement BIT_COUNT unary operation (#18589) @ttnghia
Expose column chunk metadata in read_parquet_metadata() (#18579) @mhaseeb123
Add APIs to check ORC and Parquet compression support at runtime (#18578) @vuule
Add Distinct support to the cudf-polars streaming executor (#18576) @rjzamora
Add support for large list host Arrow data conversion (#18562) @vyasr
Implement BITWISE_AGG aggregations (bitwise AND, OR and XOR) for sort-based groupby and reduction (#18551) @ttnghia
Implement row group pruning with bloom filters in experimental PQ reader (#18545) @mhaseeb123
Implement row group pruning with stats in experimental PQ reader (#18543) @mhaseeb123
[JNI] Expose row-wise sha1 api (#18540) @warrickhe
Add Sort + head/tail support to streaming cudf-polars executor (#18538) @rjzamora
Add multi-partition MapFunction support to cudf-polars (#18523) @rjzamora
Adds support for writing raw UTF-8 characters (without escaping) in the JSON writer (#18508) @Matt711
Support reading from device buffers in the pylibcudf IO APIs (#18496) @Matt711
Support multi-partition Select operations with aggregations (#18492) @rjzamora
Implemented String Output & User-data Support for Transforms (#18490) @lamarrr
Add a utility to bulk set multiple null masks (#18489) @mhaseeb123
High level interface for experimental PQ reader and implementation of metadata APIs (#18480) @mhaseeb123
Added pylibcudf.utilities.is_ptds_enabled (#18467) @TomAugspurger
Add a public API for copying a table_view to device array (#18450) @Matt711
Support cudf-polars cast_time_unit (#18442) @brandon-b-miller
Support creating a pylibcudf Column from a host array (#18425) @Matt711
Move parquet schema types and structs to public headers (#18424) @mhaseeb123
Add optional dtype argument to Scalar.from_any (#18415) @Matt711
Expose cudf::chunked_pack in pylibcudf (#18411) @wence-
Add support for long string columns in cudf::contiguous_split (#18393) @nvdbaranec
Implemented String Input support for Transforms and Removed jit::column_device_view (#18378) @lamarrr
Automatically dispatch between host and device decompression/compression based on the number of buffers (#18363) @vuule
Expose join hash table load factor (#18361) @PointKernel
Skip decoding of pages marked as pruned in PQ reader (#18347) @mhaseeb123
Sort-based inner join for high-multiplic...

Contributors

alliepiper, brief, and 40 other contributors

22 May 21:42

rapids-bot

v25.08.00a

cd75a98

[NIGHTLY] v25.08.00 Pre-release

🚨 Breaking Changes

Remove CUDA 11 from dependencies.yaml (#19139) @KyleFromNVIDIA
Temporarily revert "Refactor JNI error handling (#18983)" (#19076) @abellina
Rename parquet_chunked_writer to chunked_parquet_writer for consistency with the reader (#19047) @mhaseeb123
Compile libcudf using C++20 Standard (#19045) @vuule
Refactor JNI error handling (#18983) @ttnghia
stop uploading packages to downloads.rapids.ai (#18973) @jameslamb
Remove deprecated Series methods, isclose (#18947) @mroeschke
Remove deprecated groupby.collect (#18946) @mroeschke
Remove deprecated get_dummies(cats=, ...) (#18944) @mroeschke
Add pylibcudf.Column.from_arrow factory method (#18937) @Matt711
Add pylibcudf.Table.from_arrow factory method (#18936) @Matt711
Remove deprecated APIs (#18933) @vuule
Remove cudf.Scalar (#18927) @mroeschke
Remove deprecated cudf::io::host_buffer (#18881) @Matt711
Null-handling for Transforms (#18845) @lamarrr
Enable skip_rows in the chunked parquet reader. (#18130) @mhaseeb123

🐛 Bug Fixes

Fix hash collision in Union([MapFunction]) (#19124) @TomAugspurger
Fix bug in group_by().n_unique() in streaming cudf-polars (#19108) @rjzamora
Fix cudf_polars spilling (#19101) @TomAugspurger
Fix libcudf strings case logic to set null-row size to zero (#19095) @davidwendt
Temporarily revert "Refactor JNI error handling (#18983)" (#19076) @abellina
Temporary workaround for incorrect SplitScan results in cuDF-Polars (#19071) @rjzamora
Use default memory resource for JSON_QUOTE_NORMALIZATION gtests (#19057) @davidwendt
Added null-probability to polynomial benchmarks and fixed transform call-sites (#18972) @lamarrr
Fix flaky custreamz test (#18961) @TomAugspurger
Fix tdigest percentile correctness for low row-counts (#18952) @mythrocks
Enable skip_rows in the chunked parquet reader. (#18130) @mhaseeb123

📖 Documentation

Update README and CONTRIBUTING to reflect new CUDA requirements (#19138) @PointKernel
Remove the extra index URL for CUDA 12 (#19128) @vyasr
Improve WordPieceVocabulary.tokenize documentation (#19098) @davidwendt
Update the contributing guide to include pylibcudf in the build command (#19011) @Matt711
Fix pylibcudf docs for some strings APIs (#19004) @davidwendt
Update cuDF Python library design with BaseIndex and pylibcudf updates (#18903) @mroeschke

🚀 New Features

Support cudf-polars str.head and str.tail (#19115) @brandon-b-miller
Support cudf-polars str.to_titlecase (#19114) @brandon-b-miller
Move the remaining libcudf pieces to C++20 (#19065) @vuule
Allow using a stream per thread at runtime (#19051) @vyasr
Compile libcudf using C++20 Standard (#19045) @vuule
Refactor JNI error handling (#18983) @ttnghia
Add basic Sink support for streaming cudf-polars executor (#18963) @rjzamora
Add from_arrow factory methods for Scalar and DataType (#18938) @Matt711
Add pylibcudf.Column.from_arrow factory method (#18937) @Matt711
Add pylibcudf.Table.from_arrow factory method (#18936) @Matt711
Update nvCOMP adapter (#18931) @vuule
Create a pylibcudf Column from an iterable of Python strings (#18916) @Matt711
Add CLI argument to enable OOM protection in PDS-H (#18914) @pentschev
Null-handling for Transforms (#18845) @lamarrr
Add support for parquet scan + count operation (#18463) @Matt711

🛠️ Improvements

Remove CUDA 11 from dependencies.yaml (#19139) @KyleFromNVIDIA
Move Accessor implementation to their own directory (#19134) @mroeschke
Add benchmarks for sorting float and timestamp (#19133) @davidwendt
Move pdsh utility functions/classes to a separate module (#19126) @Matt711
Add validate arg to polars pdsh benchmarks (#19121) @Matt711
Share Index.values with base implementation (#19112) @mroeschke
Use len instead of len(obj.some_attribute) (#19111) @mroeschke
Raise EmptyDataError in pandas-compat mode for empty read_csv (#19109) @mroeschke
Use cooperative-groups for warp-parallel kernels in nvtext (#19107) @davidwendt
Avoid O(n) lookup when creating cuDF Python mixins (#19104) @mroeschke
Update cudf to accommodate breaking changes in cuCollections (#19093) @PointKernel
Forward-merge branch-25.06 to branch-25.08 (#19087) @Matt711
Optimize tokenization for dask task graphs in cudf-polars (#19083) @TomAugspurger
Update mypy configuration to check against polars (#19072) @TomAugspurger
[cudf-polars] Update rapidsmpf import paths (#19068) @madsbk
Fix clang-tidy modernize-use-integer-sign-comparison rule (#19066) @vuule
[cudf-polars] Use RapidsMPF's config options (#19059) @madsbk
Unskip narwhals tests for cudf-polars run (#19056) @Matt711
Remove unnecessary synchronization (miss-sync) during Parquet reading (Part 1: device_scalar) (#19055) @JigaoLuo
Part 1/2: Refactor PQ reader chunking utilities for reuse in hybrid scan (#19054) @mhaseeb123
Swap cuda::std::distance for thrust::distance (#19050) @vyasr
Rename parquet_chunked_writer to chunked_parquet_writer for consistency with the reader (#19047) @mhaseeb123
Add pylibcudf.Scalar.to_py to avoid scalar conversion to host via pyarrow (#19043) @mroeschke
Fix and expand to_parquet tests of the skip_compression option (#19042) @vuule
Remove CUDA 11 devcontainers and update CI scripts (#19040) @bdice
refactor(rattler): remove cuda 11 branching (#19039) @gforsyth
Use thrust::tabulate_output_iterator (#19037) @bdice
Remove skip_rows workaround for chunked Parquet reader in cudf-polars (#19036) @Matt711
Prefer chaining pylibcudf IO options in cudf-polars (#19022) @Matt711
batched_memset to use a host_span arg instead of std::vector (#19020) @mhaseeb123
Import from collections.abc for consistent typing/runtime access (#19019) @mroeschke
Avoid using cudf module for type annotations (#19018) @mroeschke
Mark pandas unit test test_eval_no_support_column_name as xpassing (#19016) @mroeschke
Unify Frame._split and DataFrame.scatter_by_map/partition_by_hash implementations (#19013) @mroeschke
Move IndexedFrame.memory_usage docstrings to DataFrame/Series, make RangeIndex methods consistent with base class (#19010) @mroeschke
Share DataFrame/Series.(de)serialize methods, implement to_dlpack directly on Frame (#19008) @mroeschke
Pin narwhals to 1.41 (#19007) @Matt711
Add year range check to cudf::strings::is_timestamp (#19006) @davidwendt
Add cudf::strings::contains_multiple to pylibcudf (#19003) @davidwendt
Avoid unnecessary partition step in streaming join (#19002) @rjzamora
Part 2/n: Use cooperative groups in PQ decoders (#18978) @mhaseeb123
Move libcudf copying benchmarks to nvbench (#18976) @davidwendt
Add lag/lead/bitwise/row_number aggregations to pylibcudf (#18975) @mroeschke
Switch to importing rather than cimporting datetime (#18974) @vyasr
stop uploading packages to downloads.rapids.ai (#18973) @jameslamb
Trace IR.do_evaluate in cudf_polars (#18970) @TomAugspurger
xfail more pandas unit tests that fail with cudf.pandas before execution instead of xfailing after execution (#18965) @mroeschke
Remove test checks that depend on the compression engine (#18960) @vuule
Use cooperative-groups for warp-parallel kernels in strings functions (#18959) @davidwendt
fetch code before running pull request labeler (#18958) @jameslamb
Use cooperative groups in parquet decoder kernels (#18954) @mhaseeb123
Add a DataType container in cudf_polars (#18953) @mroeschke
add 'rapids-init-pip' to test_cudf_polars_polars_tests.sh (#18951) @jameslamb
parameterized ucx / ucxx (#18949) @quasiben
Rework cudf::sorted_order implementation for faster compile (#18948) @davidwendt
Remove deprecated Series methods, isclose (#18947) @mroeschke
Remove deprecated groupby.collect (#18946) @mroeschke
Remove deprecated get_dummies(cats=, ...) (#18944) @mroeschke
Add .python_typecode and .typestr attributes to DataType (#18941) @Matt711
Remove deprecated APIs (#18933) @vuule
Remove cudf.Scalar (#18927) @mroeschke
Add #pragma once to prevent redundant includes and speed up compilation (#18925) @PointKernel
Bump polars version to <1.31 (#18920) @Matt711
Branch 25.08 merge branch 25.06 (#18895) @vyasr
Remove deprecated cudf::io::host_buffer (#18881) @Matt711
Fix decompression scratch size in AUTO mode (#18878) @vuule
Apply linter suggestions to cuIO code (#18876) @vuule
xfail pandas unit tests that fail with cudf.pandas (#18872) @mroeschke
Branch 25.08 merge branch 25.06 (#18855) @vyasr
Add support for extended dtypes in cudf.pandas (#18832) @galipremsagar
Auto merge fix for branch-25.08 (#18824) @davidwendt
Forward-merge branch-25.06 to branch-25.08 (#18817) @Matt711
Forward-merge branch-25.06 to branch-25.08 (#18756) @Matt711
Fix auto merge conflict for branch-25.08 (#18733) @davidwendt
Forward-merge branch-25.06 to branch-25.08 (#18698) @Matt711
Fix merge conflict for auto-merger 25.06 to 25.08 (#18693) @davidwendt
Fix merge conflict: branch-25.06 into branch-25.08 (#18668) @davidwendt
Make cuda12 as JNI default (#18651) @pxLi
Forward-merge branch-25.06 into branch-25.08 (#18647) @bdice
Fix merge branch-25.06 into branch-25.08 (#18622) @davidwendt
Refactor strings split/record with whitespace logic (#18560) @davidwendt

Contributors

madsbk, TomAugspurger, and 22 other contributors

09 Apr 18:14

AyodeAwe

v25.04.00

6bc4206

v25.04.00

🚨 Breaking Changes

Remove unused group_range_rolling_window API (#18313) @wence-
[BUG] Disabled JIT for CUDA Runtime < 11.5 (#18296) @lamarrr
Remove cudf.Scalar from binops (#18240) @mroeschke
Enforce deprecation of dtype parameter in sum/product (#18070) @mroeschke
Remove deprecated single component datetime extract APIs (#18010) @Matt711
Remove deprecated rolling window functionality (#17993) @wence-
Remove deprecated nvtext::minhash_permuted APIs (#17939) @davidwendt
Remove dataframe protocol (#17909) @vyasr
Use new rapids-logger library (#17899) @vyasr
Added Multi-input & Scalar Support for Transform UDFs (#17881) @lamarrr
Fixed incorrect PTX parsing of ret instruction after branch label (#17859) @lamarrr
Use KvikIO to enable file's fast host read and host write (#17764) @kingcrimsontianyu

🐛 Bug Fixes

Fix alpha versions of cudf package. (#18429) @bdice
Backport: Deterministic hashing for DataFrameScan nodes in cudf-polars multi-partition executor (#18351) (#18420) @bdice
Skip failing Narwhals rolling groupby tests (#18398) @Matt711
Pin cmake in test_java to be less than 4.0.0 (#18392) @abellina
Skip polars tests that fail with pydantic deprecation warnings (#18388) @Matt711
Backport: Fix index of right table in unary operators in AST, in Joins (#18342) @bdice
xfail narwhals sqlframe tests (#18297) @Matt711
[BUG] Disabled JIT for CUDA Runtime < 11.5 (#18296) @lamarrr
Make a pylibcudf Column from a device array object with strides=None (#18295) @Matt711
Fix cudf.pandas objects to not be Callable (#18288) @galipremsagar
Skip failing polars test test_general_prefiltering (#18264) @Matt711
Filter all cudf.pandas profiler tests from running in parallel (#18262) @Matt711
Allow cudf.Series([pd.NA], dtype=, nan_as_null=False) (#18259) @mroeschke
Fix cross join with extra columns (#18256) @galipremsagar
Fix DataFrame.loc to not modify the actual dataframe (#18254) @galipremsagar
Remove RMM macro usage from to_arrow_device.cu (#18252) @davidwendt
Skip Narwhals cross join tests for cudf.pandas CI run (#18249) @Matt711
Fix cudf-polars tests for polars < 1.24 (#18246) @wence-
Fix experimental cudf-polars tests (#18244) @rjzamora
Fix datetime64 vs datetime binops max resolution (#18241) @galipremsagar
Use CCCL::libcudacxx include directories in Jitify preprocessing. (#18233) @bdice
Disable conda prefix patching to avoid mangling binaries (#18225) @vyasr
Workaround for ARM compiler issue with single space literal string (#18220) @davidwendt
Bump nightly check limit (#18213) @Matt711
Support comparative binops between categorical and non-categorical (#18200) @mroeschke
Make the version file inside cudf.pandas not a symlink (#18198) @vyasr
Ensure RAPIDS_ARTIFACTS_DIR is set for build metrics reports. (#18192) @bdice
Ignore run exports of libcufile. (#18190) @bdice
Skip flaky multi GPU test (#18187) @Matt711
Fix BPE merges table static-map capacity size (#18184) @davidwendt
Drop CUB_QUOTIENT_CEILING (#18179) @miscco
Disable ARM CI in C++ and Python test CI jobs (#18175) @Matt711
Add fmt to the test/benchmarks env (#18173) @vyasr
Fix merge(how=left, left_on=, right_index=True, sort=True) (#18166) @mroeschke
Allow nonnative cupy dtype in cudf.Series (#18164) @mroeschke
Fix Series construction from numpy array with non-native byte order (#18151) @mroeschke
Use protocol for dlpack instead of deprecated function in cupy notebook (#18147) @Matt711
Skip failing test (#18146) @vyasr
Update calls to KvikIO's config setter (#18144) @kingcrimsontianyu
Reduce memory use when writing tables with very short columns to ORC (#18136) @vuule
Handle empty dictionary in to_arrow_device interop (#18121) @davidwendt
Allow pivot_table to accept single label index and column arguments (#18115) @mroeschke
Preserve DataFrame.column subclass and type during binop (#18113) @mroeschke
Fix rmm macro call (#18108) @pmattione-nvidia
Add include for <functional> (#18102) @miscco
Remove static column vectors from window function tests. (#18099) @mythrocks
Fix scatter_by_map with spilling enabled (#18095) @mroeschke
Use the right version macro CCCL_MAJOR_VERSION (#18073) @miscco
Fix test_scan_csv_multi cudf-polars test (#18064) @rjzamora
Fix memcopy direction for concatenate (#18058) @tgujar
Fix upstream dask loc test (#18045) @rjzamora
Fix hang on invalid UTF-8 data in string_view iterator (#18039) @davidwendt
Fix dask_cudf.to_orc deprecation (#18038) @rjzamora
Compatibility with dask.dataframe's is_scalar (#18030) @TomAugspurger
Fix the build error due to KvikIO update (#18025) @kingcrimsontianyu
Fix failing ibis test (#18022) @Matt711
Skip failing polars tests (#18015) @Matt711
Fix to_arrow to return consistent pandas-metadata (#18009) @galipremsagar
Prevent setting custom attributes to ColumnMethods (#18005) @galipremsagar
Compatibility with Dask main (#17992) @TomAugspurger
[Bug] Fix Parquet-metadata sampling in cudf-polars (#17991) @rjzamora
Add missing include for calling std::iota() (#17983) @davidwendt
Fix pickle and unpickling for all objects (#17980) @galipremsagar
Install duckdb, the default backend for ibis, in the cudf.pandas integration tests (#17972) @Matt711
Check null count too in sum aggregation (#17964) @Matt711
Raise NotImplementedError for groupby.agg if duplicate columns would be created (#17956) @mroeschke
Ensure disabling the module accelerator is thread-safe (#17955) @vyasr
Fix DataFrame/Series.rank for int and null data in mode.pandas_compatible (#17954) @mroeschke
Limit buffer size in reallocation policy in JSON reader (#17940) @shrshi
Make cudf.pandas proxy array picklable (#17929) @Matt711
Add missing standard includes (#17928) @miscco
Fix torch integration test (#17923) @Matt711
Fix to_pandas writable bug for datetime and timedelta types (#17913) @galipremsagar
Raise NotImplementedError if .merge(suffixes=) introduces duplicate labels (#17905) @mroeschke
Fix groupby scans with int and NA data in mode.pandas_compatible (#17895) @mroeschke
Patch __init__ of cudf constructors to parse through cudf.pandas proxy objects (#17878) @galipremsagar
Fixed incorrect PTX parsing of ret instruction after branch label (#17859) @lamarrr
Relax inconsistent schema handling in dask_cudf.read_parquet (#17554) @rjzamora

📖 Documentation

Clarify that cudf.pandas should be enabled before importing pandas. (#18339) @bdice
[DOC] Add wordpiece tokenizer to cudf documentation (#18247) @davidwendt
Added pylibcudf.contiguous_split to API docs (#18194) @TomAugspurger
Fix build.sh docs for default behavior (#18180) @bdice
Update Dask-cuDF documentation to fix all warnings and errors (#18157) @TomAugspurger
[DOC] Document character normalizer (#18125) @Matt711

🚀 New Features

Add and revise experimental cudf-polars config options (#18284) @rjzamora
Support top_k and bottom_k expressions (#18222) @Matt711
Support cudf-polars is_leap_year (#18212) @brandon-b-miller
Support cudf-polars month_start/month_end (#18211) @brandon-b-miller
Support cudf-polars ordinal_day (#18152) @brandon-b-miller
Add pylibcudf.gpumemoryview support for len()/nbytes (#18133) @pentschev
Link to libzstd for ZSTD compression and decompression APIs (#18129) @shrshi
Added NDSH Q09 Benchmark for Transforms (#18127) @lamarrr
Make pylibcudf traits raise exceptions gracefully rather than terminating in C++ (#18117) @Matt711
Host decompression (#18114) @vuule
Add owning types to hold Arrow data (#18084) @vyasr
Bump polars version to <1.24 (#18076) @Matt711
Support sorted merges in cudf.polars (#18075) @Matt711
Add a slice expression to polars IR (#18050) @Matt711
Expose num_rows_per_source (IO metadata) to pylibcudf (#18049) @Matt711
Added Imbalanced Tree Benchmarks for Transforms (#18032) @lamarrr
Run the narwhals test suite with cudf.pandas (#18031) @Matt711
Add host_read_async interfaces to datasource (#18018) @vuule
Make most cudf-polars Node objects pickleable (#17998) @rjzamora
Add Column.serialize to cudf-polars (#17990) @rjzamora
Bump polars version to <1.23 (#17986) @Matt711
Implemented Decimal Transforms (#17968) @lamarrr
Introduce ZSTD host-side compression and decompression APIs (#17935) @shrshi
Add catboost integration tests (#17931) @Matt711
[FEA] Expose stripe_size_rows setting for ORCWriterOptions (#17927) @ustcfy
Test narwhals in CI (#17884) @bdice
Added Multi-input & Scalar Support for Transform UDFs (#17881) @lamarrr
Host Snappy compression (#17824) @vuule
Run spark-rapids-jni CI (#17781) @KyleFromNVIDIA
Add multi-partition Shuffle operation to cuDF Polars (#17744) @rjzamora
Added polynomials benchmark (#17695) @lamarrr
Add stream parameters in pylibcudf IO APIs (#17620) @Matt711
New nvtext::wordpiece_tokenizer APIs (#17600) @davidwendt
Add support for unary negation operator (#17560) @Matt711
Add multi-partition Join support to cuDF-Polars (#17518) @rjzamora
Add basic multi-partition GroupBy support to cuDF-Polars (#17503) @rjzamora
Support Distributed in cudf-polars tests and IR evaluation (#17364) @pentschev

🛠️ Improvements

Use pyarrow 15 in oldest dependency CI jobs (#18409) @bdice
Bump librdkafka to 2.8.0 (#18370) @raydouglass
fix(rattler): ignore libzlib run dependency to avoid pandoc collision (#18368) @gforsyth
Fix zstd build interface include definition (#18366) @trxcllnt
test: Install pytest-env and hypothesis in test_narwhals.sh (#18337) @MarcoGorelli
Remove unused group_range_rolling_window API (#18313) @wence-
Cache column view creation from arrow types (#18302) @vyasr
Split Narwhals cudf.pandas tests failures into to fix and to skip (#18267) @mroeschke
Support BinOp, min, and max Aggregations in cudf-polars parallel ...

Contributors

msarahan, trxcllnt, and 34 other contributors

03 Mar 18:22

raydouglass

v25.02.02

8139f3c

v25.02.02

🚨 Breaking Changes

Expose stream-ordering in scalar and avro APIs (#17766) @shrshi
Add seed parameter to hash_character_ngrams (#17643) @davidwendt
Performance improvements and simplifications for fixed size row-based rolling windows (#17623) @wence-
Refactor distinct hash join to handle multiple probes with the same build table (#17609) @PointKernel
Deprecate cudf::grouped_time_range_rolling_window (#17589) @wence-
Remove "legacy" Dask DataFrame support from Dask cuDF (#17558) @rjzamora
Return empty result for segmented_reduce if input and offsets are both empty (#17437) @davidwendt
Rework minhash APIs for deprecation cycle (#17421) @davidwendt
Change indices for dictionary column to signed integer type (#17390) @davidwendt

🐛 Bug Fixes

Use protocol for dlpack instead of deprecated function (#18134) @vyasr
Skip the failing connectorx polars tests (#18037) @Matt711
Fix 'Unexpected short subpass' exception in parquet chunked reader. (#18019) @nvdbaranec
Fix race check failures in shared memory groupby (#17985) @PointKernel
Pin ibis version in the cudf.pandas integration tests <10.0.0 (#17975) @Matt711
Fix the index type in the indexing operator of the span types (#17971) @vuule
Add missing pin (#17915) @vyasr
Fix third-party cudf.pandas tests (#17900) @galipremsagar
Fix numpy data access by making attribute private (#17890) @galipremsagar
Remove extra local var declaration from cudf.pandas 3rd-party integration shell script (#17886) @Matt711
Move isinstance_cudf_pandas to fast_slow_proxy (#17875) @galipremsagar
Make _Series_dtype method a property (#17854) @Matt711
Fix the bug in determining the heuristics for shared memory groupby (#17851) @PointKernel
Fix possible OOB mem access in Parquet decoder (#17841) @mhaseeb123
Require batches to be non-empty in multi-batch JSON reader (#17837) @shrshi
Fix rolling(min_periods=) with int and null data with mode.pandas_compat (#17822) @mroeschke
Resolve race-condition in disable_module_accelerator (#17811) @galipremsagar
Make Series(dtype=object) raise in mode.pandas_compat with non string data (#17804) @mroeschke
Disable intended disabled ORC tests (#17790) @davidwendt
Fix empty DataFrame construction not returning RangeIndex columns (#17784) @mroeschke
Fix various .str methods for pandas compatibility (#17782) @mroeschke
Fix count API issue about ignoring nan values (#17779) @galipremsagar
Add numba pinning to cudf repo (#17777) @galipremsagar
Allow .sort_values(na_position=) to include NaNs in mode.pandas_compatible (#17776) @mroeschke
allow deselecting nvcomp wheels (#17774) @jameslamb
Use the aligned_resource_adaptor to allocate bloom filter device buffers (#17758) @mhaseeb123
Avoid instantiating bloom filter query function for nested and bool types (#17753) @mhaseeb123
Fix DataFrame.merge(Series, how="left"/"right") on column and index not resulting in a RangeIndex (#17739) @mroeschke
[BUG] xfail Polars excel test (#17731) @Matt711
Require to implement AutoCloseable for the classes derived from HostUDFWrapper (#17727) @ttnghia
Remove jlowe as a java committer since he retired (#17725) @tgravescs
Prevent use of invalid grid sizes in ORC reader and writer (#17709) @vuule
Enforce schema for partial tables in multi-source multi-batch JSON reader (#17708) @shrshi
Compute and use the initial string offset when building nested large string cols with chunked parquet reader (#17702) @mhaseeb123
Fix writing of compressed ORC files with large stripe footers (#17700) @vuule
Fix cudf.polars sum of empty not equalling zero (#17685) @mroeschke
Fix formatting in logging (#17680) @vuule
convert all nulls to nans in a specific scenario (#17677) @galipremsagar
Define cudf repr methods on the Column (#17675) @mroeschke
Fix groupby.len with null values in cudf.polars (#17671) @mroeschke
Fix: DataFrameGroupBy.get_group was raising with length>1 tuples (#17653) @MarcoGorelli
Fix possible int overflow in compute_mixed_join_output_size (#17633) @davidwendt
Fix a minor potential i32 overflow in thrust::transform_exclusive_scan in PQ reader preprocessing (#17617) @mhaseeb123
Fix failing xgboost test in the cudf.pandas third-party integration tests (#17616) @Matt711
Fix dask_cudf.read_csv (#17612) @rjzamora
Fix memcheck error in ReplaceTest.NormalizeNansAndZerosMutable gtest (#17610) @davidwendt
Correctly accept a pandas.CategoricalDtype(pandas.IntervalDtype(...), ...) type (#17604) @mroeschke
Add ability to modify and propagate names of columns object (#17597) @galipremsagar
Ignore NaN correctly in .quantile (#17593) @mroeschke
Fix groupby argmin/max gather of sorted-order indices (#17591) @davidwendt
Fix ctest fail running libcudf tests in a Debug build (#17576) @davidwendt
Specify a version for rapids_logger dependency (#17573) @jlowe
Fix the ORC decoding bug for the timestamp data (#17570) @kingcrimsontianyu
[JNI] remove rmm argument to set rw access for fabric handles (#17553) @abellina
Document undefined behavior in div_rounding_up_safe (#17542) @davidwendt
Fix nvcc-imposed UB in constexpr functions (#17534) @vuule
Add anonymous namespace to libcudf test source (#17529) @davidwendt
Propagate failures in pandas integration tests and Skip failing tests (#17521) @Matt711
Fix libcudf compile error when logging is disabled (#17512) @davidwendt
Fix Dask-cuDF clip APIs (#17509) @rjzamora
Fix pylibcudf to_arrow with multiple nested data types (#17504) @mroeschke
Fix groupby(as_index=False).size not resetting index (#17499) @mroeschke
Revert "Temporarily skip tests due to dask/distributed#8953" (#17492) @Matt711
Workaround for a misaligned access in read_csv on some CUDA versions (#17477) @vuule
Fix some possible thread-id overflow calculations (#17473) @davidwendt
Temporarily skip tests due to dask/distributed#8953 (#17472) @wence-
Detect mismatches in begin and end tokens returned by JSON tokenizer FST (#17471) @shrshi
Support dask>=2024.11.2 in Dask cuDF (#17439) @rjzamora
Fix write_json failure for zero columns in table/struct (#17414) @karthikeyann
Fix Debug-mode failing Arrow test (#17405) @zeroshade
Fix all null list column with missing child column in JSON reader (#17348) @karthikeyann

📖 Documentation

Fix forward merge 24.12->25.02 (#18002) @raydouglass
Fix incorrect example in pylibcudf docs (#17912) @Matt711
Explicitly call out that the GPU open beta runs on a single GPU (#17872) @taureandyernv
Update cudf.pandas colab link in docs (#17846) @taureandyernv
[DOC] Make pylibcudf docs more visible (#17803) @Matt711
Cross-link cudf.pandas profiler documentation. (#17668) @bdice
Document interpreter install command for cudf.pandas (#17358) @bdice
add comment to Series.tolist method (#17350) @tequilayu

🚀 New Features

Bump polars version to <1.22 (#17771) @Matt711
Make more constexpr available on device for cuIO (#17746) @PointKernel
Add public interop functions between pylibcudf and cudf classic (#17730) @Matt711
Support dask_expr migration into dask.dataframe (#17704) @rjzamora
Make tests build without relaxed constexpr (#17691) @PointKernel
Set default logger level to warn (#17684) @vyasr
Support multithreaded reading of compressed buffers in JSON reader (#17670) @shrshi
Control pinned memory use with environment variables (#17657) @vuule
Host compression (#17656) @vuule
Enable text build without relying on relaxed constexpr (#17647) @PointKernel
Implement HOST_UDF aggregation for reduction and segmented reduction (#17645) @ttnghia
Add JSON reader options structs to pylibcudf (#17614) @Matt711
Refactor distinct hash join to handle multiple probes with the same build table (#17609) @PointKernel
Add JSON Writer options classes to pylibcudf (#17606) @Matt711
Add ORC reader options structs to pylibcudf (#17601) @Matt711
Add Avro Reader options classes to pylibcudf (#17599) @Matt711
Enable binaryop build without relying on relaxed constexpr (#17598) @PointKernel
Measure the number of Parquet row groups filtered by predicate pushdown (#17594) @mhaseeb123
Implement HOST_UDF aggregation for groupby (#17592) @ttnghia
Plumb pylibcudf.io.parquet options classes through cudf python (#17506) @Matt711
Add partition-wise Select support to cuDF-Polars (#17495) @rjzamora
Add multi-partition Scan support to cuDF-Polars (#17494) @rjzamora
Migrate cudf::io::merge_row_group_metadata to pylibcudf (#17491) @Matt711
Add Parquet Reader options classes to pylibcudf (#17464) @Matt711
Add multi-partition DataFrameScan support to cuDF-Polars (#17441) @rjzamora
Return empty result for segmented_reduce if input and offsets are both empty (#17437) @davidwendt
Abstract polars function expression nodes to ensure they are serializable (#17418) @pentschev
Add CSV Reader options classes to pylibcudf (#17412) @Matt711
Add support for pylibcudf.DataType serialization (#17352) @pentschev
Enable rounding for Decimal32 and Decimal64 in cuDF (#17332) @a-hirota
Remove upper bounds on cuda-python to allow 12.6.2 and 11.8.5 (#17326) @bdice
Expose stream-ordering to groupby APIs (#17324) @shrshi
Migrate ORC Writer to pylibcudf (#17310) @Matt711
Support reading bloom filters from Parquet files and filter row groups using them (#17289) @mhaseeb123

🛠️ Improvements

Update to nvcomp 4.2.0.11 (#18042) @bdice
Remove pandas backend from cudf.pandas - ibis integration tests (#17945) @Matt711
Revert CUDA 12.8 shared workflow branch changes (#17879) @vyasr
Remove predicate param from DataFrameScan IR (#17852) @Matt711
Remove cudf.Scalar from scatter APIs (#17847) @mroeschke
Remove cudf.Scalar from interval_range (#17844) @mroeschke
Add verify-codeowners hook (#17840) @KyleFromNVIDIA
Build and test with CUDA 12.8.0 (#17834) @bdice
In...

Contributors

msarahan, zeroshade, and 37 other contributors

27 Feb 16:38

AyodeAwe

v25.02.01

b1efe69

v25.02.01

🚨 Breaking Changes

Expose stream-ordering in scalar and avro APIs (#17766) @shrshi
Add seed parameter to hash_character_ngrams (#17643) @davidwendt
Performance improvements and simplifications for fixed size row-based rolling windows (#17623) @wence-
Refactor distinct hash join to handle multiple probes with the same build table (#17609) @PointKernel
Deprecate cudf::grouped_time_range_rolling_window (#17589) @wence-
Remove "legacy" Dask DataFrame support from Dask cuDF (#17558) @rjzamora
Return empty result for segmented_reduce if input and offsets are both empty (#17437) @davidwendt
Rework minhash APIs for deprecation cycle (#17421) @davidwendt
Change indices for dictionary column to signed integer type (#17390) @davidwendt

🐛 Bug Fixes

Skip the failing connectorx polars tests (#18037) @Matt711
Fix 'Unexpected short subpass' exception in parquet chunked reader. (#18019) @nvdbaranec
Fix race check failures in shared memory groupby (#17985) @PointKernel
Pin ibis version in the cudf.pandas integration tests <10.0.0 (#17975) @Matt711
Fix the index type in the indexing operator of the span types (#17971) @vuule
Add missing pin (#17915) @vyasr
Fix third-party cudf.pandas tests (#17900) @galipremsagar
Fix numpy data access by making attribute private (#17890) @galipremsagar
Remove extra local var declaration from cudf.pandas 3rd-party integration shell script (#17886) @Matt711
Move isinstance_cudf_pandas to fast_slow_proxy (#17875) @galipremsagar
Make _Series_dtype method a property (#17854) @Matt711
Fix the bug in determining the heuristics for shared memory groupby (#17851) @PointKernel
Fix possible OOB mem access in Parquet decoder (#17841) @mhaseeb123
Require batches to be non-empty in multi-batch JSON reader (#17837) @shrshi
Fix rolling(min_periods=) with int and null data with mode.pandas_compat (#17822) @mroeschke
Resolve race-condition in disable_module_accelerator (#17811) @galipremsagar
Make Series(dtype=object) raise in mode.pandas_compat with non string data (#17804) @mroeschke
Disable ORC tests that were intended to be disabled (#17790) @davidwendt
Fix empty DataFrame construction not returning RangeIndex columns (#17784) @mroeschke
Fix various .str methods for pandas compatibility (#17782) @mroeschke
Fix count API issue about ignoring nan values (#17779) @galipremsagar
Add numba pinning to cudf repo (#17777) @galipremsagar
Allow .sort_values(na_position=) to include NaNs in mode.pandas_compatible (#17776) @mroeschke
allow deselecting nvcomp wheels (#17774) @jameslamb
Use the aligned_resource_adaptor to allocate bloom filter device buffers (#17758) @mhaseeb123
Avoid instantiating bloom filter query function for nested and bool types (#17753) @mhaseeb123
Fix DataFrame.merge(Series, how="left"/"right") on column and index not resulting in a RangeIndex (#17739) @mroeschke
[BUG] xfail Polars excel test (#17731) @Matt711
Require classes derived from HostUDFWrapper to implement AutoCloseable (#17727) @ttnghia
Remove jlowe as a java committer since he retired (#17725) @tgravescs
Prevent use of invalid grid sizes in ORC reader and writer (#17709) @vuule
Enforce schema for partial tables in multi-source multi-batch JSON reader (#17708) @shrshi
Compute and use the initial string offset when building nested large string cols with chunked parquet reader (#17702) @mhaseeb123
Fix writing of compressed ORC files with large stripe footers (#17700) @vuule
Fix cudf.polars sum of empty not equalling zero (#17685) @mroeschke
Fix formatting in logging (#17680) @vuule
convert all nulls to nans in a specific scenario (#17677) @galipremsagar
Define cudf repr methods on the Column (#17675) @mroeschke
Fix groupby.len with null values in cudf.polars (#17671) @mroeschke
Fix: DataFrameGroupBy.get_group was raising with length>1 tuples (#17653) @MarcoGorelli
Fix possible int overflow in compute_mixed_join_output_size (#17633) @davidwendt
Fix a minor potential i32 overflow in thrust::transform_exclusive_scan in PQ reader preprocessing (#17617) @mhaseeb123
Fix failing xgboost test in the cudf.pandas third-party integration tests (#17616) @Matt711
Fix dask_cudf.read_csv (#17612) @rjzamora
Fix memcheck error in ReplaceTest.NormalizeNansAndZerosMutable gtest (#17610) @davidwendt
Correctly accept a pandas.CategoricalDtype(pandas.IntervalDtype(...), ...) type (#17604) @mroeschke
Add ability to modify and propagate names of columns object (#17597) @galipremsagar
Ignore NaN correctly in .quantile (#17593) @mroeschke
Fix groupby argmin/max gather of sorted-order indices (#17591) @davidwendt
Fix ctest fail running libcudf tests in a Debug build (#17576) @davidwendt
Specify a version for rapids_logger dependency (#17573) @jlowe
Fix the ORC decoding bug for the timestamp data (#17570) @kingcrimsontianyu
[JNI] remove rmm argument to set rw access for fabric handles (#17553) @abellina
Document undefined behavior in div_rounding_up_safe (#17542) @davidwendt
Fix nvcc-imposed UB in constexpr functions (#17534) @vuule
Add anonymous namespace to libcudf test source (#17529) @davidwendt
Propagate failures in pandas integration tests and Skip failing tests (#17521) @Matt711
Fix libcudf compile error when logging is disabled (#17512) @davidwendt
Fix Dask-cuDF clip APIs (#17509) @rjzamora
Fix pylibcudf to_arrow with multiple nested data types (#17504) @mroeschke
Fix groupby(as_index=False).size not resetting index (#17499) @mroeschke
Revert "Temporarily skip tests due to dask/distributed#8953" (#17492) @Matt711
Workaround for a misaligned access in read_csv on some CUDA versions (#17477) @vuule
Fix some possible thread-id overflow calculations (#17473) @davidwendt
Temporarily skip tests due to dask/distributed#8953 (#17472) @wence-
Detect mismatches in begin and end tokens returned by JSON tokenizer FST (#17471) @shrshi
Support dask>=2024.11.2 in Dask cuDF (#17439) @rjzamora
Fix write_json failure for zero columns in table/struct (#17414) @karthikeyann
Fix Debug-mode failing Arrow test (#17405) @zeroshade
Fix all null list column with missing child column in JSON reader (#17348) @karthikeyann

📖 Documentation

Fix forward merge 24.12->25.02 (#18002) @raydouglass
Fix incorrect example in pylibcudf docs (#17912) @Matt711
Explicitly call out that the GPU open beta runs on a single GPU (#17872) @taureandyernv
Update cudf.pandas colab link in docs (#17846) @taureandyernv
[DOC] Make pylibcudf docs more visible (#17803) @Matt711
Cross-link cudf.pandas profiler documentation. (#17668) @bdice
Document interpreter install command for cudf.pandas (#17358) @bdice
add comment to Series.tolist method (#17350) @tequilayu

🚀 New Features

Bump polars version to <1.22 (#17771) @Matt711
Make more constexpr available on device for cuIO (#17746) @PointKernel
Add public interop functions between pylibcudf and cudf classic (#17730) @Matt711
Support dask_expr migration into dask.dataframe (#17704) @rjzamora
Make tests build without relaxed constexpr (#17691) @PointKernel
Set default logger level to warn (#17684) @vyasr
Support multithreaded reading of compressed buffers in JSON reader (#17670) @shrshi
Control pinned memory use with environment variables (#17657) @vuule
Host compression (#17656) @vuule
Enable text build without relying on relaxed constexpr (#17647) @PointKernel
Implement HOST_UDF aggregation for reduction and segmented reduction (#17645) @ttnghia
Add JSON reader options structs to pylibcudf (#17614) @Matt711
Refactor distinct hash join to handle multiple probes with the same build table (#17609) @PointKernel
Add JSON Writer options classes to pylibcudf (#17606) @Matt711
Add ORC reader options structs to pylibcudf (#17601) @Matt711
Add Avro Reader options classes to pylibcudf (#17599) @Matt711
Enable binaryop build without relying on relaxed constexpr (#17598) @PointKernel
Measure the number of Parquet row groups filtered by predicate pushdown (#17594) @mhaseeb123
Implement HOST_UDF aggregation for groupby (#17592) @ttnghia
Plumb pylibcudf.io.parquet options classes through cudf python (#17506) @Matt711
Add partition-wise Select support to cuDF-Polars (#17495) @rjzamora
Add multi-partition Scan support to cuDF-Polars (#17494) @rjzamora
Migrate cudf::io::merge_row_group_metadata to pylibcudf (#17491) @Matt711
Add Parquet Reader options classes to pylibcudf (#17464) @Matt711
Add multi-partition DataFrameScan support to cuDF-Polars (#17441) @rjzamora
Return empty result for segmented_reduce if input and offsets are both empty (#17437) @davidwendt
Abstract polars function expression nodes to ensure they are serializable (#17418) @pentschev
Add CSV Reader options classes to pylibcudf (#17412) @Matt711
Add support for pylibcudf.DataType serialization (#17352) @pentschev
Enable rounding for Decimal32 and Decimal64 in cuDF (#17332) @a-hirota
Remove upper bounds on cuda-python to allow 12.6.2 and 11.8.5 (#17326) @bdice
Expose stream-ordering to groupby APIs (#17324) @shrshi
Migrate ORC Writer to pylibcudf (#17310) @Matt711
Support reading bloom filters from Parquet files and filter row groups using them (#17289) @mhaseeb123

🛠️ Improvements

Update to nvcomp 4.2.0.11 (#18042) @bdice
Remove pandas backend from cudf.pandas - ibis integration tests (#17945) @Matt711
Revert CUDA 12.8 shared workflow branch changes (#17879) @vyasr
Remove predicate param from DataFrameScan IR (#17852) @Matt711
Remove cudf.Scalar from scatter APIs (#17847) @mroeschke
Remove cudf.Scalar from interval_range (#17844) @mroeschke
Add verify-codeowners hook (#17840) @KyleFromNVIDIA
Build and test with CUDA 12.8.0 (#17834) @bdice
Increase timeout for recently added test (#17829) @galipremsagar
Apply ru...

Contributors

msarahan, zeroshade, and 37 other contributors

11 Dec 19:11

GPUtester

v24.12.00

ff41ecf

v24.12.00

🚨 Breaking Changes

Fix reading Parquet string cols when nrows and input_pass_limit > 0 (#17321) @mhaseeb123
prefer wheel-provided libcudf.so in load_library(), use RTLD_LOCAL (#17316) @jameslamb
Deprecate single component extraction methods in libcudf (#17221) @Matt711
Move detail header floating_conversion.hpp to detail subdirectory (#17209) @davidwendt
Refactor Dask cuDF legacy code (#17205) @rjzamora
Make HostMemoryBuffer call into the DefaultHostMemoryAllocator (#17204) @revans2
Remove java reservation (#17189) @revans2
Separate evaluation logic from IR objects in cudf-polars (#17175) @rjzamora
Upgrade to polars 1.11 in cudf-polars (#17154) @wence-
Remove the additional host register calls initially intended for performance improvement on Grace Hopper (#17092) @kingcrimsontianyu
Correctly set is_device_accessible when creating host_spans from other container/span types (#17079) @vuule
Unify treatment of Expr and IR nodes in cudf-polars DSL (#17016) @wence-
Deprecate support for directly accessing logger (#16964) @vyasr
Made cudftestutil header-only and removed GTest dependency (#16839) @lamarrr

🐛 Bug Fixes

Turn off cudf.pandas 3rd party integrations tests for 24.12 (#17500) @Matt711
Ignore errors when testing glibc versions (#17389) @vyasr
Adapt to KvikIO API change in the compatibility mode (#17377) @kingcrimsontianyu
Support pivot with index or column arguments as lists (#17373) @mroeschke
Deselect failing polars tests (#17362) @pentschev
Fix integer overflow in compiled binaryop (#17354) @wence-
Update cmake to 3.28.6 in JNI Dockerfile (#17342) @jlowe
fix library-loading issues in editable installs (#17338) @jameslamb
Bug fix: restrict lines=True to JSON format in Kafka read_gdf method (#17333) @a-hirota
Fix various issues with replace API and add support in datetime and timedelta columns (#17331) @galipremsagar
Do not exclude nanoarrow and flatbuffers from installation if statically linked (#17322) @hyperbolic2346
Fix reading Parquet string cols when nrows and input_pass_limit > 0 (#17321) @mhaseeb123
Remove another reference to FindcuFile (#17315) @KyleFromNVIDIA
Fix reading of single-row unterminated CSV files (#17305) @vuule
Fixed lifetime issue in ast transform tests (#17292) @lamarrr
Switch to using TaskSpec (#17285) @galipremsagar
Fix data_type ctor call in JSON_TEST (#17273) @davidwendt
Expose delimiter character in JSON reader options to JSON reader APIs (#17266) @shrshi
Fix extract-datetime deprecation warning in ndsh benchmark (#17254) @davidwendt
Disallow cuda-python 12.6.1 and 11.8.4 (#17253) @bdice
Wrap custom iterator result (#17251) @galipremsagar
Fix binop with LHS numpy datetimelike scalar (#17226) @mroeschke
Fix Dataframe.__setitem__ slow-downs (#17222) @galipremsagar
Fix groupby.get_group with length-1 tuple with list-like grouper (#17216) @mroeschke
Fix discoverability of submodules inside pd.util (#17215) @galipremsagar
Fix Schema.Builder not propagating precision value to Builder instance (#17214) @ttnghia
Mark column chunks in a PQ reader pass as large strings when the cumulative offsets exceed the large strings threshold (#17207) @mhaseeb123
[BUG] Replace repo_token with github_token in Auto Assign PR GHA (#17203) @Matt711
Remove unsanitized nulls from input strings columns in reduction gtests (#17202) @davidwendt
Fix to_parquet append behavior with global metadata file (#17198) @rjzamora
Check num_children() == 0 in Column.from_column_view (#17193) @cwharris
Fix host-to-device copy missing sync in strings/duration convert (#17149) @davidwendt
Add JNI Support for Multi-line Delimiters and Include Test (#17139) @SurajAralihalli
Ignore loud dask warnings about legacy dataframe implementation (#17137) @galipremsagar
Fix the GDS read/write segfault/bus error when the cuFile policy is set to GDS or ALWAYS (#17122) @kingcrimsontianyu
Fix DataFrame._from_arrays and introduce validations (#17112) @galipremsagar
[Bug] Fix Arrow-FS parquet reader for larger files (#17099) @rjzamora
Fix bug in recovering invalid lines in JSONL inputs (#17098) @shrshi
Reenable huge pages for arrow host copying (#17097) @vyasr
Correctly set is_device_accessible when creating host_spans from other container/span types (#17079) @vuule
Fix ORC reader when using device_read_async while the destination device buffers are not ready (#17074) @ttnghia
Fix regex handling of fixed quantifier with 0 range (#17067) @davidwendt
Limit the number of keys to calculate column sizes and page starts in PQ reader to 1B (#17059) @mhaseeb123
Adding assertion to check for regular JSON inputs of size greater than INT_MAX bytes (#17057) @shrshi
bug fix: use self.ck_consumer in poll method of kafka.py to align with __init__ (#17044) @a-hirota
Disable kvikio remote I/O to avoid openssl dependencies in JNI build (#17026) @pxLi
Fix host_span constructor to correctly copy is_device_accessible (#17020) @vuule
Add pinning for pyarrow in wheels (#17018) @vyasr
Use std::optional for host types (#17015) @robertmaynard
Fix write_json to handle empty string column (#16995) @karthikeyann
Restore export of nvcomp outside of wheel builds (#16988) @KyleFromNVIDIA
Allow melt(var_name=) to be a falsy label (#16981) @mroeschke
Fix astype from tz-aware type to tz-aware type (#16980) @mroeschke
Use libcudf wheel from PR rather than nightly for polars-polars CI test job (#16975) @brandon-b-miller
Fix order-preservation in pandas-compat unsorted groupby (#16942) @wence-
Fix cudf::strings::findall error with empty input (#16928) @davidwendt
Fix JsonLargeReaderTest.MultiBatch use of LIBCUDF_JSON_BATCH_SIZE env var (#16927) @davidwendt
Parse newline as whitespace character while tokenizing JSONL inputs with non-newline delimiter (#16923) @shrshi
Respect groupby.nunique(dropna=False) (#16921) @mroeschke
Update all rmm imports to use pylibrmm/librmm (#16913) @Matt711
Fix order-preservation in cudf-polars groupby (#16907) @wence-
Add a shortcut for when the input clusters are all empty for the tdigest merge (#16897) @jihoonson
Properly handle the mapped and registered regions in memory_mapped_source (#16865) @vuule
Fix performance regression for generate_character_ngrams (#16849) @davidwendt
Fix regex parsing logic handling of nested quantifiers (#16798) @davidwendt
Compute whole column variance using numerically stable approach (#16448) @wence-

📖 Documentation

Add documentation for low memory readers (#17314) @btepera
Fix the example in documentation for get_dremel_data() (#17242) @mhaseeb123
Fix some documentation rendering for pylibcudf (#17217) @mroeschke
Move detail header floating_conversion.hpp to detail subdirectory (#17209) @davidwendt
Add TokenizeVocabulary to api docs (#17208) @davidwendt
Add jaccard_index to generated cuDF docs (#17199) @davidwendt
[no ci] Add empty-columns section to the libcudf developer guide (#17183) @davidwendt
Add 2-cpp approvers text to contributing guide [no ci] (#17182) @davidwendt
Changing developer guide int_64_t to int64_t (#17130) @hyperbolic2346
docs: change 'CSV' to 'csv' in python/custreamz/README.md to match kafka.py (#17041) @a-hirota
[DOC] Document limitation using cudf.pandas proxy arrays (#16955) @Matt711
[DOC] Document environment variable for failing on fallback in cudf.pandas (#16932) @Matt711

🚀 New Features

Add version config (#17312) @vyasr
Java JNI for Multiple contains (#17281) @res-life
Add cudf::calendrical_month_sequence to pylibcudf (#17277) @Matt711
Raise errors on specific types of fallback in cudf.pandas (#17268) @Matt711
Add catboost to the third-party integration tests (#17267) @Matt711
Add type stubs for pylibcudf (#17258) @wence-
Use pylibcudf contiguous split APIs in cudf python (#17246) @Matt711
Upgrade nvcomp to 4.1.0.6 (#17201) @bdice
Added Arrow Interop Benchmarks (#17194) @lamarrr
Rewrite Java API Table.readJSON to return the output from libcudf read_json directly (#17180) @ttnghia
Support storing precision of decimal types in Schema class (#17176) @ttnghia
Migrate CSV writer to pylibcudf (#17163) @Matt711
Add compute_shared_memory_aggs used by shared memory groupby (#17162) @PointKernel
Added ast tree to simplify expression lifetime management (#17156) @lamarrr
Add compute_mapping_indices used by shared memory groupby (#17147) @PointKernel
Add remaining datetime APIs to pylibcudf (#17143) @Matt711
Added strings AST vs BINARY_OP benchmarks (#17128) @lamarrr
Use libcudf_exception_handler throughout pylibcudf.libcudf (#17109) @brandon-b-miller
Include timezone file path in error message (#17102) @bdice
Migrate NVText Byte Pair Encoding APIs to pylibcudf (#17101) @Matt711
Migrate NVText Tokenizing APIs to pylibcudf (#17100) @Matt711
Migrate NVtext subword tokenizing APIs to pylibcudf (#17096) @Matt711
Migrate NVText Stemming APIs to pylibcudf (#17085) @Matt711
Migrate NVText Replacing APIs to pylibcudf (#17084) @Matt711
Add IWYU to CI (#17078) @vyasr
cudf-polars string/numeric casting (#17076) @brandon-b-miller
Migrate NVText Normalizing APIs to Pylibcudf (#17072) @Matt711
Migrate remaining nvtext NGrams APIs to pylibcudf (#17070) @Matt711
Add profilers to CUDA 12 conda devcontainers (#17066) @vyasr
Add conda recipe for cudf-polars (#17037) @bdice
Implement batch construction for strings columns (#17035) @ttnghia
Add device aggregators used by shared memory groupby (#17031) @PointKernel
Add optional column_order in JSON reader (#17029) @karthikeyann
Migrate Min Hashing APIs to pylibcudf (#17021) @Matt711
Reorganize cudf_polars expression code (#17014) @brandon-b-miller
Migrate nvtext jaccard API to pylibcudf (#17007) @Matt711
Migrate nvtext generate_ngrams APIs to pylibcudf (#17006) @ma...

Contributors

msarahan, robertmaynard, and 38 other contributors

29 Oct 18:24

raydouglass

v24.10.01

7b0adfa

v24.10.01

This hotfix corrected some Python packaging issues.

Full Changelog: v24.10.00...v24.10.01

09 Oct 15:25

raydouglass

v24.10.00

67193a8

v24.10.00

🚨 Breaking Changes

Whitespace normalization of nested column coerced as string column in JSONL inputs (#16759) @shrshi
Add libcudf wrappers around current_device_resource functions. (#16679) @harrism
Fix empty cluster handling in tdigest merge (#16675) @jihoonson
Remove java ColumnView.copyWithBooleanColumnAsValidity (#16660) @revans2
Support reading multiple PQ sources with mismatching nullability for columns (#16639) @mhaseeb123
Remove arrow_io_source (#16607) @vyasr
Remove legacy Arrow interop APIs (#16590) @vyasr
Remove NativeFile support from cudf Python (#16589) @vyasr
Revert "Make proxy NumPy arrays pass isinstance check in cudf.pandas" (#16586) @Matt711
Align public utility function signatures with pandas 2.x (#16565) @mroeschke
Disallow cudf.Index accepting column in favor of ._from_column (#16549) @mroeschke
Refactor dictionary encoding in PQ writer to migrate to the new cuco::static_map (#16541) @mhaseeb123
Change IPv4 convert APIs to support UINT32 instead of INT64 (#16489) @davidwendt
enable list to be forced as string in JSON reader. (#16472) @karthikeyann
Disallow cudf.Series to accept column in favor of ._from_column (#16454) @mroeschke
Align groupby APIs with pandas 2.x (#16403) @mroeschke
Align misc DataFrame and MultiIndex methods with pandas 2.x (#16402) @mroeschke
Align Index APIs with pandas 2.x (#16361) @mroeschke
Add stream param to stream compaction APIs (#16295) @JayjeetAtGithub

🐛 Bug Fixes

Add license to the pylibcudf wheel (#16976) @raydouglass
Parse newline as whitespace character while tokenizing JSONL inputs with non-newline delimiter (#16950) @shrshi
Add dask-cudf workaround for missing rename_axis support in cudf (#16899) @rjzamora
Update oldest deps for pyarrow & numpy (#16883) @galipremsagar
Update labeler for pylibcudf (#16868) @vyasr
Revert "Refactor mixed_semi_join using cuco::static_set" (#16855) @mhaseeb123
Fix metadata after implicit array conversion from Dask cuDF (#16842) @rjzamora
Add cudf.pandas dependencies.yaml to update-version.sh (#16840) @raydouglass
Use cupy 12.2.0 as oldest dependency pinning on CUDA 12 ARM (#16808) @bdice
Revert "Fix empty cluster handling in tdigest merge (#16675)" (#16800) @jihoonson
Intentionally leak thread_local CUDA resources to avoid crash (part 1) (#16787) @kingcrimsontianyu
Fix cov/corr bug in dask-cudf (#16786) @rjzamora
Fix slice_strings wide strings logic with multi-byte characters (#16777) @davidwendt
Fix nvbench output for sha512 (#16773) @davidwendt
Allow read_csv(header=None) to return int column labels in mode.pandas_compatible (#16769) @mroeschke
Whitespace normalization of nested column coerced as string column in JSONL inputs (#16759) @shrshi
Fix DataFrame.drop(columns=cudf.Series/Index, axis=1) (#16712) @mroeschke
Use merge base when calculating changed files (#16709) @KyleFromNVIDIA
Ensure we pass the has_nulls tparam to mixed_join kernels (#16708) @abellina
Add boost-devel to Java CI Docker image (#16707) @jlowe
[BUG] Add gpu node type to cudf-pandas 3rd-party integration nightly CI job (#16704) @Matt711
Fix typo in column_factories.hpp comment from 'depth 1' to 'depth 2' (#16700) @a-hirota
Fix Series.to_frame(name=None) setting a None name (#16698) @mroeschke
Disable gtests/ERROR_TEST during compute-sanitizer memcheck test (#16691) @davidwendt
Enable batched multi-source reading of JSONL files with large records (#16687) @shrshi
Handle ordered parameter in CategoricalIndex.__repr__ (#16683) @galipremsagar
Fix loc/iloc.setitem[:, loc] with non cupy types (#16677) @mroeschke
Fix empty cluster handling in tdigest merge (#16675) @jihoonson
Fix cudf::rank not getting enough params (#16666) @JayjeetAtGithub
Fix slowdown in CategoricalIndex.__repr__ (#16665) @galipremsagar
Remove java ColumnView.copyWithBooleanColumnAsValidity (#16660) @revans2
Fix slowdown in DataFrame repr in jupyter notebook (#16656) @galipremsagar
Preserve Series name in duplicated method. (#16655) @bdice
Fix interval_range right child non-zero offset (#16651) @mroeschke
fix libcudf wheel publishing, make package-type explicit in wheel publishing (#16650) @jameslamb
Revert "Hide all gtest symbols in cudftestutil (#16546)" (#16644) @robertmaynard
Fix integer overflow in indexalator pointer logic (#16643) @davidwendt
Allow for binops between two differently sized DecimalDtypes (#16638) @mroeschke
Move pragma once in rolling/jit/operation.hpp. (#16636) @bdice
Fix overflow bug in low-memory JSON reader (#16632) @shrshi
Add the missing num_aggregations axis for groupby_max_cardinality (#16630) @PointKernel
Fix strings::detail::copy_range when target contains nulls (#16626) @davidwendt
Fix function parameters with common dependency modified during their evaluation (#16620) @ttnghia
bug-fix: Don't enable the CUDA language if testing was requested when finding cudf (#16615) @cryos
bug-fix: cudf/io/json.hpp use after move (#16609) @NicolasDenoyelle
Remove CUDA whole compilation ODR violations (#16603) @robertmaynard
MAINT: Adapt to numpy hiding flagsobject away (#16593) @seberg
Revert "Make proxy NumPy arrays pass isinstance check in cudf.pandas" (#16586) @Matt711
Switch python version to 3.10 in cudf.pandas pandas test scripts (#16559) @galipremsagar
Hide all gtest symbols in cudftestutil (#16546) @robertmaynard
Update the java code to properly deal with lists being returned as strings (#16536) @revans2
Register read_parquet and read_csv with dask-expr (#16535) @rjzamora
Change cudf::empty_like to not include offsets for empty strings columns (#16529) @davidwendt
Fix DataFrame reductions with median returning scalar instead of Series (#16527) @mroeschke
Allow DataFrame.sort_values(by=) to select an index level (#16519) @mroeschke
Fix date_range(start, end, freq) when end-start is divisible by freq (#16516) @mroeschke
Preserve array name in MultiIndex.from_arrays (#16515) @mroeschke
Disallow indexing by selecting duplicate labels (#16514) @mroeschke
Fix .replace(Index, Index) raising a TypeError (#16513) @mroeschke
Check index bounds in compact protocol reader. (#16493) @bdice
Fix build failures with GCC 13 (#16488) @PointKernel
Fix all-empty input column for strings split APIs (#16466) @davidwendt
Fix segmented-sort overlapped input/output indices (#16463) @davidwendt
Fix merge conflict for auto merge 16447 (#16449) @davidwendt

📖 Documentation

Fix links in Dask cuDF documentation (#16929) @rjzamora
Improve aggregation documentation (#16822) @PointKernel
Add best practices page to Dask cuDF docs (#16821) @rjzamora
[DOC] Update Pylibcudf doc strings (#16810) @Matt711
Recommending miniforge for conda install (#16782) @mmccarty
Add labeling pylibcudf doc pages (#16779) @mroeschke
Migrate dask-cudf README improvements to dask-cudf sphinx docs (#16765) @rjzamora
[DOC] Remove out of date section from cudf.pandas docs (#16697) @Matt711
Add performance tips to cudf.pandas FAQ. (#16693) @bdice
Update documentation for Dask cuDF (#16671) @rjzamora
Add missing pylibcudf strings docs (#16471) @brandon-b-miller
DOC: Refresh pylibcudf guide (#15856) @lithomas1

🚀 New Features

Build cudf-polars with build.sh (#16898) @brandon-b-miller
Add polars to "all" dependency list. (#16875) @bdice
nvCOMP GZIP integration (#16770) @vuule
[FEA] Add support for cudf.NamedAgg (#16744) @Matt711
Add experimental filesystem="arrow" support in dask_cudf.read_parquet (#16684) @rjzamora
Relax Arrow pin (#16681) @vyasr
Add libcudf wrappers around current_device_resource functions. (#16679) @harrism
Move NDS-H examples into benchmarks (#16663) @JayjeetAtGithub
[FEA] Add third-party library integration testing of cudf.pandas to cudf (#16645) @Matt711
Make isinstance check pass for proxy ndarrays (#16601) @Matt711
[FEA] Add an environment variable to fail on fallback in cudf.pandas (#16562) @Matt711
[FEA] Add support for cudf.unique (#16554) @Matt711
[FEA] Support named aggregations in df.groupby().agg() (#16528) @Matt711
Change IPv4 convert APIs to support UINT32 instead of INT64 (#16489) @davidwendt
Enable lists to be forced as strings in JSON reader (#16472) @karthikeyann
Remove cuDF dependency from pylibcudf column from_device tests (#16441) @brandon-b-miller
Enable cudf.pandas REPL and -c command support (#16428) @bdice
Setup pylibcudf package (#16299) @lithomas1
Add a libcudf/thrust-based TPC-H derived datagen (#16294) @JayjeetAtGithub
Make proxy NumPy arrays pass isinstance check in cudf.pandas (#16286) @Matt711
Add skiprows and nrows to parquet reader (#16214) @lithomas1
Upgrade to nvcomp 4.0.1 (#16076) @vuule
Migrate ORC reader to pylibcudf (#16042) @lithomas1
JSON reader validation of values (#15968) @karthikeyann
Implement exposed null mask APIs in pylibcudf (#15908) @charlesbluca
Word-based nvtext::minhash function (#15368) @davidwendt

🛠️ Improvements

Make tests deterministic (#16910) @galipremsagar
Update update-version.sh to use packaging lib (#16891) @AyodeAwe
Pin polars for 24.10 and update polars test suite xfail list (#16886) @wence-
Add in support for setting delim when parsing JSON through java (#16867) (#16880) @revans2
Remove unnecessary flag from build.sh (#16879) @vyasr
Ignore numba warning specific to ARM runners (#16872) @galipremsagar
Display deltas for cudf.pandas test summary (#16864) @galipremsagar
Switch to using native traceback (#16851) @galipremsagar
JSON tree algorithm code reorg (#16836) @karthikeyann
Add string.repeats API to pylibcudf (#16834) @mroeschke
Use CI workflow branch 'branch-24.10' again (#16832) @jameslamb
Rename the NDS-H benchmark binaries (#16831) @JayjeetAtGithub
Add string.findall APIs t...

Contributors

msarahan, cryos, and 40 other contributors

17 Sep 00:25

raydouglass

v24.08.03

e479454

v24.08.03

🚨 Breaking Changes

Align Index init APIs with pandas 2.x (#16362) @mroeschke
Align Series APIs with pandas 2.x (#16333) @mroeschke
Add missing stream param to dictionary factory APIs (#16319) @JayjeetAtGithub
Deprecate dtype= parameter in reduction methods (#16313) @mroeschke
Remove squeeze argument from groupby (#16312) @mroeschke
Align more DataFrame APIs with pandas (#16310) @mroeschke
Remove mr param from write_csv and write_json (#16231) @JayjeetAtGithub
Report number of rows per file read by PQ reader when no row selection and fix segfault in chunked PQ reader when skip_rows > 0 (#16195) @mhaseeb123
Refactor from_arrow_device/host to use resource_ref (#16160) @harrism
Deprecate Arrow support in I/O (#16132) @lithomas1
Return FrozenList for Index.names (#16047) @galipremsagar
Add compile option to enable large strings support (#16037) @davidwendt
Hide visibility of non public symbols (#15982) @robertmaynard
Rename strings multiple target replace API (#15898) @davidwendt
Pinned vector factory that uses the global pool (#15895) @vuule
Apply clang-tidy autofixes (#15894) @vyasr
Support arrow:schema in Parquet writer to faithfully roundtrip duration types with Arrow (#15875) @mhaseeb123
Expose stream parameter to public rolling APIs (#15865) @srinivasyadav18
Fix large strings handling in nvtext::character_tokenize (#15829) @davidwendt
Remove legacy JSON reader and concurrent_unordered_map.cuh. (#15813) @bdice

🐛 Bug Fixes

Ensure managed memory is supported in cudf.pandas. (#16552) @bdice
Add flatbuffers to libcudf build (#16446) @galipremsagar
Fix parquet_field_list read_func lambda capture invalid this pointer (#16440) @davidwendt
Enable prefetching in cudf.pandas.install() (#16439) @bdice
Enable prefetching before runpy (#16427) @galipremsagar
Make prefetch_config::get and prefetch_config::set thread-safe (#16425) @ttnghia
Fix a pandas-2.0 missing attribute error (#16416) @galipremsagar
[Bug] Remove loud NativeFile deprecation noise for read_parquet from S3 (#16415) @rjzamora
Fix nightly memcheck error for empty STREAM_INTEROP_TEST (#16406) @davidwendt
Gate ArrowStringArrayNumpySemantics cudf.pandas proxy behind version check (#16401) @mroeschke
Don't export bs_thread_pool (#16398) @KyleFromNVIDIA
Require fixed width types for casting in cudf-polars (#16381) @brandon-b-miller
Fix docstring of DataFrame.apply (#16351) @galipremsagar
Make bool raise for more cudf objects (#16311) @mroeschke
Rename .devcontainers for CUDA 12.5 (#16293) @jakirkham
Fix split_record for all empty strings column (#16291) @davidwendt
Fix logic in to_arrow for empty list column (#16279) @wence-
[BUG] Make name attr of Index fast slow attrs (#16270) @Matt711
Add custom name setter and getter for proxy objects in cudf.pandas (#16234) @Matt711
Fall back when casting a timestamp to numeric in cudf-polars (#16232) @brandon-b-miller
Disable large string support for Java build (#16216) @jlowe
Remove CCCL patch for PR 211. (#16207) @bdice
Add single offset to an empty ListArray in cudf::to_arrow (#16201) @davidwendt
Fix memory_usage when calculating nested list column (#16193) @mroeschke
Support at/iat indexers in cudf.pandas (#16177) @mroeschke
Fix unused-return-value debug build error in from_arrow_stream_test.cpp (#16168) @davidwendt
Fix cudf::strings::replace_multiple hang on empty target (#16167) @davidwendt
Refactor from_arrow_device/host to use resource_ref (#16160) @harrism
interpolate returns new column if no values are interpolated (#16158) @mroeschke
Use provided memory resource for allocating mixed join results. (#16153) @bdice
Run DFG after verify-alpha-spec (#16151) @KyleFromNVIDIA
Use size_t to allow large conditional joins (#16127) @bdice
Allow only scale=0 fixed-point values in fixed_width_column_wrapper (#16120) @davidwendt
Fix pylibcudf Table.num_rows for 0 columns case and add interop to docs (#16108) @lithomas1
Add support for proxy np.flatiter objects (#16107) @Matt711
Ensure cudf objects can astype to any type when empty (#16106) @mroeschke
Support pd.read_pickle and pd.to_pickle in cudf.pandas (#16105) @Matt711
Fix unnecessarily strict check in parquet chunked reader for choosing split locations. (#16099) @nvdbaranec
Fix is_monotonic_* APIs to include NaNs (#16085) @galipremsagar
More safely parse CUDA versions when subprocess output is contaminated (#16067) @brandon-b-miller
fast_slow_proxy: Don't import assert_eq at top-level (#16063) @wence-
Prevent bad ColumnAccessor state after .sort_index(axis=1, ignore_index=True) (#16061) @mroeschke
Fix ArrowDeviceArray interface to pass address of event (#16058) @zeroshade
Fix a size overflow bug in hash groupby (#16053) @PointKernel
Fix atomic_ref scope when multiple blocks are updating the same output (#16051) @vuule
Fix initialization error in to_arrow for empty string views (#16033) @wence-
Fix the int32 overflow when computing page fragment sizes for large string columns (#16028) @mhaseeb123
Fix the pool size alignment issue (#16024) @PointKernel
Improve multibyte-split byte-range performance (#16019) @davidwendt
Fix target counting in strings char-parallel replace (#16017) @davidwendt
Support IntervalDtype in cudf.from_pandas (#16014) @mroeschke
Fix memory size in create_byte_range_infos_consecutive (#16012) @davidwendt
Hide visibility of non public symbols (#15982) @robertmaynard
Fix Cython typo preventing proper inheritance (#15978) @vyasr
Fix convert_dtypes with convert_integer=False/convert_floating=True (#15964) @mroeschke
Fix nunique for MultiIndex, DataFrame, and all NA case with dropna=False (#15962) @mroeschke
Explicitly build for all GPU architectures (#15959) @vyasr
Preserve column type and class information in more DataFrame operations (#15949) @mroeschke
Add array_interface to cudf.pandas numpy.ndarray proxy (#15936) @mroeschke
Allow tests to be built when stream util is disabled (#15933) @robertmaynard
Fix JSON multi-source reading when total source size exceeds INT_MAX bytes (#15930) @shrshi
Fix dask_cudf.read_parquet regression for legacy timestamp data (#15929) @rjzamora
Fix offsetalator when accessing over 268 million rows (#15921) @davidwendt
Fix debug assert in rowgroup_char_counts_kernel (#15902) @davidwendt
Fix categorical conversion from chunked arrow arrays (#15886) @vyasr
Handling for NaN and inf when converting floating point to fixed point types (#15885) @ttnghia
Manual merge of Branch 24.08 from 24.06 (#15869) @galipremsagar
Avoid unnecessary Index cast in IndexedFrame.index setter (#15843) @charlesbluca
Fix large strings handling in nvtext::character_tokenize (#15829) @davidwendt
Fix multi-replace target count logic for large strings (#15807) @davidwendt
Fix JSON parsing memory corruption - Fix Mixed types nested children removal (#15798) @karthikeyann
Allow anonymous user in devcontainer name. (#15784) @bdice
Add support for additional metaclasses of proxies and use for ExcelWriter (#15399) @vyasr

📖 Documentation

Add docstring for from_dataframe (#16260) @mroeschke
Update libcudf compiler requirements in contributing doc (#16103) @davidwendt
Add libcudf public/detail API pattern to developer guide (#16086) @davidwendt
Explain line profiler and how to know which functions are GPU-accelerated. (#16079) @bdice
cudf.pandas documentation improvement (#15948) @Matt711
Reland "Fix docs for IO readers and strings_convert (#15872)" (#15941) @lithomas1
Document how to use cudf.pandas in tandem with multiprocessing (#15940) @wence-
DOC: Add documentation for cudf.pandas in the Developer Guide (#15889) @Matt711
Improve options docs (#15888) @bdice
DOC: add linkcode to docs (#15860) @raybellwaves
DOC: use intersphinx mapping in pandas-compat ext (#15846) @raybellwaves
Fix inconsistent usage of 'results' and 'records' in read-json.md (#15766) @dagardner-nv
Update PandasCompat.py to resolve references (#15704) @raybellwaves

🚀 New Features

Creation of CI artifacts for cudf-polars wheels (#16680) @wence-
Warn on cuDF failure when POLARS_VERBOSE is true (#16308) @brandon-b-miller
Add drop_nulls in cudf-polars (#16290) @brandon-b-miller
[JNI] Add setKernelPinnedCopyThreshold and setPinnedAllocationThreshold (#16288) @abellina
Implement support for scan_ndjson in cudf-polars (#16263) @lithomas1
Publish cudf-polars nightlies (#16213) @lithomas1
Modify make_host_vector and make_device_uvector factories to optionally use pinned memory and kernel copy (#16206) @vuule
Migrate lists/set_operations to pylibcudf (#16190) @Matt711
Migrate lists/filling to pylibcudf (#16189) @Matt711
Fall back to CPU for unsupported libcudf binaryops in cudf-polars (#16188) @brandon-b-miller
Use resource_ref for upstream in stream_checking_resource_adaptor (#16187) @harrism
Migrate lists/modifying to pylibcudf (#16185) @Matt711
Migrate lists/filtering to pylibcudf (#16184) @Matt711
Migrate lists/sorting to pylibcudf (#16179) @Matt711
Add missing methods to lists/list_column_view.pxd in pylibcudf (#16175) @Matt711
Migrate pylibcudf lists gathering (#16170) @Matt711
Move kernel vis over to CUDF_HIDDEN (#16165) @robertmaynard
Add groupby_max multi-threaded benchmark (#16154) @srinivasyadav18
Promote has_nested_columns to cudf public API (#16131) @robertmaynard
Promote IO support queries to cudf API (#16125) @robertmaynard
cudf::merge public API now supports passing a user stream (#16124) @robertmaynard
Add TPC-H inspired examples for Libcudf (#16088) @JayjeetAtGithub
Installed cudf header uses cudf::allocate_like (#16087) @robertmaynard
cudf-polars string slicing (#16082) @brandon-b-miller
Migrate Parquet reader to pylibcudf (#16078) @lithomas1
Migrate lists/c...

Contributors

seberg, trxcllnt, and 38 other contributors

14 Aug 22:39

raydouglass

v24.08.02

e776742

v24.08.02

🚨 Breaking Changes

Align Index init APIs with pandas 2.x (#16362) @mroeschke
Align Series APIs with pandas 2.x (#16333) @mroeschke
Add missing stream param to dictionary factory APIs (#16319) @JayjeetAtGithub
Deprecate dtype= parameter in reduction methods (#16313) @mroeschke
Remove squeeze argument from groupby (#16312) @mroeschke
Align more DataFrame APIs with pandas (#16310) @mroeschke
Remove mr param from write_csv and write_json (#16231) @JayjeetAtGithub
Report number of rows per file read by PQ reader when no row selection and fix segfault in chunked PQ reader when skip_rows > 0 (#16195) @mhaseeb123
Refactor from_arrow_device/host to use resource_ref (#16160) @harrism
Deprecate Arrow support in I/O (#16132) @lithomas1
Return FrozenList for Index.names (#16047) @galipremsagar
Add compile option to enable large strings support (#16037) @davidwendt
Hide visibility of non public symbols (#15982) @robertmaynard
Rename strings multiple target replace API (#15898) @davidwendt
Pinned vector factory that uses the global pool (#15895) @vuule
Apply clang-tidy autofixes (#15894) @vyasr
Support arrow:schema in Parquet writer to faithfully roundtrip duration types with Arrow (#15875) @mhaseeb123
Expose stream parameter to public rolling APIs (#15865) @srinivasyadav18
Fix large strings handling in nvtext::character_tokenize (#15829) @davidwendt
Remove legacy JSON reader and concurrent_unordered_map.cuh. (#15813) @bdice

🐛 Bug Fixes

Ensure managed memory is supported in cudf.pandas. (#16552) @bdice
Add flatbuffers to libcudf build (#16446) @galipremsagar
Fix parquet_field_list read_func lambda capture invalid this pointer (#16440) @davidwendt
Enable prefetching in cudf.pandas.install() (#16439) @bdice
Enable prefetching before runpy (#16427) @galipremsagar
Make prefetch_config::get and prefetch_config::set thread-safe (#16425) @ttnghia
Fix a pandas-2.0 missing attribute error (#16416) @galipremsagar
[Bug] Remove loud NativeFile deprecation noise for read_parquet from S3 (#16415) @rjzamora
Fix nightly memcheck error for empty STREAM_INTEROP_TEST (#16406) @davidwendt
Gate ArrowStringArrayNumpySemantics cudf.pandas proxy behind version check (#16401) @mroeschke
Don't export bs_thread_pool (#16398) @KyleFromNVIDIA
Require fixed width types for casting in cudf-polars (#16381) @brandon-b-miller
Fix docstring of DataFrame.apply (#16351) @galipremsagar
Make bool raise for more cudf objects (#16311) @mroeschke
Rename .devcontainers for CUDA 12.5 (#16293) @jakirkham
Fix split_record for all empty strings column (#16291) @davidwendt
Fix logic in to_arrow for empty list column (#16279) @wence-
[BUG] Make name attr of Index fast slow attrs (#16270) @Matt711
Add custom name setter and getter for proxy objects in cudf.pandas (#16234) @Matt711
Fall back when casting a timestamp to numeric in cudf-polars (#16232) @brandon-b-miller
Disable large string support for Java build (#16216) @jlowe
Remove CCCL patch for PR 211. (#16207) @bdice
Add single offset to an empty ListArray in cudf::to_arrow (#16201) @davidwendt
Fix memory_usage when calculating nested list column (#16193) @mroeschke
Support at/iat indexers in cudf.pandas (#16177) @mroeschke
Fix unused-return-value debug build error in from_arrow_stream_test.cpp (#16168) @davidwendt
Fix cudf::strings::replace_multiple hang on empty target (#16167) @davidwendt
Refactor from_arrow_device/host to use resource_ref (#16160) @harrism
interpolate returns new column if no values are interpolated (#16158) @mroeschke
Use provided memory resource for allocating mixed join results. (#16153) @bdice
Run DFG after verify-alpha-spec (#16151) @KyleFromNVIDIA
Use size_t to allow large conditional joins (#16127) @bdice
Allow only scale=0 fixed-point values in fixed_width_column_wrapper (#16120) @davidwendt
Fix pylibcudf Table.num_rows for 0 columns case and add interop to docs (#16108) @lithomas1
Add support for proxy np.flatiter objects (#16107) @Matt711
Ensure cudf objects can astype to any type when empty (#16106) @mroeschke
Support pd.read_pickle and pd.to_pickle in cudf.pandas (#16105) @Matt711
Fix unnecessarily strict check in parquet chunked reader for choosing split locations. (#16099) @nvdbaranec
Fix is_monotonic_* APIs to include NaNs (#16085) @galipremsagar
More safely parse CUDA versions when subprocess output is contaminated (#16067) @brandon-b-miller
fast_slow_proxy: Don't import assert_eq at top-level (#16063) @wence-
Prevent bad ColumnAccessor state after .sort_index(axis=1, ignore_index=True) (#16061) @mroeschke
Fix ArrowDeviceArray interface to pass address of event (#16058) @zeroshade
Fix a size overflow bug in hash groupby (#16053) @PointKernel
Fix atomic_ref scope when multiple blocks are updating the same output (#16051) @vuule
Fix initialization error in to_arrow for empty string views (#16033) @wence-
Fix the int32 overflow when computing page fragment sizes for large string columns (#16028) @mhaseeb123
Fix the pool size alignment issue (#16024) @PointKernel
Improve multibyte-split byte-range performance (#16019) @davidwendt
Fix target counting in strings char-parallel replace (#16017) @davidwendt
Support IntervalDtype in cudf.from_pandas (#16014) @mroeschke
Fix memory size in create_byte_range_infos_consecutive (#16012) @davidwendt
Hide visibility of non public symbols (#15982) @robertmaynard
Fix Cython typo preventing proper inheritance (#15978) @vyasr
Fix convert_dtypes with convert_integer=False/convert_floating=True (#15964) @mroeschke
Fix nunique for MultiIndex, DataFrame, and all NA case with dropna=False (#15962) @mroeschke
Explicitly build for all GPU architectures (#15959) @vyasr
Preserve column type and class information in more DataFrame operations (#15949) @mroeschke
Add array_interface to cudf.pandas numpy.ndarray proxy (#15936) @mroeschke
Allow tests to be built when stream util is disabled (#15933) @robertmaynard
Fix JSON multi-source reading when total source size exceeds INT_MAX bytes (#15930) @shrshi
Fix dask_cudf.read_parquet regression for legacy timestamp data (#15929) @rjzamora
Fix offsetalator when accessing over 268 million rows (#15921) @davidwendt
Fix debug assert in rowgroup_char_counts_kernel (#15902) @davidwendt
Fix categorical conversion from chunked arrow arrays (#15886) @vyasr
Handling for NaN and inf when converting floating point to fixed point types (#15885) @ttnghia
Manual merge of Branch 24.08 from 24.06 (#15869) @galipremsagar
Avoid unnecessary Index cast in IndexedFrame.index setter (#15843) @charlesbluca
Fix large strings handling in nvtext::character_tokenize (#15829) @davidwendt
Fix multi-replace target count logic for large strings (#15807) @davidwendt
Fix JSON parsing memory corruption - Fix Mixed types nested children removal (#15798) @karthikeyann
Allow anonymous user in devcontainer name. (#15784) @bdice
Add support for additional metaclasses of proxies and use for ExcelWriter (#15399) @vyasr

📖 Documentation

Add docstring for from_dataframe (#16260) @mroeschke
Update libcudf compiler requirements in contributing doc (#16103) @davidwendt
Add libcudf public/detail API pattern to developer guide (#16086) @davidwendt
Explain line profiler and how to know which functions are GPU-accelerated. (#16079) @bdice
cudf.pandas documentation improvement (#15948) @Matt711
Reland "Fix docs for IO readers and strings_convert (#15872)" (#15941) @lithomas1
Document how to use cudf.pandas in tandem with multiprocessing (#15940) @wence-
DOC: Add documentation for cudf.pandas in the Developer Guide (#15889) @Matt711
Improve options docs (#15888) @bdice
DOC: add linkcode to docs (#15860) @raybellwaves
DOC: use intersphinx mapping in pandas-compat ext (#15846) @raybellwaves
Fix inconsistent usage of 'results' and 'records' in read-json.md (#15766) @dagardner-nv
Update PandasCompat.py to resolve references (#15704) @raybellwaves

🚀 New Features

Warn on cuDF failure when POLARS_VERBOSE is true (#16308) @brandon-b-miller
Add drop_nulls in cudf-polars (#16290) @brandon-b-miller
[JNI] Add setKernelPinnedCopyThreshold and setPinnedAllocationThreshold (#16288) @abellina
Implement support for scan_ndjson in cudf-polars (#16263) @lithomas1
Publish cudf-polars nightlies (#16213) @lithomas1
Modify make_host_vector and make_device_uvector factories to optionally use pinned memory and kernel copy (#16206) @vuule
Migrate lists/set_operations to pylibcudf (#16190) @Matt711
Migrate lists/filling to pylibcudf (#16189) @Matt711
Fall back to CPU for unsupported libcudf binaryops in cudf-polars (#16188) @brandon-b-miller
Use resource_ref for upstream in stream_checking_resource_adaptor (#16187) @harrism
Migrate lists/modifying to pylibcudf (#16185) @Matt711
Migrate lists/filtering to pylibcudf (#16184) @Matt711
Migrate lists/sorting to pylibcudf (#16179) @Matt711
Add missing methods to lists/list_column_view.pxd in pylibcudf (#16175) @Matt711
Migrate pylibcudf lists gathering (#16170) @Matt711
Move kernel vis over to CUDF_HIDDEN (#16165) @robertmaynard
Add groupby_max multi-threaded benchmark (#16154) @srinivasyadav18
Promote has_nested_columns to cudf public API (#16131) @robertmaynard
Promote IO support queries to cudf API (#16125) @robertmaynard
cudf::merge public API now supports passing a user stream (#16124) @robertmaynard
Add TPC-H inspired examples for Libcudf (#16088) @JayjeetAtGithub
Installed cudf header uses cudf::allocate_like (#16087) @robertmaynard
cudf-polars string slicing (#16082) @brandon-b-miller
Migrate Parquet reader to pylibcudf (#16078) @lithomas1
Migrate lists/count_elements to pylibcudf (#16072) @Matt711
Migrate lists/extrac...

Contributors

seberg, trxcllnt, and 38 other contributors
