🚨 Breaking Changes
- Remove cudf.BaseIndex (#18751) @mroeschke
- Implement `BIT_COUNT` unary operation (#18589) @ttnghia
- Expose column chunk metadata in `read_parquet_metadata()` (#18579) @mhaseeb123
- Fix overflow for `MERGE_M2` groupby aggregation (#18546) @ttnghia
- Deduplicate parquet physical type enums (#18526) @mhaseeb123
- Implemented String Output & User-data Support for Transforms (#18490) @lamarrr
- Promote Parquet type enums to enum classes (#18441) @mhaseeb123
- Move parquet schema types and structs to public headers (#18424) @mhaseeb123
- Start removal of vector factories with `_sync` suffix by deprecating them and adding versions without the suffix (#18414) @vuule
- Skip decoding of pages marked as pruned in PQ reader (#18347) @mhaseeb123
- Deprecate nvtext subword tokenizer (#18334) @davidwendt
- Add standard data ingestion pipelines to pylibcudf for ndarrays (#18311) @Matt711
- Remove extraneous modules from top level cudf namespace (#18287) @mroeschke
- Add Keep Option Parameter to Distinct (#18237) @warrickhe
- Update to CCCL 2.8.x with no CCCL patches (#18235) @bdice
🐛 Bug Fixes
- Disable pytest benchmark for Narwhals CI job (#19074) @Matt711
- Avoid undefined behaviour in rolling_store_output_functor (#19069) @wence-
- Filter out pkg_resources UserWarning to make nightly CI pass (#19058) @Matt711
- Pin deltalake to <1.0.0 (#19017) @Matt711
- [BUG] Incorrectly getting the caller's frame when searching for locals and globals in cudf.pandas (#18979) @Matt711
- Ensure gc fixture is used in custreamz test (#18915) @TomAugspurger
- Fix a potential segfault in PQ reader's number of rows per source calculation (#18906) @mhaseeb123
- Fix Dataframe `getitem` when `MultiIndex` columns exist (#18880) @galipremsagar
- Ensure eq/ne between Columns in public objects don't return bool (#18875) @mroeschke
- Fix fencepost error in `Repartition` task generation (#18854) @wence-
- Fix cudf_polars pl.col(...).len() always excluding null values (#18849) @mroeschke
- Throw a descriptive exception in Parquet reader when trying to read files with more than two billion rows (#18835) @mhaseeb123
- Skip a decompression test (#18825) @vuule
- Update strings benchmarks to use alloc_size column/table function (#18822) @davidwendt
- Fix host decompression of empty DEFLATE data (#18805) @vuule
- Avoid going OOM in `test_row_limit_exceed_raises` by using dummy array (#18802) @Matt711
- Fix host decompression of empty Snappy data (#18800) @vuule
- Skip test that fails due to polars issue (#18787) @wence-
- Ensure scalar dtype is always set in from_py (#18780) @vyasr
- Fix reading of Snappy compressed Avro files (#18774) @vuule
- Fix missing semicolon in label_bins.cu (#18765) @evanramos-nvidia
- Fix noexcept annotations on strings_column_view (#18763) @wence-
- Fix integer overflows in pylibcudf `from_column_view_of_arbitrary` (#18758) @wence-
- Fix overflow case and clean up some logic (#18734) @vyasr
- Link to `nvtx3::nvtx3-cpp` instead of `nvToolsExt` (#18730) @jakirkham
- Revise `DaskIntegration` protocol to align with `rapidsmpf` (#18720) @rjzamora
- Fix `skip_compression` option in the Parquet writer with host compression (#18714) @vuule
- Add missing header (#18671) @vyasr
- Revert "Set flag to always use unsafe atomic storage" (#18657) @PointKernel
- Fix optional operator* called on a disengaged value in clamp.cu (#18655) @davidwendt
- Add missing header to host_memory.cpp (#18649) @alliepiper
- Fix device compression when writing Parquet files without using nvCOMP (#18644) @vuule
- Add CUDA_ARCHITECTURES setting to cpp-linters script (#18637) @davidwendt
- Pin to cython<3.1 (#18617) @wence-
- Fix `DataFrame.memory_usage` output order (#18595) @mroeschke
- Set flag to always use unsafe atomic storage (#18590) @PointKernel
- Update KvikIO S3 endpoint usage (#18565) @kingcrimsontianyu
- Skip cuml third-party integration tests that may segfault (#18561) @Matt711
- Allow .iloc with cuDF objects as column indexers (#18558) @mroeschke
- Fix overflow for `MERGE_M2` groupby aggregation (#18546) @ttnghia
- Add back cudf root (#18544) @vyasr
- Change default memory resource for 'distributed' cudf-polars (#18531) @rjzamora
- Fix copy-on-write buffer separation and cleanup (#18530) @galipremsagar
- Fix cpp examples cmake to use the rapids_config.cmake (#18501) @davidwendt
- Rename rapidsmp to rapidsmpf (#18493) @rjzamora
- Fix compilation with the C++20 standard (#18486) @vuule
- Fix an error when reading some compressed Parquet V2 files (#18478) @vuule
- Support title-case characters in strings capitalize() and title() APIs (#18457) @davidwendt
- Ensure DataFrame column label operations reset label_dtype (#18452) @mroeschke
- Fix a segfault when reading a Parquet file with unsupported compression type (#18451) @vuule
- Fix logger macros (#18444) @vyasr
- Fix auto-detection of compression type in host-side decompression (#18440) @shrshi
- Use delete not free to release data allocated with new (#18412) @wence-
- Fix synchronization issues in host compression and decompression (#18395) @vuule
- Update Dask array-conversion handling (#18382) @rjzamora
- Fixed indexing on empty DataFrame with no columns (#18381) @TomAugspurger
- Deterministic hashing for DataFrameScan nodes in cudf-polars multi-partition executor (#18351) @TomAugspurger
- Fix index of right table in unary operators in AST, in Joins (#18333) @karthikeyann
- Add offsetalator to contiguous-split (#18312) @davidwendt
- Support large strings in nvtext vocabulary-tokenizer (#18283) @davidwendt
- Handle empty aggregations in multi-partition cudf.polars group_by (#18277) @TomAugspurger
📖 Documentation
- Docs for streaming executor options (#18934) @quasiben
- Fix some duplicate toctree issues and improve groupby docs (#18580) @vyasr
- [DOC] Running libcudf benchmarks and comparing output results (#18548) @Matt711
- Fix doxygen usage of the contraction for it is (#18517) @davidwendt
- Clarify @brief tag as description/title on documentation guide (#18515) @davidwendt
- [DOC] Improve clarity in parquet APIs set_row_groups and set_columns parquet (#18466) @Matt711
- Add a usage page to cudf-polars documentation (#18460) @Matt711
- [DOC] Fix typo in CONTRIBUTING.md on build type tests (#18456) @JigaoLuo
- improve docs related to documentation contribution (#18418) @ncclementi
- Add restart kernel note in cudf pandas docs (#18374) @ncclementi
🚀 New Features
- Add CLI argument to enable RMM async memory resource in PDS-H (#18899) @pentschev
- Scan a headerless CSV file with column names provided (#18816) @Matt711
- Add fast paths for `DataFrame.to_cupy` (#18801) @Matt711
- Require `numba-cuda>=0.11.0` (#18770) @brandon-b-miller
- Create a pylibcudf Column from a python iterable (#18768) @Matt711
- Support `ConditianalJoin` via broadcasting in cudf-polars streaming engine (#18723) @rjzamora
- Experimental PQ reader utility to calculate total rows in input row groups (#18716) @mhaseeb123
- Extend `explain_query` to support printing the logical plan (pre lowered plan) (#18708) @Matt711
- Reuse `libcudf` dependencies for Java JNI build when they are available (#18682) @ttnghia
- Add alloc_size member function to cudf::column and cudf::table (#18639) @davidwendt
- Print the physical cudf-polars plan in `pdsh.py` (#18635) @rjzamora
- String Transform Examples (#18616) @lamarrr
- Add streaming support for `group_by -> n_unique` to cudf-polars (#18606) @rjzamora
- Export cudf compiler flags and definitions (#18604) @ttnghia
- Implement `BIT_COUNT` unary operation (#18589) @ttnghia
- Expose column chunk metadata in `read_parquet_metadata()` (#18579) @mhaseeb123
- Add APIs to check ORC and Parquet compression support at runtime (#18578) @vuule
- Add `Distinct` support to the cudf-polars streaming executor (#18576) @rjzamora
- Add support for large list host Arrow data conversion (#18562) @vyasr
- Implement `BITWISE_AGG` aggregations (bitwise `AND`, `OR` and `XOR`) for sort-based groupby and reduction (#18551) @ttnghia
- Implement row group pruning with bloom filters in experimental PQ reader (#18545) @mhaseeb123
- Implement row group pruning with stats in experimental PQ reader (#18543) @mhaseeb123
- [JNI] Expose row-wise sha1 api (#18540) @warrickhe
- Add `Sort` + `head/tail` support to streaming cudf-polars executor (#18538) @rjzamora
- Add multi-partition MapFunction support to cudf-polars (#18523) @rjzamora
- Adds support for writing raw UTF-8 characters (without escaping) in the JSON writer (#18508) @Matt711
- Support reading from device buffers in the pylibcudf IO APIs (#18496) @Matt711
- Support multi-partition `Select` operations with aggregations (#18492) @rjzamora
- Implemented String Output & User-data Support for Transforms (#18490) @lamarrr
- Add a utility to bulk set multiple null masks (#18489) @mhaseeb123
- High level interface for experimental PQ reader and implementation of metadata APIs (#18480) @mhaseeb123
- Added `pylibcudf.utilities.is_ptds_enabled` (#18467) @TomAugspurger
- Add a public API for copying a table_view to device array (#18450) @Matt711
- Support `cudf-polars` `cast_time_unit` (#18442) @brandon-b-miller
- Support creating a pylibcudf Column from a host array (#18425) @Matt711
- Move parquet schema types and structs to public headers (#18424) @mhaseeb123
- Add optional dtype argument to `Scalar.from_any` (#18415) @Matt711
- Expose `cudf::chunked_pack` in pylibcudf (#18411) @wence-
- Add support for long string columns in cudf::contiguous_split (#18393) @nvdbaranec
- Implemented String Input support for Transforms and Removed `jit::column_device_view` (#18378) @lamarrr
- Automatically dispatch between host and device decompression/compression based on the number of buffers (#18363) @vuule
- Expose join hash table load factor (#18361) @PointKernel
- Skip decoding of pages marked as pruned in PQ reader (#18347) @mhaseeb123
- Sort-based inner join for high-multiplicity tables (#18318) @shrshi
- Support constructing pylibcudf Columns and Tables from views into arbitrary objects (#18314) @vyasr
- Add standard data ingestion pipelines to pylibcudf for ndarrays (#18311) @Matt711
- Support `cudf-polars` `isoyear` and `week` (`isoweek`) (#18265) @brandon-b-miller
- Add Keep Option Parameter to Distinct (#18237) @warrickhe
- Add rapidsmp shuffle support to cudf-polars (#18231) @rjzamora
- Support `cudf-polars` `strftime` (#18181) @brandon-b-miller
- Add benchmark for join operations with low build table cardinality (#18105) @shrshi
- Add nvtext substring deduplication APIs (Part 2) (#18104) @davidwendt
- Support `include_file_paths` in cudf polars (#18057) @Matt711
- Add support for the Arrow device capsule interfaces (#15370) @vyasr
🛠️ Improvements
- use 'rapids-init-pip' in wheel CI, other CI changes (#18902) @jameslamb
- Avoid RecursionError in custreamz test (#18887) @TomAugspurger
- Update NumPy dependency in cudf.pandas-catboost integration test (#18870) @Matt711
- CPU only execution for PDSH (#18869) @quasiben
- Remove more top level cudf imports in core (#18862) @mroeschke
- Remove top level cudf imports in core (#18857) @mroeschke
- Add CUDF_INSTALL_DIR for JAVA build script (#18852) @pxLi
- Call the correct `from_pandas` in `hdf` reader (#18850) @galipremsagar
- Update `__all__` in `cudf_polars/dsl/ir.py` (#18848) @Matt711
- Upload examples conda package (#18847) @vyasr
- Add retries to prevent failures in occasionally slow CI runs (#18843) @galipremsagar
- Finish CUDA 12.9 migration and use branch-25.06 workflows (#18839) @bdice
- Remove toplevel `import cudf` from window/tools/join directories (#18833) @mroeschke
- Remove toplevel `import cudf` from cudf/io files (#18829) @mroeschke
- Update pdsh benchmark script to support explain-only (#18826) @TomAugspurger
- Refactor UDF utils and add a hook to enable NRT when necessary (#18823) @brandon-b-miller
- Fix memory access error in nvtext::edit_distance (#18821) @davidwendt
- Update to clang 20 (#18818) @bdice
- Reduce more data sizes of Python tests (#18814) @mroeschke
- Mark DataFrame.dtypes as an _external_only_api (#18809) @mroeschke
- Change calls to thrust::swap to cuda::std::swap (#18808) @davidwendt
- Move implemented BaseIndex methods over to Index (#18807) @mroeschke
- Improve pandas version fetching script (#18793) @galipremsagar
- Change cudf::sort googlebench benchmarks to nvbench (#18786) @davidwendt
- Only warn in cudf.pandas if rmm mode explicitly set and rmm already configured (#18785) @jcrist
- Quote head_rev in conda recipes (#18784) @bdice
- Move RangeIndex implementation below Index (#18777) @mroeschke
- Remove unnecessary _Ravelled class (#18771) @Matt711
- Remove pytest-rerunfailures (#18766) @mroeschke
- Replace from_arrow with direct calls Column/Table constructors in pylibcudf and cudf-polars tests (#18762) @Matt711
- CUDA 12.9 use updated compression flags (#18755) @robertmaynard
- fix(rattler): add `librmm` to host for `libcudf` to fix overlinking error (#18754) @gforsyth
- Remove the file name from the output in cudf-polars' explain APIs (#18752) @Matt711
- Remove cudf.BaseIndex (#18751) @mroeschke
- Support creating a pylibcudf Column from a general ndarray (#18744) @Matt711
- Improve lowering of `Distinct` IR nodes for high-cardinality data (#18725) @rjzamora
- Simplify Numba-CUDA MVC logic (#18724) @bdice
- Test with CUDA 12.9.0 (#18721) @bdice
- Add more `cudf.Series` microbenchmarks (#18718) @Matt711
- Run unit-tests-cudf-pandas on branch-25.06 for nightly tests (#18717) @davidwendt
- Move `test_large_unique_categories_repr` to benchmarks (#18715) @galipremsagar
- Allow `pylibcudf.Column` to consume objects exposing `__arrow_c_stream__` (#18712) @mroeschke
- Switch from printing to logging (#18711) @vyasr
- Add Python tests for different compression implementations (#18710) @vuule
- Remove redundant xfails in cuml integration tests (#18699) @Matt711
- ci: run unit-tests-cudf-pandas on `branch-25.06` workflow (#18692) @gforsyth
- Exclude librmm.so from auditwheel (#18691) @bdice
- Add C++ tests for different compression implementations (#18690) @vuule
- Improve runtime of cuDF Python unit tests (#18689) @mroeschke
- Require at least numba-cuda `0.10.1` (#18688) @brandon-b-miller
- Add `nvidia-cuda-{nvrtc, nvcc}` as a dependency for cuDF wheels (#18686) @brandon-b-miller
- Support rolling aggregations in in-memory cudf-polars execution (#18681) @wence-
- Replace `parquet_blocksize` with `target_partition_size` (#18669) @rjzamora
- Skip test_large_unique_categories_repr in CI (#18666) @bdice
- Locally import pyarrow.dataset and fsspec for `import cudf` performance (#18663) @mroeschke
- Disable `arm64` python tests (#18662) @galipremsagar
- Pin numba-cuda>=0.9.0,!=0.10.0 due to CI hangs on ARM (#18661) @mroeschke
- Fix compile warnings in Java JNI (#18660) @ttnghia
- Drop `Empty` nodes from IR graph (#18658) @rjzamora
- Add support for Python 3.13 (#18648) @gforsyth
- Cleanup libcudf detail/aggregation.hpp/.cuh (#18642) @davidwendt
- Skip all known pytest failures in pandas-tests (#18641) @galipremsagar
- Preserve partitioning after `Filter` and `Projection` in cudf-polars (#18638) @rjzamora
- Support quantile in cudf-polars grouped aggregations (#18634) @wence-
- Deprecate Series.nullmask, Series.nullable, Series.from_categorical, Series.from_masked_array, cudf.isclose (#18631) @mroeschke
- Access private objects by importing from module instead of `cudf.core/util` namespace (#18629) @mroeschke
- Replace unnecessary cudf::size_of() calls with sizeof() (#18628) @davidwendt
- Improve cold cache dropping (#18626) @kingcrimsontianyu
- Improve default config values for cudf-polars streaming (#18623) @rjzamora
- Add gtest error check for nvtext::wordpiece_tokenize (#18621) @davidwendt
- Polars dataframe serialize using chunked pack (#18614) @madsbk
- xfail all known errors in pandas-test suite (#18612) @galipremsagar
- Add `TemporalBaseColumn` as a parent class to `DatetimeColumn` and `TimedeltaColumn` (#18611) @mroeschke
- Update cudf::cast internal function to use sizeof instead of cudf::size_of (#18607) @davidwendt
- Move cudf/utils/utils.py methods to appropriate locations (#18605) @mroeschke
- pylibcudf.Column: add `device_buffer_size` and register a dask.sizeof function for cudf-polars Column and DataFrame (#18602) @madsbk
- Use `cached_property` for Datetime and Timedelta column properties (#18601) @mroeschke
- Annotate and simplify `from_arrow` (#18600) @mroeschke
- Enable reporting peak memory usage for gtests (#18599) @davidwendt
- Prune methods from Frame that are specific to subclasses (#18597) @mroeschke
- Switch `tensorflow` integration tests to use 12.x (#18596) @galipremsagar
- refactor: use `libnvcomp` from `libkvikio` wheel to unblock Python 3.13 upgrade (#18593) @gforsyth
- Add temporary pdsh benchmarks to `cudf_polars.experimental` (#18592) @rjzamora
- Update `numba-cuda` dependency to `>=0.9.0` (#18591) @brandon-b-miller
- use 'certifi' certificates in fetch_pandas_versions script (#18588) @jameslamb
- Add nvtext substring duplication APIs (Part 1) (#18585) @davidwendt
- Bump polars version to <1.29 (#18581) @Matt711
- Allow datetime.timedelta objects in pylibcudf.Scalar.from_py (#18577) @mroeschke
- Rework strings split_helper utility for better reuse (#18575) @davidwendt
- Additional tests strings for strings split APIs (#18574) @davidwendt
- Support datetime.datetime objects in pylibcudf.Scalar.from_py (#18572) @mroeschke
- Store Python scalars instead of PyArrow Scalars in cudf_polars Literal expr (#18563) @mroeschke
- Support `plc.Scalar.from_py(None)` and `plc.Scalar.from_py(int, float type)` (#18559) @mroeschke
- Add xfail window function tests for cudf_polars (#18557) @btepera
- Add fast paths to `Series.to_cupy` and `Series.values` (#18555) @Matt711
- Reduce cudf-polars pyarrow usage (#18554) @vyasr
- Avoid possible invalid kernel grid error in `cudf::set_null_masks` if no bitmasks to set (#18553) @mhaseeb123
- Adjust cudf Python groupby test for cuCollections update (#18550) @mroeschke
- Refactor scan test I/O logic into shared `make_partitioned_source` helper (#18542) @Matt711
- Download build artifacts from Github for CI jobs (#18539) @VenkateshJaya
- Update hypothesis version (#18537) @galipremsagar
- Make Python testing dependencies more specific to pylibcudf vs cudf (#18535) @mroeschke
- Pin hypothesis<6.131.1 due to performance issues (#18532) @mroeschke
- Deduplicate parquet physical type enums (#18526) @mhaseeb123
- Reduce the number of miscellaneous pandas unit tests run with cudf.pandas (#18524) @mroeschke
- Improve nvtext::tokenize_with_vocabulary performance (#18522) @davidwendt
- Make pylibcudf.Column.from_rmm_buffer a Python staticmethod (#18521) @mroeschke
- Add more short circuit checks for .equals (#18520) @mroeschke
- Add synchronous task scheduler to cudf-polars (#18519) @rjzamora
- Don't fetch dlpack headers when building cuDF Python (#18518) @mroeschke
- Refactor polars configuration (#18516) @TomAugspurger
- Refactor internal strings utility to separate header and definition file (#18514) @davidwendt
- Fix `print()` keyword argument in cudf pandas test (#18513) @trxcllnt
- Improve performance of strings split-record on whitespace (#18510) @davidwendt
- Use `cuda::std::iter_value_t` instead of thrust iterator traits (#18509) @miscco
- Remove redundant task-graph logic for streaming `GroupBy` (#18507) @rjzamora
- Replace `GPU_ARCHS` build variable by `CMAKE_CUDA_ARCHITECTURES` (#18506) @ttnghia
- Optimize pandas metadata generation to reduce memory pressure (#18505) @galipremsagar
- Replace deprecated host_buffer in favor of host_span in SourceInfo (#18503) @Matt711
- Add pylibcudf.Column.from_rmm_buffer (#18502) @mroeschke
- Replace thrust functors with libcu++ ones (#18500) @miscco
- Rename cudf-polars executors (#18499) @rjzamora
- Remove casting functions in pylibcudf utils (#18497) @Matt711
- Increase wheel size limit. (#18487) @bdice
- Add CategoricalIndex.from_codes (#18485) @mroeschke
- Split join header (#18484) @shrshi
- Fix unspecified behavior involving move semantics and order of evaluation (#18481) @kingcrimsontianyu
- Remove need for to_cudf_compatible_scalar (#18477) @mroeschke
- Rerun flaky pytests in CI (#18476) @galipremsagar
- Vendor RAPIDS.cmake (#18473) @bdice
- Add ARM conda environments. (#18470) @bdice
- Bump polars version to <1.28 (#18469) @Matt711
- Add sink support in cudf_polars (#18468) @mroeschke
- Enable rapidsmpf spilling in cudf-polars (#18461) @madsbk
- Promote Parquet type enums to enum classes (#18441) @mhaseeb123
- Consolidate logic in DataFrame.__init__ for listlike arguments (#18439) @mroeschke
- Update compression formats supported in JSON reader (#18438) @shrshi
- Disabled Jitify Minification (#18436) @lamarrr
- Fix printing decimal128 types that are zero (#18435) @trxcllnt
- Replace direct use of nvCOMP and of its adapter with the higher-level decompression API (#18434) @vuule
- Add more `cudf.DataFrame` constructor pytest benchmarks (#18433) @mroeschke
- Test against stable tags for narwhals (#18431) @Matt711
- Refcount-based dropping of cached evaluations in cudf-polars executor (#18430) @wence-
- Replace `Thrust` iterator facilities with libcu++ ones (#18427) @miscco
- Remove numpy requirement when converting 2d cuda array interface objects to pylibcudf Columns (#18426) @Matt711
- Share more cudf.Column methods for `indices_of`/`isin` (#18423) @mroeschke
- Switch the ptr type in gpumemoryview from Py_ssize_t to uintptr_t (#18419) @Matt711
- Add strings::extract_single API (#18417) @davidwendt
- Add to_arrow_host_stringview interop API (#18416) @davidwendt
- Start removal of vector factories with `_sync` suffix by deprecating them and adding versions without the suffix (#18414) @vuule
- Allow polars arrow conversion to produce string_view (#18413) @wence-
- Change `dask_cudf.to_parquet` behavior for local filesystems (#18408) @rjzamora
- Add rank and label_bin methods to ColumnBase (#18407) @mroeschke
- Improve performance of strings::like for long strings (#18406) @davidwendt
- Automatic single-partition fallback in cudf-polars (#18405) @rjzamora
- Remove `_sync` suffix from hostdevice types (#18404) @vuule
- Use owning Arrow types in C++ to expose data to Python (#18402) @vyasr
- add static push and pop methods to NvtxRange (#18401) @zpuller
- Deprecate cudf.Scalar (#18394) @mroeschke
- Bump polars version to <1.27 (#18387) @Matt711
- Branch 25.06 merge 25.04 (#18380) @Matt711
- Silence warning by setting BUILD_SHARED_LIBS (#18371) @vyasr
- Rewrite groupby aggregations in cudf-polars to simplify evaluation (#18369) @wence-
- Pass stream through when taking ownership from libcudf (#18367) @wence-
- Expose new grouped_range_rolling API in pylibcudf (#18365) @wence-
- Avoid patching sort algorithms from CCCL (#18364) @miscco
- Deprecate old nvtext::normalize_characters (#18360) @davidwendt
- refactor(rattler): enable strict channel priority for builds (#18358) @gforsyth
- Optimize `sequences` by introducing `make_offsets_child_column` (#18357) @ustcfy
- Decompress all data in a single `decompress_page_data` when reading Parquet input in a single chunk (#18352) @vuule
- Moving wheel builds to specified location and uploading build artifacts to Github (#18346) @VenkateshJaya
- Performance improvement for to_lower/to_upper for multi-byte UTF-8 characters (#18345) @davidwendt
- Branch 25.06 merge branch 25.04 (#18344) @vyasr
- Use dask-cuda for cudf-polars experimental testing (#18343) @rjzamora
- Deprecate nvtext subword tokenizer (#18334) @davidwendt
- Remove cudf.Scalar in as_column (#18331) @mroeschke
- Add tests for `cudf.polars` to be able to work on a cpu-only machine (#18327) @galipremsagar
- Allow `cudf.DataFrame.from_pylibcudf` to accept a `pylibcudf.io.TableWithMetadata` (#18319) @mroeschke
- Avoid stateful construction in `DataFrame.__init__` (#18306) @mroeschke
- Improve the groupby performance for extremely low cardinality (#18290) @PointKernel
- Remove extraneous modules from top level cudf namespace (#18287) @mroeschke
- Require type annotations in cudf.polars (#18285) @TomAugspurger
- Removing unnecessary StreamSynchronization in reading (#18279) @JigaoLuo
- Update to CCCL 2.8.x with no CCCL patches (#18235) @bdice
- Reduce register pressure for compute_column_kernel (#18226) @matal-nvidia
- Use the mapped buffer for all read operations in the memory-mapped source; switch default source to the kvikIO one (#18204) @vuule
- Improve test coverage in the catboost integration tests (#18126) @Matt711
- Create file sources in parallel (#18094) @vuule
- Enable `stumpy_distributed` tests (#17969) @galipremsagar
- Refactor distinct join to use primitive row operators when proper (#17726) @PointKernel
- Update chunked parquet reader benchmarks (#16543) @sdrp713