Releases: rapidsai/cudf
v21.08.00
🚨 Breaking Changes
- Fix a crash in pack() when being handed tables with no columns. (#8697) @nvdbaranec
- Remove unused cudf::strings::create_offsets (#8663) @davidwendt
- Add delimiter parameter to cudf::strings::capitalize() (#8620) @davidwendt
- Change default datetime index resolution to ns to match pandas (#8611) @vyasr
- Add sequence_type parameter to cudf::strings::title function (#8602) @davidwendt
- Add `strings::repeat_strings` API that can repeat each string a different number of times (#8561) @ttnghia
- String-to-boolean conversion is different from Pandas (#8549) @skirui-source
- Add accurate hash join size functions (#8453) @PointKernel
- Expose a Decimal32Dtype in cuDF Python (#8438) @skirui-source
- Update dask make_meta changes to be compatible with dask upstream (#8426) @galipremsagar
- Adapt `cudf::scalar` classes to changes in `rmm::device_scalar` (#8411) @harrism
- Remove special Index class from the general index class hierarchy (#8309) @vyasr
- Add first-class dtype utilities (#8308) @vyasr
- ORC - Support reading multiple orc files/buffers in a single operation (#8142) @jdye64
- Upgrade arrow to 4.0.1 (#7495) @galipremsagar
🐛 Bug Fixes
- Fix `contains` check in string column (#8834) @galipremsagar
- Remove unused variable from `row_bit_count_test`. (#8829) @mythrocks
- Fixes issue with null struct columns in ORC reader (#8819) @rgsl888prabhu
- Set CMake vars for python/parquet support in libarrow builds (#8808) @vyasr
- Handle empty child columns in row_bit_count() (#8791) @mythrocks
- Revert "Remove cudf unneeded build time requirement of the cuda driver" (#8784) @robertmaynard
- Fix isort error in utils.pyx (#8771) @charlesbluca
- Handle sliced struct/list columns properly in concatenate() bounds checking. (#8760) @nvdbaranec
- Fix issues with `_CPackedColumns.serialize()` handling of host and device data (#8759) @charlesbluca
- Fix issues with `MultiIndex` in `dropna`, `stack` & `reset_index` (#8753) @galipremsagar
- Write pandas extension types to parquet file metadata (#8749) @devavret
- Fix `where` to handle `DataFrame` & `Series` input combination (#8747) @galipremsagar
- Fix `replace` to handle null values correctly (#8744) @galipremsagar
- Handle sliced structs properly in pack/contiguous_split. (#8739) @nvdbaranec
- Fix issue in slice() where columns with a positive offset were computing null counts incorrectly. (#8738) @nvdbaranec
- Fix `cudf.Series` constructor to handle list of sequences (#8735) @galipremsagar
- Fix min/max sorted groupby aggregation on string column with nulls (argmin, argmax sentinel value missing on nulls) (#8731) @karthikeyann
- Fix orc reader assert on create data_type in debug (#8706) @davidwendt
- Fix min/max inclusive cudf::scan for strings column (#8705) @davidwendt
- JNI: Fix driver version assertion logic in testGetCudaRuntimeInfo (#8701) @sperlingxx
- Adding fix for skip_rows and crash in orc reader (#8700) @rgsl888prabhu
- Bug fix: `replace_nulls_policy` functor not returning correct indices for gathermap (#8699) @isVoid
- Fix a crash in pack() when being handed tables with no columns. (#8697) @nvdbaranec
- Add post-processing steps to `dask_cudf.groupby.CudfSeriesGroupby.aggregate` (#8694) @charlesbluca
- JNI build no longer looks for Arrow in conda environment (#8686) @jlowe
- Handle arbitrarily different data in null list column rows when checking for equivalency. (#8666) @nvdbaranec
- Add ConfigureNVBench to avoid concurrent main() entry points (#8662) @PointKernel
- Pin `*arrow` to use `*cuda` in `run` (#8651) @jakirkham
- Add proper support for tolerances in testing methods. (#8649) @vyasr
- Support multi-char case conversion in capitalize function (#8647) @davidwendt
- Fix repeated mangled names in read_csv with duplicate column names (#8645) @karthikeyann
- Temporarily disable libcudf example build tests (#8642) @isVoid
- Use conda-sourced cudf artifacts for libcudf example in CI (#8638) @isVoid
- Ensure dev environment uses Arrow GPU packages (#8637) @charlesbluca
- Fix bug that columns only initialized once when specified `columns` and `index` in dataframe ctor (#8628) @isVoid
- Propagate **kwargs through to as_*_column methods (#8618) @shwina
- Fix orc_reader_benchmark.cpp compile error (#8609) @davidwendt
- Fix missed renumbering of Aggregation values (#8600) @revans2
- Update cmake to 3.20.5 in the Java Docker image (#8593) @NvTimLiu
- Fix bug in replace_with_backrefs when group has greedy quantifier (#8575) @davidwendt
- Apply metadata to keys before returning in `Frame._encode` (#8560) @charlesbluca
- Fix for strings containing special JSON characters in get_json_object(). (#8556) @nvdbaranec
- Fix debug compile error in gather_struct_tests.cpp (#8554) @davidwendt
- String-to-boolean conversion is different from Pandas (#8549) @skirui-source
- Fix `__repr__` output with `display.max_rows` is `None` (#8547) @galipremsagar
- Fix size passed to column constructors in _with_type_metadata (#8539) @shwina
- Properly retrieve last column when `-1` is specified for column index (#8529) @isVoid
- Fix importing `apply` from `dask` (#8517) @galipremsagar
- Fix offset of the string dictionary length stream (#8515) @vuule
- Fix double counting of selected columns in CSV reader (#8508) @ochan1
- Incorrect map size in scatter_to_gather corrupts struct columns (#8507) @gerashegalov
- replace_nulls properly propagates memory resource to gather calls (#8500) @robertmaynard
- Disallow groupby aggs for `StructColumns` (#8499) @charlesbluca
- Fixes out-of-bounds access for small files in unzip (#8498) @elstehle
- Adding support for writing empty dataframe (#8490) @shaneding
- Fix exclusive scan when including nulls and improve testing (#8478) @harrism
- Add workaround for crash in libcudf debug build using output_indexalator in thrust::lower_bound (#8432) @davidwendt
- Install only the same Thrust files that Thrust itself installs (#8420) @robertmaynard
- Add nightly version for ucx-py in ci script (#8419) @galipremsagar
- Fix null_equality config of rolling_collect_set (#8415) @sperlingxx
- CollectSetAggregation: implement RollingAggregation interface (#8406) @sperlingxx
- Handle pre-sliced nested columns in contiguous_split. (#8391) @nvdbaranec
- Fix bitmask_tests.cpp host accessing device memory (#8370) @davidwendt
- Fix concurrent_unordered_map to prevent accessing padding bits in pair_type (#8348) @davidwendt
- BUG FIX: Raise appropriate strings error when concatenating strings column (#8290) @skirui-source
- Make gpuCI and pre-commit style configurations consistent (#8215) @charlesbluca
- Add collect list to dask-cudf groupby aggregations (#8045) @charlesbluca
📖 Documentation
- Update Python UDFs notebook (#8810) @brandon-b-miller
- Fix dask.dataframe API docs links after reorg (#8772) @jsignell
- Fix instructions for running cuDF/dask-cuDF tests in CONTRIBUTING.md (#8724) @shwina
- Translate Markdown documentation to rST and remove recommonmark (#8698) @vyasr
- Fixed spelling mistakes in libcudf documentation (#8664) @karthikeyann
- Custom Sphinx Extension: `PandasCompat` (#8643) @isVoid
- Fix README.md (#8535) @ajschmidt8
- Change namespace contains_nulls to struct (#8523) @davidwendt
- Add info about NVTX ranges to dev guide (#8461) @jrhemstad
- Fixed documentation bug in groupby agg method (#8325) @ahmet-uyar
🚀 New Features
- Fix concatenating structs (#8811) @shaneding
- Implement JNI for groupby aggregations `M2` and `MERGE_M2` (#8763) @ttnghia
- Bump `isort` to `5.6.4` and remove `isort` overrides made for 5.0.7 (#8755) @charlesbluca
- Implement `__setitem__` for `StructColumn` (#8737) @shaneding
- Add `is_leap_year` to `DateTimeProperties` and `DatetimeIndex` (#8736) @isVoid
- Add `struct.explode()` method (#8729) @shwina
- Add `DataFrame.to_struct()` method to convert a DataFrame to a struct Series (#8728) @shwina
- Add support for list type in ORC writer (#8723) @vuule
- Fix slicing from struct columns and accessing struct columns (#8719) @shaneding
- Add `datetime::is_leap_year` (#8711) @isVoid
- Accessing struct columns from `dask_cudf` (#8675) @shaneding
- Added pct_change to Series (#8650) @TravisHester
- Add strings support to cudf::shift function (#8648) @davidwendt
- Support Scatter `struct_scalar` (#8630) @isVoid
- Struct scalar from host dictionary (#8629) @shaneding
- Add dayofyear and day_of_year to Series, DatetimeColumn, and DatetimeIndex (#8626) @beckernick
- JNI support for capitalize (#8624) @firestarman
- Add delimiter parameter to cudf::strings::capitalize() (#8620) @davidwendt
- Add NVBench in CMake (#8619) @PointKernel
- Change default datetime index resolution to ns to match pandas (#8611) @vyasr
- ListColumn `__setitem__` (#8606) @brandon-b-miller
- Implement groupby aggregations `M2` and `MERGE_M2` (#8605) @ttnghia
- Add sequence_type parameter to cudf::strings::title function (#8602) @davidwendt
- Adding support for list and struct type in ORC Reader (#8599) @rgsl888prabhu
- Benchmark for `strings::repeat_strings` APIs (#8589) @ttnghia
- Nested scalar support for copy if else (#8588) @gerashegalov
- User specified decimal columns to float64 (#8587) @jdye64
- Add `get_element` for struct column (#8578) @isVoid
- Python changes for adding `__getitem__` for `struct` (#8577) @shaneding
- Add `strings::repeat_strings` API that can repeat each string a different number of times (#8561) @ttnghia
- Refactor `tests/iterator_utilities.hpp` functions (#8540) @ttnghia
- Support MERGE_LISTS and MERGE_SETS in Java package (#8516) @sperlingxx
- Decimal support csv reader (#8511) @elstehle
- Add column type tests (#8505) @isVoid
- Warn when downscaling decimal columns (#8492) @ChrisJar
- Add JNI for `strings::repeat_strings` (#8491) @ttnghia
- Add `Index.get_loc` for Numerical, String Index support (#8489) @isVoid
- Expose half_up rounding in cuDF (#8477) @shwina
- Java APIs to fetch CUDA runtime info (#8465) @sperlingxx
- Add `str.edit_distance_matrix` (#8463) @isVoid
- Support const...
v21.06.01
v21.06.00
🚨 Breaking Changes
- Add support for `make_meta_obj` dispatch in `dask-cudf` (#8342) @galipremsagar
- Add separator-on-null parameter to strings concatenate APIs (#8282) @davidwendt
- Introduce a common parent class for NumericalColumn and DecimalColumn (#8278) @vyasr
- Update ORC statistics API to use C++17 standard library (#8241) @vuule
- Preserve column hierarchy when getting NULL row from `LIST` column (#8206) @isVoid
- `Groupby.shift` c++ API refactor and python binding (#8131) @isVoid
🐛 Bug Fixes
- Fix struct flattening to add a validity column only when the input column has null element (#8374) @ttnghia
- Compilation fix: Remove redefinition for `std::is_same_v()` (#8369) @mythrocks
- Add backward compatibility for `dask-cudf` to work with other versions of `dask` (#8368) @galipremsagar
- Handle empty results with nested types in copy_if_else (#8359) @nvdbaranec
- Handle nested column types properly for empty parquet files. (#8350) @nvdbaranec
- Raise error when unsupported arguments are passed to `dask_cudf.DataFrame.sort_values` (#8349) @galipremsagar
- Raise `NotImplementedError` for axis=1 in `rank` (#8347) @galipremsagar
- Add support for `make_meta_obj` dispatch in `dask-cudf` (#8342) @galipremsagar
- Update Java string concatenate test for single column (#8330) @tgravescs
- Use empty_like in scatter (#8314) @revans2
- Fix concatenate_lists_ignore_null on rows of all_nulls (#8312) @sperlingxx
- Add separator-on-null parameter to strings concatenate APIs (#8282) @davidwendt
- COLLECT_LIST support returning empty output columns. (#8279) @mythrocks
- Update io util to convert path like object to string (#8275) @ayushdg
- Fix result column types for empty inputs to rolling window (#8274) @mythrocks
- Actually test equality in assert_groupby_results_equal (#8272) @shwina
- CMake always explicitly specify a source files extension (#8270) @robertmaynard
- Fix struct binary search and struct flattening (#8268) @ttnghia
- Revert "patch thrust to fix intmax num elements limitation in scan_by_key" (#8263) @cwharris
- upgrade dlpack to 0.5 (#8262) @cwharris
- Fixes CSV-reader type inference for thousands separator and decimal point (#8261) @elstehle
- Fix incorrect assertion in Java concat (#8258) @sperlingxx
- Copy nested types upon construction (#8244) @isVoid
- Preserve column hierarchy when getting NULL row from `LIST` column (#8206) @isVoid
- Clip decimal binary op precision at max precision (#8194) @ChrisJar
📖 Documentation
- Add docstring for `dask_cudf.read_csv` (#8355) @galipremsagar
- Fix cudf release version in readme (#8331) @galipremsagar
- Fix structs column description in dev docs (#8318) @isVoid
- Update readme with correct CUDA versions (#8315) @raydouglass
- Add description of the cuIO GDS integration (#8293) @vuule
- Remove unused parameter from copy_partition kernel documentation (#8283) @robertmaynard
🚀 New Features
- Add support merging b/w categorical data (#8332) @galipremsagar
- Java: Support struct scalar (#8327) @sperlingxx
- added _is_homogeneous property (#8299) @shaneding
- Added decimal writing for CSV writer (#8296) @kaatish
- Java: Support creating a scalar from utf8 string (#8294) @firestarman
- Add Java API for Concatenate strings with separator (#8289) @tgravescs
- `strings::join_list_elements` options for empty list inputs (#8285) @ttnghia
- Return python lists for getitem calls to list type series (#8265) @brandon-b-miller
- add unit tests for lead/lag on list for row window (#8259) @wbo4958
- Create a String column from UTF8 String byte arrays (#8257) @firestarman
- Support scattering `list_scalar` (#8256) @isVoid
- Implement `lists::concatenate_list_elements` (#8231) @ttnghia
- Support for struct scalars. (#8220) @nvdbaranec
- Add support for decimal types in ORC writer (#8198) @vuule
- Support create lists column from a `list_scalar` (#8185) @isVoid
- `Groupby.shift` c++ API refactor and python binding (#8131) @isVoid
- Add `groupby::replace_nulls(replace_policy)` api (#7118) @isVoid
🛠️ Improvements
- Support Dask + Distributed 2021.05.1 (#8392) @jakirkham
- Add aliases for string methods (#8353) @shwina
- Update environment variable used to determine `cuda_version` (#8321) @ajschmidt8
- JNI: Refactor the code of making column from scalar (#8310) @firestarman
- Update `CHANGELOG.md` links for calver (#8303) @ajschmidt8
- Merge `branch-0.19` into `branch-21.06` (#8302) @ajschmidt8
- use address and length for GDS reads/writes (#8301) @rongou
- Update cudfjni version to 21.06.0 (#8292) @pxLi
- Update docs build script (#8284) @ajschmidt8
- Make device_buffer streams explicit and enforce move construction (#8280) @harrism
- Introduce a common parent class for NumericalColumn and DecimalColumn (#8278) @vyasr
- Do not add nulls to the hash table when null_equality::NOT_EQUAL is passed to left_semi_join and left_anti_join (#8277) @nvdbaranec
- Enable implicit casting when concatenating mixed types (#8276) @ChrisJar
- Fix CMake FindPackage rmm, pin dev envs' dlpack to v0.3 (#8271) @trxcllnt
- Update cudfjni version to 21.06 (#8267) @pxLi
- support RMM aligned resource adapter in JNI (#8266) @rongou
- Pass compiler environment variables to conda python build (#8260) @Ethyling
- Remove abc inheritance from Serializable (#8254) @vyasr
- Move more methods into SingleColumnFrame (#8253) @vyasr
- Update ORC statistics API to use C++17 standard library (#8241) @vuule
- Correct unused parameter warnings in dictionary algorithms (#8239) @robertmaynard
- Correct unused parameters in the copying algorithms (#8232) @robertmaynard
- IO statistics cleanup (#8191) @kaatish
- Refactor of rolling_window implementation. (#8158) @nvdbaranec
- Add a flag for allowing single quotes in JSON strings. (#8144) @nvdbaranec
- Column refactoring 2 (#8130) @vyasr
- support space in workspace (#7956) @jolorunyomi
- Support collect_set on rolling window (#7881) @sperlingxx
v0.19.2
🚨 Breaking Changes
- Allow hash_partition to take a seed value (#7771) @magnatelee
- Allow merging index column with data column using keyword "on" (#7736) @skirui-source
- Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
- Replace device_vector with device_uvector in null_mask (#7715) @harrism
- Don't identify decimals as strings. (#7710) @vyasr
- Fix Java Parquet write after writer API changes (#7655) @revans2
- Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
- Update missing docstring examples in python public APIs (#7546) @galipremsagar
- Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
- Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
- Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
- Add struct support to parquet writer (#7461) @devavret
- Join APIs that return gathermaps (#7454) @shwina
- `fixed_point` + `cudf::binary_operation` API Changes (#7435) @codereport
- Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
- Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
- Refactor strings column factories (#7397) @harrism
- Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
- Upgrade pandas to 1.2 (#7375) @galipremsagar
- Rename `logical_cast` to `bit_cast` and allow additional conversions (#7373) @ttnghia
- Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
🐛 Bug Fixes
- unsnap: busy wait a number of cycles (#8073) @vuule
- Fix returned column type when extracting from an empty list column (#8031) @jlowe
- Don't reindex a new value on setitem if the original dataframe was empty (#8026) @vyasr
- Fix a `NameError` in meta dispatch API (#7996) @galipremsagar
- Reindex in `DataFrame.__setitem__` (#7957) @galipremsagar
- jitify direct-to-cubin compilation and caching. (#7919) @cwharris
- Use dynamic cudart for nvcomp in java build (#7896) @abellina
- fix "incompatible redefinition" warnings (#7894) @cwharris
- cudf consistently specifies the cuda runtime (#7887) @robertmaynard
- disable verbose output for jitify_preprocess (#7886) @cwharris
- CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
- Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
- cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
- Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
- Sort by index in groupby tests more consistently (#7802) @shwina
- Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
- Add decimal column handling in copy_type_metadata (#7788) @shwina
- Add column names validation in parquet writer (#7786) @galipremsagar
- Fix Java explode outer unit tests (#7782) @jlowe
- Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
- User resource fix for replace_nulls (#7769) @magnatelee
- Fix type dispatch for columnar replace_nulls (#7768) @jlowe
- Add `ignore_order` parameter to dask-cudf concat dispatch (#7765) @galipremsagar
- Fix slicing and arrow representations of decimal columns (#7755) @vyasr
- Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
- Implement scatter for struct columns (#7752) @ttnghia
- Fix data corruption in string columns (#7746) @galipremsagar
- Fix string length in stripe dictionary building (#7744) @kaatish
- Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
- Enable dask dispatch to cuDF's `is_categorical_dtype` for cuDF objects (#7740) @brandon-b-miller
- Fix dictionary size computation in ORC writer (#7737) @vuule
- Fix `cudf::cast` overflow for `decimal64` to `int32_t` or smaller in certain cases (#7733) @codereport
- Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
- Disable column_view data accessors for unsupported types (#7725) @jrhemstad
- Materialize `RangeIndex` when `index=True` in parquet writer (#7711) @galipremsagar
- Don't identify decimals as strings. (#7710) @vyasr
- Fix return type of `DataFrame.argsort` (#7706) @galipremsagar
- Fix/correct cudf installed package requirements (#7688) @robertmaynard
- Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
- Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
- Fix Java Parquet write after writer API changes (#7655) @revans2
- Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
- Fix internal compiler error during JNI Docker build (#7645) @jlowe
- Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
- Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
- Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
- Fix specifying GPU architecture in JNI build (#7612) @jlowe
- Fix ORC writer OOM issue (#7605) @vuule
- Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
- Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
- Fix missing Dask imports (#7580) @kkraus14
- CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
- Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
- Fix ORC writer output corruption with string columns (#7565) @vuule
- Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
- FIX Fix Anaconda upload args (#7558) @dillon-cullinan
- Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
- FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
- Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
- Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
- Update missing docstring examples in python public APIs (#7546) @galipremsagar
- Decimal32 Build Fix (#7544) @razajafri
- FIX Retry conda output location (#7540) @dillon-cullinan
- fix missing renames of dask git branches from master to main (#7535) @kkraus14
- Remove detail from device_span (#7533) @rwlee
- Change dask and distributed branch to main (#7532) @dantegd
- Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
- Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
- Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
- Change jit launch to safe_launch (#7510) @devavret
- Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
- Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
- Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
- Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
- Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
- Correctly compile benchmarks (#7485) @robertmaynard
- Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
- Fix `__repr__` for categorical dtype (#7476) @galipremsagar
- Java cleaner synchronization (#7474) @abellina
- Fix java float/double parsing tests (#7473) @revans2
- Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
- Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
- Missing `device_storage_dispatch` change affecting `cudf::gather` (#7449) @codereport
- fix cuFile JNI compile errors (#7445) @rongou
- Support `Series.__setitem__` with key to a new row (#7443) @isVoid
- Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
- Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
- Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
- Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
- Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
- Fix std::exeception catch-by-reference gcc9 compile error (#7380) @davidwendt
- Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
- fix Arrow CMake file (#7358) @rongou
- Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
- Handle cupy array in `Dataframe.__setitem__` (#7340) @galipremsagar
- Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
- FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan
📖 Documentation
- Fix join API doxygen (#7890) @shwina
- Add Resources to README. (#7697) @bdice
- Add `isin` examples in Docstring (#7479) @galipremsagar
- Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
- Fix typo in regex.md doc page (#7363) @davidwendt
- Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe
🚀 New Features
- Enable basic reductions for decimal columns (#7776) @ChrisJar
- Enable join on decimal columns (#7764) @ChrisJar
- Allow merging index column with data column using keyword "on" (#7736) @skirui-source
- Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
- Add support for `unique` groupby aggregation (#7726) @shwina
- Expose libcudf's label_bins function to cudf (#7724) @vyasr
- Adding support for equi-join on struct (#7720) @hyperbolic2346
- Add decimal column comparison operations (#7716) @isVoid
- Implement scan operations for decimal columns (#7707) @ChrisJar
- Enable typecasting between decimal and int (#7691) @ChrisJar
- Enable decimal support in parquet writer (#7673) @devavret
- Adds `list.unique` API (#7664) @isVoid
- Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
- Add `lists.sort_values` API (#7657) @isVoid
- Add is_integer API that can check for the validity of...
v0.19.1
🚨 Breaking Changes
- Allow hash_partition to take a seed value (#7771) @magnatelee
- Allow merging index column with data column using keyword "on" (#7736) @skirui-source
- Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
- Replace device_vector with device_uvector in null_mask (#7715) @harrism
- Don't identify decimals as strings. (#7710) @vyasr
- Fix Java Parquet write after writer API changes (#7655) @revans2
- Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
- Update missing docstring examples in python public APIs (#7546) @galipremsagar
- Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
- Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
- Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
- Add struct support to parquet writer (#7461) @devavret
- Join APIs that return gathermaps (#7454) @shwina
- `fixed_point` + `cudf::binary_operation` API Changes (#7435) @codereport
- Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
- Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
- Refactor strings column factories (#7397) @harrism
- Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
- Upgrade pandas to 1.2 (#7375) @galipremsagar
- Rename `logical_cast` to `bit_cast` and allow additional conversions (#7373) @ttnghia
- Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
🐛 Bug Fixes
- Fix returned column type when extracting from an empty list column (#8031) @jlowe
- Don't reindex a new value on setitem if the original dataframe was empty (#8026) @vyasr
- Fix a `NameError` in meta dispatch API (#7996) @galipremsagar
- Reindex in `DataFrame.__setitem__` (#7957) @galipremsagar
- jitify direct-to-cubin compilation and caching. (#7919) @cwharris
- Use dynamic cudart for nvcomp in java build (#7896) @abellina
- fix "incompatible redefinition" warnings (#7894) @cwharris
- cudf consistently specifies the cuda runtime (#7887) @robertmaynard
- disable verbose output for jitify_preprocess (#7886) @cwharris
- CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
- Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
- cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
- Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
- Sort by index in groupby tests more consistently (#7802) @shwina
- Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
- Add decimal column handling in copy_type_metadata (#7788) @shwina
- Add column names validation in parquet writer (#7786) @galipremsagar
- Fix Java explode outer unit tests (#7782) @jlowe
- Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
- User resource fix for replace_nulls (#7769) @magnatelee
- Fix type dispatch for columnar replace_nulls (#7768) @jlowe
- Add `ignore_order` parameter to dask-cudf concat dispatch (#7765) @galipremsagar
- Fix slicing and arrow representations of decimal columns (#7755) @vyasr
- Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
- Implement scatter for struct columns (#7752) @ttnghia
- Fix data corruption in string columns (#7746) @galipremsagar
- Fix string length in stripe dictionary building (#7744) @kaatish
- Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
- Enable dask dispatch to cuDF's `is_categorical_dtype` for cuDF objects (#7740) @brandon-b-miller
- Fix dictionary size computation in ORC writer (#7737) @vuule
- Fix `cudf::cast` overflow for `decimal64` to `int32_t` or smaller in certain cases (#7733) @codereport
- Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
- Disable column_view data accessors for unsupported types (#7725) @jrhemstad
- Materialize `RangeIndex` when `index=True` in parquet writer (#7711) @galipremsagar
- Don't identify decimals as strings. (#7710) @vyasr
- Fix return type of `DataFrame.argsort` (#7706) @galipremsagar
- Fix/correct cudf installed package requirements (#7688) @robertmaynard
- Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
- Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
- Fix Java Parquet write after writer API changes (#7655) @revans2
- Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
- Fix internal compiler error during JNI Docker build (#7645) @jlowe
- Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
- Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
- Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
- Fix specifying GPU architecture in JNI build (#7612) @jlowe
- Fix ORC writer OOM issue (#7605) @vuule
- Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
- Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
- Fix missing Dask imports (#7580) @kkraus14
- CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
- Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
- Fix ORC writer output corruption with string columns (#7565) @vuule
- Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
- FIX Fix Anaconda upload args (#7558) @dillon-cullinan
- Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
- FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
- Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
- Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
- Update missing docstring examples in python public APIs (#7546) @galipremsagar
- Decimal32 Build Fix (#7544) @razajafri
- FIX Retry conda output location (#7540) @dillon-cullinan
- fix missing renames of dask git branches from master to main (#7535) @kkraus14
- Remove detail from device_span (#7533) @rwlee
- Change dask and distributed branch to main (#7532) @dantegd
- Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
- Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
- Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
- Change jit launch to safe_launch (#7510) @devavret
- Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
- Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
- Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
- Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
- Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
- Correctly compile benchmarks (#7485) @robertmaynard
- Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
- Fix `__repr__` for categorical dtype (#7476) @galipremsagar
- Java cleaner synchronization (#7474) @abellina
- Fix java float/double parsing tests (#7473) @revans2
- Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
- Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
- Missing `device_storage_dispatch` change affecting `cudf::gather` (#7449) @codereport
- fix cuFile JNI compile errors (#7445) @rongou
- Support `Series.__setitem__` with key to a new row (#7443) @isVoid
- Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
- Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
- Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
- Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
- Fix thrust failure when transfering data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
- Fix std::exeception catch-by-reference gcc9 compile error (#7380) @davidwendt
- Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
- fix Arrow CMake file (#7358) @rongou
- Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
- Handle cupy array in `Dataframe.__setitem__` (#7340) @galipremsagar
- Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
- FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan
📖 Documentation
- Fix join API doxygen (#7890) @shwina
- Add Resources to README. (#7697) @bdice
- Add `isin` examples in Docstring (#7479) @galipremsagar
- Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
- Fix typo in regex.md doc page (#7363) @davidwendt
- Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe
🚀 New Features
- Enable basic reductions for decimal columns (#7776) @ChrisJar
- Enable join on decimal columns (#7764) @ChrisJar
- Allow merging index column with data column using keyword "on" (#7736) @skirui-source
- Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
- Add support for `unique` groupby aggregation (#7726) @shwina
- Expose libcudf's label_bins function to cudf (#7724) @vyasr
- Adding support for equi-join on struct (#7720) @hyperbolic2346
- Add decimal column comparison operations (#7716) @isVoid
- Implement scan operations for decimal columns (#7707) @ChrisJar
- Enable typecasting between decimal and int (#7691) @ChrisJar
- Enable decimal support in parquet writer (#7673) @devavret
- Adds `list.unique` API (#7664) @isVoid
- Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
- Add `lists.sort_values` API (#7657) @isVoid
- Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
- Add...
v0.19.0
🚨 Breaking Changes
- Allow hash_partition to take a seed value (#7771) @magnatelee
- Allow merging index column with data column using keyword "on" (#7736) @skirui-source
- Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
- Replace device_vector with device_uvector in null_mask (#7715) @harrism
- Don't identify decimals as strings. (#7710) @vyasr
- Fix Java Parquet write after writer API changes (#7655) @revans2
- Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
- Update missing docstring examples in python public APIs (#7546) @galipremsagar
- Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
- Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
- Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
- Add struct support to parquet writer (#7461) @devavret
- Join APIs that return gathermaps (#7454) @shwina
- `fixed_point` + `cudf::binary_operation` API Changes (#7435) @codereport
- Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
- Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
- Refactor strings column factories (#7397) @harrism
- Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
- Upgrade pandas to 1.2 (#7375) @galipremsagar
- Rename `logical_cast` to `bit_cast` and allow additional conversions (#7373) @ttnghia
- Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
🐛 Bug Fixes
- Fix a `NameError` in meta dispatch API (#7996) @galipremsagar
- Reindex in `DataFrame.__setitem__` (#7957) @galipremsagar
- jitify direct-to-cubin compilation and caching. (#7919) @cwharris
- Use dynamic cudart for nvcomp in java build (#7896) @abellina
- fix "incompatible redefinition" warnings (#7894) @cwharris
- cudf consistently specifies the cuda runtime (#7887) @robertmaynard
- disable verbose output for jitify_preprocess (#7886) @cwharris
- CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
- Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
- cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
- Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
- Sort by index in groupby tests more consistently (#7802) @shwina
- Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
- Add decimal column handling in copy_type_metadata (#7788) @shwina
- Add column names validation in parquet writer (#7786) @galipremsagar
- Fix Java explode outer unit tests (#7782) @jlowe
- Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
- User resource fix for replace_nulls (#7769) @magnatelee
- Fix type dispatch for columnar replace_nulls (#7768) @jlowe
- Add `ignore_order` parameter to dask-cudf concat dispatch (#7765) @galipremsagar
- Fix slicing and arrow representations of decimal columns (#7755) @vyasr
- Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
- Implement scatter for struct columns (#7752) @ttnghia
- Fix data corruption in string columns (#7746) @galipremsagar
- Fix string length in stripe dictionary building (#7744) @kaatish
- Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
- Enable dask dispatch to cuDF's `is_categorical_dtype` for cuDF objects (#7740) @brandon-b-miller
- Fix dictionary size computation in ORC writer (#7737) @vuule
- Fix `cudf::cast` overflow for `decimal64` to `int32_t` or smaller in certain cases (#7733) @codereport
- Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
- Disable column_view data accessors for unsupported types (#7725) @jrhemstad
- Materialize `RangeIndex` when `index=True` in parquet writer (#7711) @galipremsagar
- Don't identify decimals as strings. (#7710) @vyasr
- Fix return type of `DataFrame.argsort` (#7706) @galipremsagar
- Fix/correct cudf installed package requirements (#7688) @robertmaynard
- Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
- Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
- Fix Java Parquet write after writer API changes (#7655) @revans2
- Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
- Fix internal compiler error during JNI Docker build (#7645) @jlowe
- Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
- Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
- Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
- Fix specifying GPU architecture in JNI build (#7612) @jlowe
- Fix ORC writer OOM issue (#7605) @vuule
- Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
- Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
- Fix missing Dask imports (#7580) @kkraus14
- CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
- Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
- Fix ORC writer output corruption with string columns (#7565) @vuule
- Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
- FIX Fix Anaconda upload args (#7558) @dillon-cullinan
- Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
- FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
- Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
- Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
- Update missing docstring examples in python public APIs (#7546) @galipremsagar
- Decimal32 Build Fix (#7544) @razajafri
- FIX Retry conda output location (#7540) @dillon-cullinan
- fix missing renames of dask git branches from master to main (#7535) @kkraus14
- Remove detail from device_span (#7533) @rwlee
- Change dask and distributed branch to main (#7532) @dantegd
- Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
- Make sure rmm::rmm CMake target is visibile to cudf users (#7524) @robertmaynard
- Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
- Change jit launch to safe_launch (#7510) @devavret
- Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
- Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
- Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
- Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
- Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
- Correctly compile benchmarks (#7485) @robertmaynard
- Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
- Fix `__repr__` for categorical dtype (#7476) @galipremsagar
- Java cleaner synchronization (#7474) @abellina
- Fix java float/double parsing tests (#7473) @revans2
- Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
- Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
- Missing `device_storage_dispatch` change affecting `cudf::gather` (#7449) @codereport
- fix cuFile JNI compile errors (#7445) @rongou
- Support `Series.__setitem__` with key to a new row (#7443) @isVoid
- Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
- Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
- Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
- Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
- Fix thrust failure when transfering data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
- Fix std::exeception catch-by-reference gcc9 compile error (#7380) @davidwendt
- Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
- fix Arrow CMake file (#7358) @rongou
- Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
- Handle cupy array in `Dataframe.__setitem__` (#7340) @galipremsagar
- Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
- FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan
📖 Documentation
- Fix join API doxygen (#7890) @shwina
- Add Resources to README. (#7697) @bdice
- Add `isin` examples in Docstring (#7479) @galipremsagar
- Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
- Fix typo in regex.md doc page (#7363) @davidwendt
- Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe
🚀 New Features
- Enable basic reductions for decimal columns (#7776) @ChrisJar
- Enable join on decimal columns (#7764) @ChrisJar
- Allow merging index column with data column using keyword "on" (#7736) @skirui-source
- Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
- Add support for `unique` groupby aggregation (#7726) @shwina
- Expose libcudf's label_bins function to cudf (#7724) @vyasr
- Adding support for equi-join on struct (#7720) @hyperbolic2346
- Add decimal column comparison operations (#7716) @isVoid
- Implement scan operations for decimal columns (#7707) @ChrisJar
- Enable typecasting between decimal and int (#7691) @ChrisJar
- Enable decimal support in parquet writer (#7673) @devavret
- Adds `list.unique` API (#7664) @isVoid
- Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
- Add `lists.sort_values` API (#7657) @isVoid
- Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
- Adds `explode` API (#7607) @isVoid
- Adds `list.take`, python binding for `cudf::lists::segmented_gather` (#7591) @isVoid
- Implement cudf::label_bins() (#7554) @vyasr
- Add Python b...
v0.18.1
v0.18.0
Breaking Changes 🚨
- Default `groupby` to `sort=False` (#7180) @isVoid
- Add libcudf API for parsing of ORC statistics (#7136) @vuule
- Replace ORC writer api with class (#7099) @rgsl888prabhu
- Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
- Replace parquet writer api with class (#7058) @rgsl888prabhu
- Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
- Fix default parameter values of `write_csv` and `write_parquet` (#6967) @vuule
- Align `Series.groupby` API to match Pandas (#6964) @kkraus14
- Share `factorize` implementation with Index and cudf module (#6885) @brandon-b-miller
Bug Fixes 🐛
- Remove incorrect std::move call on return variable (#7319) @davidwendt
- Fix failing CI ORC test (#7313) @vuule
- Disallow constructing frames from a ColumnAccessor (#7298) @shwina
- fix java cuFile tests (#7296) @rongou
- Fix style issues related to NumPy (#7279) @shwina
- Fix bug when `iloc` slice terminates at before-the-zero position (#7277) @isVoid
- Fix copying dtype metadata after calling libcudf functions (#7271) @shwina
- Move lists utility function definition out of header (#7266) @mythrocks
- Throw if bool column would cause incorrect result when writing to ORC (#7261) @vuule
- Use `uvector` in `replace_nulls`; Fix `sort_helper::grouped_value` doc (#7256) @isVoid
- Remove floating point types from cudf::sort fast-path (#7250) @davidwendt
- Disallow picking output columns from nested columns. (#7248) @devavret
- Fix `loc` for Series with a MultiIndex (#7243) @shwina
- Fix Arrow column test leaks (#7241) @tgravescs
- Fix test column vector leak (#7238) @kuhushukla
- Fix some bugs in java scalar support for decimal (#7237) @revans2
- Improve `assert_eq` handling of scalar (#7220) @isVoid
- Fix missing null_count() comparison in test framework and related failures (#7219) @nvdbaranec
- Remove floating point types from radix sort fast-path (#7215) @davidwendt
- Fixing parquet benchmarks (#7214) @rgsl888prabhu
- Handle various parameter combinations in `replace` API (#7207) @galipremsagar
- Export mock aws credentials for s3 tests (#7176) @ayushdg
- Add `MultiIndex.rename` API (#7172) @isVoid
- Fix importing list & struct types in `from_arrow` (#7162) @galipremsagar
- Fixing parquet precision writing failing if scale is equal to precision (#7146) @hyperbolic2346
- Update s3 tests to use moto_server (#7144) @ayushdg
- Fix JIT cache multi-process test flakiness in slow drives (#7142) @devavret
- Fix compilation errors in libcudf (#7138) @galipremsagar
- Fix compilation failure caused by `-Wall` addition. (#7134) @codereport
- Add informative error message for `sep` in CSV writer (#7095) @galipremsagar
- Add JIT cache per compute capability (#7090) @devavret
- Implement `__hash__` method for ListDtype (#7081) @galipremsagar
- Only upload packages that were built (#7077) @raydouglass
- Fix comparisons between Series and cudf.NA (#7072) @brandon-b-miller
- Handle `nan` values correctly in `Series.one_hot_encoding` (#7059) @galipremsagar
- Add `unstack()` support for non-multiindexed dataframes (#7054) @isVoid
- Fix `read_orc` for decimal type (#7034) @rgsl888prabhu
- Fix backward compatibility of loading a 0.16 pkl file (#7033) @galipremsagar
- Decimal casts in JNI became a NOOP (#7032) @revans2
- Restore usual instance/subclass checking to cudf.DateOffset (#7029) @shwina
- Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
- Fix to_csv delimiter handling of timestamp format (#7023) @davidwendt
- Pin librdkakfa to gcc 7 compatible version (#7021) @raydouglass
- Fix `fillna` & `dropna` to also consider `np.nan` as a missing value (#7019) @galipremsagar
- Fix round operator's HALF_EVEN computation for negative integers (#7014) @nartal1
- Skip Thrust sort patch if already applied (#7009) @harrism
- Fix `cudf::hash_partition` for `decimal32` and `decimal64` (#7006) @codereport
- Fix Thrust unroll patch command (#7002) @harrism
- Fix loc behaviour when key of incorrect type is used (#6993) @shwina
- Fix int to datetime conversion in csv_read (#6991) @kaatish
- fix excluding cufile tests by default (#6988) @rongou
- Fix java cufile tests when cufile is not installed (#6987) @revans2
- Make `cudf::round` for `fixed_point` when `scale = -decimal_places` a no-op (#6975) @codereport
- Fix type comparison for java (#6970) @revans2
- Fix default parameter values of `write_csv` and `write_parquet` (#6967) @vuule
- Align `Series.groupby` API to match Pandas (#6964) @kkraus14
- Fix timestamp parsing in ORC reader for timezones without transitions (#6959) @vuule
- Fix typo in numerical.py (#6957) @rgsl888prabhu
- `fixed_point_value` double-shifts in `fixed_point` construction (#6950) @codereport
- fix libcu++ include path for jni (#6948) @rongou
- Fix groupby agg/apply behaviour when no key columns are provided (#6945) @shwina
- Avoid inserting null elements into join hash table when nulls are treated as unequal (#6943) @hyperbolic2346
- Fix cudf::merge gtest for dictionary columns (#6942) @davidwendt
- Pass numeric scalars of the same dtype through numeric binops (#6938) @brandon-b-miller
- Fix N/A detection for empty fields in CSV reader (#6922) @vuule
- Fix rmm_mode=managed parameter for gtests (#6912) @davidwendt
- Fix nullmask offset handling in parquet and orc writer (#6889) @kaatish
- Correct the sampling range when sampling with replacement (#6884) @ChrisJar
- Handle nested string columns with no children in contiguous_split. (#6864) @nvdbaranec
- Fix `columns` & `index` handling in dataframe constructor (#6838) @galipremsagar
Documentation 📖
- Update readme (#7318) @shwina
- Fix typo in cudf.core.column.string.extract docs (#7253) @adelevie
- Update doxyfile project number (#7161) @davidwendt
- Update 10 minutes to cuDF and CuPy with new APIs (#7158) @ChrisJar
- Cross link RMM & libcudf Doxygen docs (#7149) @ajschmidt8
- Add documentation for support dtypes in all IO formats (#7139) @galipremsagar
- Add groupby docs (#7100) @shwina
- Update cudf python docstrings with new null representation (`<NA>`) (#7050) @galipremsagar
- Make Doxygen comments formatting consistent (#7041) @vuule
- Add docs for working with missing data (#7010) @galipremsagar
- Remove warning in from_dlpack and to_dlpack methods (#7001) @miguelusque
- libcudf Developer Guide (#6977) @harrism
- Add JNI wrapper for the cuFile API (GDS) (#6940) @rongou
New Features 🚀
- Support `numeric_only` field for `rank()` (#7213) @isVoid
- Add support for `cudf::binary_operation` `TRUE_DIV` for `decimal32` and `decimal64` (#7198) @codereport
- Implement COLLECT rolling window aggregation (#7189) @mythrocks
- Add support for array-like inputs in `cudf.get_dummies` (#7181) @galipremsagar
- Default `groupby` to `sort=False` (#7180) @isVoid
- Add libcudf lists column count_elements API (#7173) @davidwendt
- Implement `cudf::group_by` (sort) for `decimal32` and `decimal64` (#7169) @codereport
- Add encoding and compression argument to CSV writer (#7168) @VibhuJawa
- `cudf::rolling_window` `SUM` support for `decimal32` and `decimal64` (#7147) @codereport
- Adding support for explode to cuDF (#7140) @hyperbolic2346
- Add libcudf API for parsing of ORC statistics (#7136) @vuule
- update GDS/cuFile location for 0.9 release (#7131) @rongou
- Add Segmented sort (#7122) @karthikeyann
- Add `cudf::binary_operation` `NULL_MIN`, `NULL_MAX` & `NULL_EQUALS` for `decimal32` and `decimal64` (#7119) @codereport
- Add `scale` and `value` methods to `fixed_point` (#7109) @codereport
- Replace ORC writer api with class (#7099) @rgsl888prabhu
- Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
- Improve `digitize` API (#7071) @isVoid
- Add List types support in data generator (#7064) @galipremsagar
- `cudf::scan` support for `decimal32` and `decimal64` (#7063) @codereport
- `cudf::rolling` `ROW_NUMBER` support for `decimal32` and `decimal64` (#7061) @codereport
- Replace parquet writer api with class (#7058) @rgsl888prabhu
- Support contains() on lists of primitives (#7039) @mythrocks
- Implement `cudf::rolling` for `decimal32` and `decimal64` (#7037) @codereport
- Add `ffill` and `bfill` to string columns (#7036) @isVoid
- Enable round in cudf for DataFrame and Series (#7022) @ChrisJar
- Extend `replace_nulls_policy` to `string` and `dictionary` type (#7004) @isVoid
- Add segmented_gather(list_column, gather_list) (#7003) @karthikeyann
- Add `method` field to `fillna` for fixed width columns (#6998) @isVoid
- Manual merge of branch 0.17 into branch 0.18 (#6995) @shwina
- Implement `cudf::reduce` for `decimal32` and `decimal64` (part 2) (#6980) @codereport
- Add Ufunc alias look up for appropriate numpy ufunc dispatching (#6973) @VibhuJawa
- Add pytest-xdist to dev environment.yml (#6958) @galipremsagar
- Add `Index.set_names` api (#6929) @galipremsagar
- Add `replace_null` API with `replace_policy` parameter, `fixed_width` column support (#6907) @isVoid
- Share `factorize` implementation with Index and cudf module (#6885) @brandon-b-miller
- Implement update() function (#6883) @skirui-source
- Add groupby idxmin, idxmax aggregation (#6856) @karthikeyann
- Implement `cudf::reduce` for `decimal32` and `decimal64` (part 1) (#6814) @codereport
- Implement cudf.DateOffset for months (#6775) @brandon-b-miller
- Add Python DecimalColumn (#6715) @shwina
- Add dictionary support to libcudf groupby functions (#6585) @davidwendt
Improvements 🛠️
- Update stale GHA with exemptions & new labels (#7395) @mike-wendt
- Add GHA to mark issues/prs as stale/rotten (#7388) @Ethyling
- Unpin from numpy < 1.20 (#7335) @shwina
- Prepare Changelog for Automation (#7309) @galipremsagar
- Prepare Changelog for Automation (#7272) @ajschmidt8
- Add JNI support for converting Arrow buffers to CUDF ColumnVectors (#7222) @tgravescs
- Add coverage for `skiprows` and `num_rows` in parquet rea...
v0.17.0
[NIGHTLY] v0.18.0
🔗 Links
🚨 Breaking Changes
- Default `groupby` to `sort=False` (#7180) @isVoid
- Add libcudf API for parsing of ORC statistics (#7136) @vuule
- Replace ORC writer api with class (#7099) @rgsl888prabhu
- Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
- Replace parquet writer api with class (#7058) @rgsl888prabhu
- Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
- Fix default parameter values of `write_csv` and `write_parquet` (#6967) @vuule
- Align `Series.groupby` API to match Pandas (#6964) @kkraus14
- Share `factorize` implementation with Index and cudf module (#6885) @brandon-b-miller
🐛 Bug Fixes
- Fix null-bounds calculation for ranged window queries (#7568) @mythrocks
- Remove incorrect std::move call on return variable (#7319) @davidwendt
- Fix failing CI ORC test (#7313) @vuule
- Disallow constructing frames from a ColumnAccessor (#7298) @shwina
- fix java cuFile tests (#7296) @rongou
- Fix style issues related to NumPy (#7279) @shwina
- Fix bug when `iloc` slice terminates at before-the-zero position (#7277) @isVoid
- Fix copying dtype metadata after calling libcudf functions (#7271) @shwina
- Move lists utility function definition out of header (#7266) @mythrocks
- Throw if bool column would cause incorrect result when writing to ORC (#7261) @vuule
- Use `uvector` in `replace_nulls`; Fix `sort_helper::grouped_value` doc (#7256) @isVoid
- Remove floating point types from cudf::sort fast-path (#7250) @davidwendt
- Disallow picking output columns from nested columns. (#7248) @devavret
- Fix `loc` for Series with a MultiIndex (#7243) @shwina
- Fix Arrow column test leaks (#7241) @tgravescs
- Fix test column vector leak (#7238) @kuhushukla
- Fix some bugs in java scalar support for decimal (#7237) @revans2
- Improve `assert_eq` handling of scalar (#7220) @isVoid
- Fix missing null_count() comparison in test framework and related failures (#7219) @nvdbaranec
- Remove floating point types from radix sort fast-path (#7215) @davidwendt
- Fixing parquet benchmarks (#7214) @rgsl888prabhu
- Handle various parameter combinations in `replace` API (#7207) @galipremsagar
- Export mock aws credentials for s3 tests (#7176) @ayushdg
- Add `MultiIndex.rename` API (#7172) @isVoid
- Fix importing list & struct types in `from_arrow` (#7162) @galipremsagar
- Fixing parquet precision writing failing if scale is equal to precision (#7146) @hyperbolic2346
- Update s3 tests to use moto_server (#7144) @ayushdg
- Fix JIT cache multi-process test flakiness in slow drives (#7142) @devavret
- Fix compilation errors in libcudf (#7138) @galipremsagar
- Fix compilation failure caused by `-Wall` addition. (#7134) @codereport
- Add informative error message for `sep` in CSV writer (#7095) @galipremsagar
- Add JIT cache per compute capability (#7090) @devavret
- Implement `__hash__` method for ListDtype (#7081) @galipremsagar
- Only upload packages that were built (#7077) @raydouglass
- Fix comparisons between Series and cudf.NA (#7072) @brandon-b-miller
- Handle `nan` values correctly in `Series.one_hot_encoding` (#7059) @galipremsagar
- Add `unstack()` support for non-multiindexed dataframes (#7054) @isVoid
- Fix `read_orc` for decimal type (#7034) @rgsl888prabhu
- Fix backward compatibility of loading a 0.16 pkl file (#7033) @galipremsagar
- Decimal casts in JNI became a NOOP (#7032) @revans2
- Restore usual instance/subclass checking to cudf.DateOffset (#7029) @shwina
- Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
- Fix to_csv delimiter handling of timestamp format (#7023) @davidwendt
- Pin librdkakfa to gcc 7 compatible version (#7021) @raydouglass
- Fix `fillna` & `dropna` to also consider `np.nan` as a missing value (#7019) @galipremsagar
- Fix round operator's HALF_EVEN computation for negative integers (#7014) @nartal1
- Skip Thrust sort patch if already applied (#7009) @harrism
- Fix `cudf::hash_partition` for `decimal32` and `decimal64` (#7006) @codereport
- Fix Thrust unroll patch command (#7002) @harrism
- Fix loc behaviour when key of incorrect type is used (#6993) @shwina
- Fix int to datetime conversion in csv_read (#6991) @kaatish
- fix excluding cufile tests by default (#6988) @rongou
- Fix java cufile tests when cufile is not installed (#6987) @revans2
- Make `cudf::round` for `fixed_point` when `scale = -decimal_places` a no-op (#6975) @codereport
- Fix type comparison for java (#6970) @revans2
- Fix default parameter values of `write_csv` and `write_parquet` (#6967) @vuule
- Align `Series.groupby` API to match Pandas (#6964) @kkraus14
- Fix timestamp parsing in ORC reader for timezones without transitions (#6959) @vuule
- Fix typo in numerical.py (#6957) @rgsl888prabhu
- `fixed_point_value` double-shifts in `fixed_point` construction (#6950) @codereport
- fix libcu++ include path for jni (#6948) @rongou
- Fix groupby agg/apply behaviour when no key columns are provided (#6945) @shwina
- Avoid inserting null elements into join hash table when nulls are treated as unequal (#6943) @hyperbolic2346
- Fix cudf::merge gtest for dictionary columns (#6942) @davidwendt
- Pass numeric scalars of the same dtype through numeric binops (#6938) @brandon-b-miller
- Fix N/A detection for empty fields in CSV reader (#6922) @vuule
- Fix rmm_mode=managed parameter for gtests (#6912) @davidwendt
- Fix nullmask offset handling in parquet and orc writer (#6889) @kaatish
- Correct the sampling range when sampling with replacement (#6884) @ChrisJar
- Handle nested string columns with no children in contiguous_split. (#6864) @nvdbaranec
- Fix `columns` & `index` handling in dataframe constructor (#6838) @galipremsagar
📖 Documentation
- Update readme (#7318) @shwina
- Fix typo in cudf.core.column.string.extract docs (#7253) @adelevie
- Update doxyfile project number (#7161) @davidwendt
- Update 10 minutes to cuDF and CuPy with new APIs (#7158) @ChrisJar
- Cross link RMM & libcudf Doxygen docs (#7149) @ajschmidt8
- Add documentation for support dtypes in all IO formats (#7139) @galipremsagar
- Add groupby docs (#7100) @shwina
- Update cudf python docstrings with new null representation (`<NA>`) (#7050) @galipremsagar
- Make Doxygen comments formatting consistent (#7041) @vuule
- Add docs for working with missing data (#7010) @galipremsagar
- Remove warning in from_dlpack and to_dlpack methods (#7001) @miguelusque
- libcudf Developer Guide (#6977) @harrism
- Add JNI wrapper for the cuFile API (GDS) (#6940) @rongou
🚀 New Features
- Support `numeric_only` field for `rank()` (#7213) @isVoid
- Add support for `cudf::binary_operation` `TRUE_DIV` for `decimal32` and `decimal64` (#7198) @codereport
- Implement COLLECT rolling window aggregation (#7189) @mythrocks
- Add support for array-like inputs in `cudf.get_dummies` (#7181) @galipremsagar
- Default `groupby` to `sort=False` (#7180) @isVoid
- Add libcudf lists column count_elements API (#7173) @davidwendt
- Implement `cudf::group_by` (sort) for `decimal32` and `decimal64` (#7169) @codereport
- Add encoding and compression argument to CSV writer (#7168) @VibhuJawa
- `cudf::rolling_window` `SUM` support for `decimal32` and `decimal64` (#7147) @codereport
- Adding support for explode to cuDF (#7140) @hyperbolic2346
- Add libcudf API for parsing of ORC statistics (#7136) @vuule
- update GDS/cuFile location for 0.9 release (#7131) @rongou
- Add Segmented sort (#7122) @karthikeyann
- Add `cudf::binary_operation` `NULL_MIN`, `NULL_MAX` & `NULL_EQUALS` for `decimal32` and `decimal64` (#7119) @codereport
- Add `scale` and `value` methods to `fixed_point` (#7109) @codereport
- Replace ORC writer api with class (#7099) @rgsl888prabhu
- Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
- Improve `digitize` API (#7071) @isVoid
- Add List types support in data generator (#7064) @galipremsagar
- `cudf::scan` support for `decimal32` and `decimal64` (#7063) @codereport
- `cudf::rolling` `ROW_NUMBER` support for `decimal32` and `decimal64` (#7061) @codereport
- Replace parquet writer api with class (#7058) @rgsl888prabhu
- Support contains() on lists of primitives (#7039) @mythrocks
- Implement `cudf::rolling` for `decimal32` and `decimal64` (#7037) @codereport
- Add `ffill` and `bfill` to string columns (#7036) @isVoid
- Enable round in cudf for DataFrame and Series (#7022) @ChrisJar
- Extend `replace_nulls_policy` to `string` and `dictionary` type (#7004) @isVoid
- Add segmented_gather(list_column, gather_list) (#7003) @karthikeyann
- Add `method` field to `fillna` for fixed width columns (#6998) @isVoid
- Manual merge of branch 0.17 into branch 0.18 (#6995) @shwina
- Implement `cudf::reduce` for `decimal32` and `decimal64` (part 2) (#6980) @codereport
- Add Ufunc alias look up for appropriate numpy ufunc dispatching (#6973) @VibhuJawa
- Add pytest-xdist to dev environment.yml (#6958) @galipremsagar
- Add `Index.set_names` api (#6929) @galipremsagar
- Add `replace_null` API with `replace_policy` parameter, `fixed_width` column support (#6907) @isVoid
- Share `factorize` implementation with Index and cudf module (#6885) @brandon-b-miller
- Implement update() function (#6883) @skirui-source
- Add groupby idxmin, idxmax aggregation (#6856) @karthikeyann
- Implement `cudf::reduce` for `decimal32` and `decimal64` (part 1) (#6814) @codereport
- Implement cudf.DateOffset for months (#6775) @brandon-b-miller
- Add Python DecimalColumn (#6715) @shwina
- Add dictionary support to libcudf groupby functions (#6585) @davidwendt
🛠️ Improvements
- Update stale GHA with exemptions & new labels (#7395) @mike-wendt
- Add GHA to mark issues/prs as stale/rotten (#7388) @Ethyling
- Unpin from numpy < 1.20 (#7335) @shwina
- Prep...