Releases: pola-rs/polars
Rust Polars 0.50.0
🏆 Highlights
- Make `Selector` a concrete part of the DSL (#23351)
- Rework Categorical/Enum to use (Frozen)Categories (#23016)
🚀 Performance improvements
- Lower Expr.slice to streaming engine (#23683)
- Elide bound check (#23653)
- Preserve `Column` repr in `ColumnTransform` operations (#23648)
- Lower any() and all() to streaming engine (#23640)
- Lower row-separable functions in streaming engine (#23633)
- Lower int_range(len()) to with_row_index (#23576)
- Avoid double field resolution in with_columns (#23530)
- Rolling quantile lower time complexity (#23443)
- Use single-key optimization with Categorical (#23436)
- Improve null-preserving identification for boolean functions (#23317)
- Improve boolean bitwise aggregate performance (#23325)
- Enable Parquet expressions and dedup `is_in` values in Parquet predicates (#23293)
- Re-write join types during filter pushdown (#23275)
- Generate PQ ZSTD decompression context once (#23200)
- Trigger cache/cse optimizations when multiplexing (#23274)
- Cache FileInfo upon DSL -> IR conversion (#23263)
- Push more filters past joins (#23240)
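One of the rewrites above, lowering `int_range(len())` to `with_row_index` (#23576), rests on a simple equivalence: a column built from an integer range over the frame length is just a row index. A pure-Python sketch of the idea (illustrative names, not Polars internals):

```python
# Sketch of the rewrite behind #23576: a column equal to int_range(len(df))
# carries the same information as a row index, so the optimizer can replace
# the materialized range with the cheaper with_row_index operation.

def int_range_len(rows):
    # Naive plan: materialize the full range, then attach it column-wise.
    n = len(rows)
    return list(zip(range(n), rows))

def with_row_index(rows):
    # Rewritten plan: emit the index while streaming over the rows.
    return list(enumerate(rows))

data = ["a", "b", "c"]
assert int_range_len(data) == with_row_index(data)
```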
✨ Enhancements
- Expand on `DataTypeExpr` (#23249)
- Lower row-separable functions in streaming engine (#23633)
- Add scalar checks to range expressions (#23632)
- Expose `POLARS_DOT_SVG_VIEWER` to automatically dispatch to SVG viewer (#23592)
- Implement mean function in `arr` namespace (#23486)
- Implement `vec_hash` for `List` and `Array` (#23578)
- Add unstable `pl.row_index()` expression (#23556)
- Add Categories on the Python side (#23543)
- Implement partitioned sinks for the in-memory engine (#23522)
- Expose `IRFunctionExpr::Rank` in the python visitor (#23512)
- Raise and warn on UDFs without `return_dtype` set (#23353)
- IR pruning (#23499)
- Expose `IRFunctionExpr::FillNullWithStrategy` in the python visitor (#23479)
- Support min/max reducer for null dtype in streaming engine (#23465)
- Implement streaming Categorical/Enum min/max (#23440)
- Allow cast to Categorical inside list.eval (#23432)
- Support `pathlib.Path` as source for `read/scan_delta()` (#23411)
- Enable default set of `ScanCastOptions` for native `scan_iceberg()` (#23416)
- Pass payload in `ExprRegistry` (#23412)
- Support reading nanosecond/Int96 timestamps and schema-evolved datasets in `scan_delta()` (#23398)
- Support row group skipping with filters when `cast_options` is given (#23356)
- Execute bitwise reductions in streaming engine (#23321)
- Use `scan_parquet().collect_schema()` for `read_parquet_schema` (#23359)
- Add dtype to str.to_integer() (#22239)
- Add `arr.slice`, `arr.head` and `arr.tail` methods to `arr` namespace (#23150)
- Add `is_close` method (#23273)
- Drop superfluous casts from optimized plan (#23269)
- Added `drop_nulls` option to `to_dummies` (#23215)
- Support comma as decimal separator for CSV write (#23238)
- Don't format keys if they're empty in dot (#23247)
- Improve arity simplification (#23242)
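Among the additions above, `is_close` (#23273) tests approximate float equality. The usual relative-plus-absolute tolerance check it is based on can be sketched in plain Python (parameter names and defaults here are illustrative; consult the Polars docs for the actual signature):

```python
def is_close(a, b, rel_tol=1e-9, abs_tol=0.0):
    # Two floats are "close" when their gap is within the larger of a
    # relative tolerance (scaled by the bigger magnitude) or an absolute one.
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

assert is_close(1.0, 1.0 + 1e-12)          # tiny relative error
assert not is_close(1.0, 1.1)              # 10% apart is not close
assert is_close(0.0, 1e-10, abs_tol=1e-9)  # abs_tol handles values near zero
```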
🐞 Bug fixes
- Fix credential refresh logic (#23730)
- Fix `to_datetime()` fallible identification (#23735)
- Correct output datatype for `dt.with_time_unit` (#23734)
- Fix incorrect native Iceberg scan from tables with renamed/dropped columns/fields (#23713)
- Allow DataType expressions with selectors (#23720)
- Match output type to engine for `interpolate` on `Decimal` (#23706)
- Remaining bugs in `with_exprs_and_input` and pruning (#23710)
- Match output dtype to engine for `cum_sum_horizontal` (#23686)
- Field names for `pl.struct` in group-by (#23703)
- Fix output for `str.extract_groups` with empty string pattern (#23698)
- Match output type to engine for `rolling_map` (#23702)
- Fix incorrect join on single Int128 column for in-memory engine (#23694)
- Match output field name to lhs for `BusinessDayCount` (#23679)
- Correct the planner output datatype for `strptime` (#23676)
- Sort and Scan `with_exprs_and_input` (#23675)
- Revert to old behavior with `name.keep` (#23670)
- Fix panic loading from arrow `Map` containing timestamps (#23662)
- Selectors in `self` part of `list.eval` (#23668)
- Fix output field dtype for `ToInteger` (#23664)
- Allow `decimal_comma` with `,` separator in `read_csv` (#23657)
- Fix handling of UTF-8 in `write_csv` to `IO[str]` (#23647)
- Selectors in `{Lazy,Data}Frame.filter` (#23631)
- Stop splitfields iterator at eol in simd branch (#23652)
- Correct output datatype of dt.year and dt.millennium (#23646)
- Logic of broadcast_rhs in binary functions to correct list.set_intersection for list[str] columns (#23584)
- Order-preserving equi-join didn't always flush final matches (#23639)
- Fix ColumnNotFound error when joining on `col().cast()` (#23622)
- Fix agg groups on `when/then` in `group_by` context (#23628)
- Output type for sign (#23572)
- Apply `agg_fn` on `null` values in `pivot` (#23586)
- Remove nonsensical duration variance (#23621)
- Don't panic when sinking nested categorical to Parquet (#23610)
- Correctly set value count output field name (#23611)
- Casting unused columns in to_torch (#23606)
- Allow inferring of hours-only timezone offset (#23605)
- Bug in Categorical <-> str compare with nulls (#23609)
- Honor `n=0` in all cases of `str.replace` (#23598)
- Remove arbitrary 25 item limit from implicit Python list -> Series infer (#23603)
- Relabel duplicate sequence IDs in distributor (#23593)
- Round-trip Enum and Categorical metadata in plugins (#23588)
- Fix incorrect `join_asof` with `by` followed by `head/slice` (#23585)
- Allow writing nested Int128 data to Parquet (#23580)
- Enum serialization assert (#23574)
- Output type for peak_min / peak_max (#23573)
- Make Scalar Categorical, Enum and Struct values serializable (#23565)
- Preserve row order within partition when sinking parquet (#23462)
- Panic in `create_multiple_physical_plans` when branching from a single cache node (#23561)
- Prevent in-mem partition sink deadlock (#23562)
- Update AWS cloud documentation (#23563)
- Correctly handle null values when comparing structs (#23560)
- Make fold/reduce/cum_reduce/cum_fold serializable (#23524)
- Make `Expr.append` serializable (#23515)
- Float by float division dtype (#23529)
- Division on empty DataFrame generating null row (#23516)
- Partition sink `copy_exprs` and `with_exprs_and_input` (#23511)
- Unreachable with `pl.self_dtype` (#23507)
- Rolling median incorrect min_samples with nulls (#23481)
- Make `Int128` roundtrippable via Parquet (#23494)
- Fix panic when common subplans contain IEJoins (#23487)
- Properly handle non-finite floats in rolling_sum/mean (#23482)
- Make `read_csv_batched` respect `skip_rows` and `skip_lines` (#23484)
- Always use `cloudpickle` for the python objects in cloud plans (#23474)
- Support string literals in index_of() on categoricals (#23458)
- Don't panic for `finish_callback` with nested datatypes (#23464)
- Support min/max aggregation for DataFrame/LazyFrame Categoricals (#23455)
- Fix var/moment dtypes (#23453)
- Fix agg_groups dtype (#23450)
- Clear cached_schema when apply changes dtype (#23439)
- Allow structured conversion to/from numpy with Array types, preserving shape (#23438)
- Null handling in full-null group_by_dynamic mean/sum (#23435)
- Enable default set of `ScanCastOptions` for native `scan_iceberg()` (#23416)
- Fix index calculation for `nearest` interpolation (#23418)
- Fix compilation failure with `--no-default-features` and `--features lazy,strings` (#23384)
- Parse parquet footer length into unsigned integer (#23357)
- Fix incorrect results with `group_by` aggregation on empty groups (#23358)
- Fix boolean `min()` in `group_by` aggregation (streaming) (#23344)
- Respect data-model in `map_elements` (#23340)
- Properly join URI paths in `PlPath` (#23350)
- Ignore null values in `bitwise` aggregation on bools (#23324)
- Fix panic filtering after left join (#23310)
- Out-of-bounds index in hot hash table (#23311)
- Fix scanning '?' from cloud with `glob=False` (#23304)
- Fix filters on inserted columns did not remove rows (#23303)
- Don't ignore return_dtype (#23309)
- Use safe parsing for `get_normal_components` (#23284)
- Fix output column names/order of streaming coalesced right-join (#23278)
- Restore `concat_arr` inputs expansion (#23271)
📖 Documentation
- Point the R Polars version on R-multiverse (#23660)
- Update GPU docs for RAPIDS CUDA 11 deprecation (#23620)
- Add page about billing to Polars Cloud user guide (#23564)
- Small user-guide improvement and fixes (#23549)
- Correct note in `from_pandas` about data being cloned (#23552)
- Fix a few typos in the "Streaming" section (#23536)
- Update streaming page (#23535)
- Update structure of Polars Cloud documentation (#23496)
- Update when_then in user guide (#23245)
📦 Build system
🛠️ Other improvements
- Remove incorrect `DeletionFilesList::slice` (#23796)
- Remove old schema file (#23798)
- Remove Default for StreamingExecutionState (#23729)
- Explicit match to smaller dtypes before cast to Int32 in asof join (#23776)
- Expose `PlPathRef` via polars::prelude (#23754)
- Add hashes json (#23758)
- Add `AExpr::is_expr_equal_to` (#23740)
- Fix rank test to respect maintain order (#23723)
- IR inputs and exprs iterators (#23722)
- Store more granular schema hashes to reduce merge conflicts (#23709)
- Add assertions for unique ID (#23711)
- Use RelaxedCell in multiscan (#23712)
- Debug assert `ColumnTransform` cast is non-strict (#23717)
- Use UUID for UniqueID (#23704)
- Remove scan id (#23697)
- Propagate Iceberg physical ID schema to IR (#23671)
- Remove unused and confusing match arm (#23691)
- Remove unused `ALLOW_GROUP_AWARE` flag (#23690)
- Remove unused `evaluate_inline` (#23687)
- Remove unused field from `AggregationContext` (#23685)
- Remove `nod...
Python Polars 1.32.0
🏆 Highlights
- Make `Selector` a concrete part of the DSL (#23351)
- Rework Categorical/Enum to use (Frozen)Categories (#23016)
🚀 Performance improvements
- Lower Expr.slice to streaming engine (#23683)
- Elide bound check (#23653)
- Preserve `Column` repr in `ColumnTransform` operations (#23648)
- Lower any() and all() to streaming engine (#23640)
- Lower row-separable functions in streaming engine (#23633)
- Lower int_range(len()) to with_row_index (#23576)
- Avoid double field resolution in with_columns (#23530)
- Rolling quantile lower time complexity (#23443)
- Use single-key optimization with Categorical (#23436)
- Improve null-preserving identification for boolean functions (#23317)
- Improve boolean bitwise aggregate performance (#23325)
- Enable Parquet expressions and dedup `is_in` values in Parquet predicates (#23293)
- Re-write join types during filter pushdown (#23275)
- Generate PQ ZSTD decompression context once (#23200)
- Trigger cache/cse optimizations when multiplexing (#23274)
- Cache FileInfo upon DSL -> IR conversion (#23263)
- Push more filters past joins (#23240)
- Optimize `Bitmap::make_mut` (#23138)
✨ Enhancements
- Add Python-side caching for credentials and provider auto-initialization (#23736)
- Expand on `DataTypeExpr` (#23249)
- Lower row-separable functions in streaming engine (#23633)
- Add scalar checks to range expressions (#23632)
- Expose `POLARS_DOT_SVG_VIEWER` to automatically dispatch to SVG viewer (#23592)
- Implement mean function in `arr` namespace (#23486)
- Implement `vec_hash` for `List` and `Array` (#23578)
- Add unstable `pl.row_index()` expression (#23556)
- Add Categories on the Python side (#23543)
- Implement partitioned sinks for the in-memory engine (#23522)
- Raise and warn on UDFs without `return_dtype` set (#23353)
- IR pruning (#23499)
- Support min/max reducer for null dtype in streaming engine (#23465)
- Implement streaming Categorical/Enum min/max (#23440)
- Allow cast to Categorical inside list.eval (#23432)
- Support `pathlib.Path` as source for `read/scan_delta()` (#23411)
- Enable default set of `ScanCastOptions` for native `scan_iceberg()` (#23416)
- Pass payload in `ExprRegistry` (#23412)
- Support reading nanosecond/Int96 timestamps and schema-evolved datasets in `scan_delta()` (#23398)
- Support row group skipping with filters when `cast_options` is given (#23356)
- Execute bitwise reductions in streaming engine (#23321)
- Use `scan_parquet().collect_schema()` for `read_parquet_schema` (#23359)
- Add dtype to str.to_integer() (#22239)
- Add `arr.slice`, `arr.head` and `arr.tail` methods to `arr` namespace (#23150)
- Add `is_close` method (#23273)
- Drop superfluous casts from optimized plan (#23269)
- Added `drop_nulls` option to `to_dummies` (#23215)
- Support comma as decimal separator for CSV write (#23238)
- Don't format keys if they're empty in dot (#23247)
- Improve arity simplification (#23242)
- Allow expression input for `length` parameter in `pad_start`, `pad_end`, and `zfill` (#23182)
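The `drop_nulls` option added to `to_dummies` (#23215) controls whether a null-indicator column is emitted. A behavioral sketch in plain Python (not the Polars implementation; the real method works on DataFrames and derives column names from the source column):

```python
def to_dummies(values, drop_nulls=False):
    # One-hot encode a single column; None marks a missing value.
    # With drop_nulls=True the "null" indicator column is omitted.
    categories = sorted({v for v in values if v is not None})
    dummies = {c: [int(v == c) for v in values] for c in categories}
    if not drop_nulls:
        dummies["null"] = [int(v is None) for v in values]
    return dummies

vals = ["a", None, "b"]
assert to_dummies(vals)["null"] == [0, 1, 0]
assert "null" not in to_dummies(vals, drop_nulls=True)
```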
🐞 Bug fixes
- Load `_expiry_time` from botocore `Credentials` in CredentialProviderAWS (#23753)
- Fix credential refresh logic (#23730)
- Fix `to_datetime()` fallible identification (#23735)
- Correct output datatype for `dt.with_time_unit` (#23734)
- Fix incorrect native Iceberg scan from tables with renamed/dropped columns/fields (#23713)
- Allow DataType expressions with selectors (#23720)
- Match output type to engine for `interpolate` on `Decimal` (#23706)
- Remaining bugs in `with_exprs_and_input` and pruning (#23710)
- Match output dtype to engine for `cum_sum_horizontal` (#23686)
- Field names for `pl.struct` in group-by (#23703)
- Fix output for `str.extract_groups` with empty string pattern (#23698)
- Match output type to engine for `rolling_map` (#23702)
- Moved passing `DeltaTable._storage_options` (#23673)
- Fix incorrect join on single Int128 column for in-memory engine (#23694)
- Match output field name to lhs for `BusinessDayCount` (#23679)
- Correct the planner output datatype for `strptime` (#23676)
- Sort and Scan `with_exprs_and_input` (#23675)
- Revert to old behavior with `name.keep` (#23670)
- Fix panic loading from arrow `Map` containing timestamps (#23662)
- Selectors in `self` part of `list.eval` (#23668)
- Fix output field dtype for `ToInteger` (#23664)
- Allow `decimal_comma` with `,` separator in `read_csv` (#23657)
- Fix handling of UTF-8 in `write_csv` to `IO[str]` (#23647)
- Selectors in `{Lazy,Data}Frame.filter` (#23631)
- Stop splitfields iterator at eol in simd branch (#23652)
- Correct output datatype of dt.year and dt.millennium (#23646)
- Logic of broadcast_rhs in binary functions to correct list.set_intersection for list[str] columns (#23584)
- Order-preserving equi-join didn't always flush final matches (#23639)
- Fix ColumnNotFound error when joining on `col().cast()` (#23622)
- Fix agg groups on `when/then` in `group_by` context (#23628)
- Output type for sign (#23572)
- Apply `agg_fn` on `null` values in `pivot` (#23586)
- Remove nonsensical duration variance (#23621)
- Don't panic when sinking nested categorical to Parquet (#23610)
- Correctly set value count output field name (#23611)
- Casting unused columns in to_torch (#23606)
- Allow inferring of hours-only timezone offset (#23605)
- Bug in Categorical <-> str compare with nulls (#23609)
- Honor `n=0` in all cases of `str.replace` (#23598)
- Remove arbitrary 25 item limit from implicit Python list -> Series infer (#23603)
- Relabel duplicate sequence IDs in distributor (#23593)
- Round-trip Enum and Categorical metadata in plugins (#23588)
- Fix incorrect `join_asof` with `by` followed by `head/slice` (#23585)
- Change return typing of `get_index_type()` from `DataType` to `PolarsIntegerType` (#23558)
- Allow writing nested Int128 data to Parquet (#23580)
- Enum serialization assert (#23574)
- Output type for peak_min / peak_max (#23573)
- Make Scalar Categorical, Enum and Struct values serializable (#23565)
- Preserve row order within partition when sinking parquet (#23462)
- Prevent in-mem partition sink deadlock (#23562)
- Update AWS cloud documentation (#23563)
- Correctly handle null values when comparing structs (#23560)
- Make fold/reduce/cum_reduce/cum_fold serializable (#23524)
- Make `Expr.append` serializable (#23515)
- Float by float division dtype (#23529)
- Division on empty DataFrame generating null row (#23516)
- Partition sink `copy_exprs` and `with_exprs_and_input` (#23511)
- Unreachable with `pl.self_dtype` (#23507)
- Rolling median incorrect min_samples with nulls (#23481)
- Make `Int128` roundtrippable via Parquet (#23494)
- Fix panic when common subplans contain IEJoins (#23487)
- Properly handle non-finite floats in rolling_sum/mean (#23482)
- Make `read_csv_batched` respect `skip_rows` and `skip_lines` (#23484)
- Always use `cloudpickle` for the python objects in cloud plans (#23474)
- Support string literals in index_of() on categoricals (#23458)
- Don't panic for `finish_callback` with nested datatypes (#23464)
- Pass `DeltaTable._storage_options` if no storage_options are provided (#23456)
- Support min/max aggregation for DataFrame/LazyFrame Categoricals (#23455)
- Fix var/moment dtypes (#23453)
- Fix agg_groups dtype (#23450)
- Fix incorrect `_get_path_scheme` (#23444)
- Fix missing overload defaults in `read_ods` and `tree_format` (#23442)
- Clear cached_schema when apply changes dtype (#23439)
- Allow structured conversion to/from numpy with Array types, preserving shape (#23438)
- Null handling in full-null group_by_dynamic mean/sum (#23435)
- Enable default set of `ScanCastOptions` for native `scan_iceberg()` (#23416)
- Fix index calculation for `nearest` interpolation (#23418)
- Overload for `eager` default in `Schema.to_frame` was `False` instead of `True` (#23413)
- Fix `read_excel` overloads so that passing `list[str]` to `sheet_name` does not raise (#23388)
- Removed special handling for bytes-like objects in read_ndjson (#23361)
- Parse parquet footer length into unsigned integer (#23357)
- Fix incorrect results with `group_by` aggregation on empty groups (#23358)
- Fix boolean `min()` in `group_by` aggregation (streaming) (#23344)
- Respect data-model in `map_elements` (#23340)
- Properly join URI paths in `PlPath` (#23350)
- Ignore null values in `bitwise` aggregation on bools (#23324)
- Fix panic filtering after left join (#23310)
- Out-of-bounds index in hot hash table (#23311)
- Fix scanning '?' from cloud with `glob=False` (#23304)
- Fix filters on inserted columns did not remove rows (#23303)
- Don't ignore return_dtype (#23309)
- Raise error instead of return in Series class (#23301)
- Use safe parsing for `get_normal_components` (#23284)
- Fix output column names/order of streaming coalesced right-join (#23278)
- Restore `concat_arr` inputs expansion (#23271)
- Expose FieldsMapper (#23232)
- Fix time zone handling in `dt.iso_year` and `dt.is_leap_year` (#23125)
📖 Documentation
- Fix `str.replace_many` examples triggering a deprecation warning (#23695)
- Point the R Polars version on R-multiverse (#23660)
- Update example for writing to cloud storage (#20265)
- Update GPU docs for RAPIDS CUDA 11 deprecation (#23620)
- Add docs of Expr.list.filter and Series.list.filter (#23589)
- Add page about billing to Polars Cloud user guide (#23564)
- Small user-guide improvement and fixes (#23549)
- Correct note in `from_pandas` about data being cloned (#23552)
- Fix a few typos in the "Streaming" section (#23536)
- Update streaming page (#23535)
- Update structure of Polars Cloud documentation (#23496)
- Update example code in pandas migration guide (#23403)
- Correct plugins user guide to reflect that teaching `Expr.language` is in a different section (#23377)
- Add example of using OR in `join_where` (#23375)
- Update when_then in user guide (#23245)
📦 Build system
- Update all rand code (#23387)
🛠️ Other improvements
- Remove unused functions from the rust side (#2...
Python Polars 1.32.0-beta.1
🏆 Highlights
- Make `Selector` a concrete part of the DSL (#23351)
- Rework Categorical/Enum to use (Frozen)Categories (#23016)
🚀 Performance improvements
- Lower Expr.slice to streaming engine (#23683)
- Elide bound check (#23653)
- Preserve `Column` repr in `ColumnTransform` operations (#23648)
- Lower any() and all() to streaming engine (#23640)
- Lower row-separable functions in streaming engine (#23633)
- Lower int_range(len()) to with_row_index (#23576)
- Avoid double field resolution in with_columns (#23530)
- Rolling quantile lower time complexity (#23443)
- Use single-key optimization with Categorical (#23436)
- Improve null-preserving identification for boolean functions (#23317)
- Improve boolean bitwise aggregate performance (#23325)
- Enable Parquet expressions and dedup `is_in` values in Parquet predicates (#23293)
- Re-write join types during filter pushdown (#23275)
- Generate PQ ZSTD decompression context once (#23200)
- Trigger cache/cse optimizations when multiplexing (#23274)
- Cache FileInfo upon DSL -> IR conversion (#23263)
- Push more filters past joins (#23240)
- Optimize `Bitmap::make_mut` (#23138)
✨ Enhancements
- Add Python-side caching for credentials and provider auto-initialization (#23736)
- Expand on `DataTypeExpr` (#23249)
- Lower row-separable functions in streaming engine (#23633)
- Add scalar checks to range expressions (#23632)
- Expose `POLARS_DOT_SVG_VIEWER` to automatically dispatch to SVG viewer (#23592)
- Implement mean function in `arr` namespace (#23486)
- Implement `vec_hash` for `List` and `Array` (#23578)
- Add unstable `pl.row_index()` expression (#23556)
- Add Categories on the Python side (#23543)
- Implement partitioned sinks for the in-memory engine (#23522)
- Raise and warn on UDFs without `return_dtype` set (#23353)
- IR pruning (#23499)
- Support min/max reducer for null dtype in streaming engine (#23465)
- Implement streaming Categorical/Enum min/max (#23440)
- Allow cast to Categorical inside list.eval (#23432)
- Support `pathlib.Path` as source for `read/scan_delta()` (#23411)
- Enable default set of `ScanCastOptions` for native `scan_iceberg()` (#23416)
- Pass payload in `ExprRegistry` (#23412)
- Support reading nanosecond/Int96 timestamps and schema-evolved datasets in `scan_delta()` (#23398)
- Support row group skipping with filters when `cast_options` is given (#23356)
- Execute bitwise reductions in streaming engine (#23321)
- Use `scan_parquet().collect_schema()` for `read_parquet_schema` (#23359)
- Add dtype to str.to_integer() (#22239)
- Add `arr.slice`, `arr.head` and `arr.tail` methods to `arr` namespace (#23150)
- Add `is_close` method (#23273)
- Drop superfluous casts from optimized plan (#23269)
- Added `drop_nulls` option to `to_dummies` (#23215)
- Support comma as decimal separator for CSV write (#23238)
- Don't format keys if they're empty in dot (#23247)
- Improve arity simplification (#23242)
- Allow expression input for `length` parameter in `pad_start`, `pad_end`, and `zfill` (#23182)
🐞 Bug fixes
- Load `_expiry_time` from botocore `Credentials` in CredentialProviderAWS (#23753)
- Fix credential refresh logic (#23730)
- Fix `to_datetime()` fallible identification (#23735)
- Correct output datatype for `dt.with_time_unit` (#23734)
- Fix incorrect native Iceberg scan from tables with renamed/dropped columns/fields (#23713)
- Allow DataType expressions with selectors (#23720)
- Match output type to engine for `interpolate` on `Decimal` (#23706)
- Remaining bugs in `with_exprs_and_input` and pruning (#23710)
- Match output dtype to engine for `cum_sum_horizontal` (#23686)
- Field names for `pl.struct` in group-by (#23703)
- Fix output for `str.extract_groups` with empty string pattern (#23698)
- Match output type to engine for `rolling_map` (#23702)
- Moved passing `DeltaTable._storage_options` (#23673)
- Fix incorrect join on single Int128 column for in-memory engine (#23694)
- Match output field name to lhs for `BusinessDayCount` (#23679)
- Correct the planner output datatype for `strptime` (#23676)
- Sort and Scan `with_exprs_and_input` (#23675)
- Revert to old behavior with `name.keep` (#23670)
- Fix panic loading from arrow `Map` containing timestamps (#23662)
- Selectors in `self` part of `list.eval` (#23668)
- Fix output field dtype for `ToInteger` (#23664)
- Allow `decimal_comma` with `,` separator in `read_csv` (#23657)
- Fix handling of UTF-8 in `write_csv` to `IO[str]` (#23647)
- Selectors in `{Lazy,Data}Frame.filter` (#23631)
- Stop splitfields iterator at eol in simd branch (#23652)
- Correct output datatype of dt.year and dt.millennium (#23646)
- Logic of broadcast_rhs in binary functions to correct list.set_intersection for list[str] columns (#23584)
- Order-preserving equi-join didn't always flush final matches (#23639)
- Fix ColumnNotFound error when joining on `col().cast()` (#23622)
- Fix agg groups on `when/then` in `group_by` context (#23628)
- Output type for sign (#23572)
- Apply `agg_fn` on `null` values in `pivot` (#23586)
- Remove nonsensical duration variance (#23621)
- Don't panic when sinking nested categorical to Parquet (#23610)
- Correctly set value count output field name (#23611)
- Casting unused columns in to_torch (#23606)
- Allow inferring of hours-only timezone offset (#23605)
- Bug in Categorical <-> str compare with nulls (#23609)
- Honor `n=0` in all cases of `str.replace` (#23598)
- Remove arbitrary 25 item limit from implicit Python list -> Series infer (#23603)
- Relabel duplicate sequence IDs in distributor (#23593)
- Round-trip Enum and Categorical metadata in plugins (#23588)
- Fix incorrect `join_asof` with `by` followed by `head/slice` (#23585)
- Change return typing of `get_index_type()` from `DataType` to `PolarsIntegerType` (#23558)
- Allow writing nested Int128 data to Parquet (#23580)
- Enum serialization assert (#23574)
- Output type for peak_min / peak_max (#23573)
- Make Scalar Categorical, Enum and Struct values serializable (#23565)
- Preserve row order within partition when sinking parquet (#23462)
- Prevent in-mem partition sink deadlock (#23562)
- Update AWS cloud documentation (#23563)
- Correctly handle null values when comparing structs (#23560)
- Make fold/reduce/cum_reduce/cum_fold serializable (#23524)
- Make `Expr.append` serializable (#23515)
- Float by float division dtype (#23529)
- Division on empty DataFrame generating null row (#23516)
- Partition sink `copy_exprs` and `with_exprs_and_input` (#23511)
- Unreachable with `pl.self_dtype` (#23507)
- Rolling median incorrect min_samples with nulls (#23481)
- Make `Int128` roundtrippable via Parquet (#23494)
- Fix panic when common subplans contain IEJoins (#23487)
- Properly handle non-finite floats in rolling_sum/mean (#23482)
- Make `read_csv_batched` respect `skip_rows` and `skip_lines` (#23484)
- Always use `cloudpickle` for the python objects in cloud plans (#23474)
- Support string literals in index_of() on categoricals (#23458)
- Don't panic for `finish_callback` with nested datatypes (#23464)
- Pass `DeltaTable._storage_options` if no storage_options are provided (#23456)
- Support min/max aggregation for DataFrame/LazyFrame Categoricals (#23455)
- Fix var/moment dtypes (#23453)
- Fix agg_groups dtype (#23450)
- Fix incorrect `_get_path_scheme` (#23444)
- Fix missing overload defaults in `read_ods` and `tree_format` (#23442)
- Clear cached_schema when apply changes dtype (#23439)
- Allow structured conversion to/from numpy with Array types, preserving shape (#23438)
- Null handling in full-null group_by_dynamic mean/sum (#23435)
- Enable default set of `ScanCastOptions` for native `scan_iceberg()` (#23416)
- Fix index calculation for `nearest` interpolation (#23418)
- Overload for `eager` default in `Schema.to_frame` was `False` instead of `True` (#23413)
- Fix `read_excel` overloads so that passing `list[str]` to `sheet_name` does not raise (#23388)
- Removed special handling for bytes-like objects in read_ndjson (#23361)
- Parse parquet footer length into unsigned integer (#23357)
- Fix incorrect results with `group_by` aggregation on empty groups (#23358)
- Fix boolean `min()` in `group_by` aggregation (streaming) (#23344)
- Respect data-model in `map_elements` (#23340)
- Properly join URI paths in `PlPath` (#23350)
- Ignore null values in `bitwise` aggregation on bools (#23324)
- Fix panic filtering after left join (#23310)
- Out-of-bounds index in hot hash table (#23311)
- Fix scanning '?' from cloud with `glob=False` (#23304)
- Fix filters on inserted columns did not remove rows (#23303)
- Don't ignore return_dtype (#23309)
- Raise error instead of return in Series class (#23301)
- Use safe parsing for `get_normal_components` (#23284)
- Fix output column names/order of streaming coalesced right-join (#23278)
- Restore `concat_arr` inputs expansion (#23271)
- Expose FieldsMapper (#23232)
- Fix time zone handling in `dt.iso_year` and `dt.is_leap_year` (#23125)
📖 Documentation
- Fix `str.replace_many` examples triggering a deprecation warning (#23695)
- Point the R Polars version on R-multiverse (#23660)
- Update example for writing to cloud storage (#20265)
- Update GPU docs for RAPIDS CUDA 11 deprecation (#23620)
- Add docs of Expr.list.filter and Series.list.filter (#23589)
- Add page about billing to Polars Cloud user guide (#23564)
- Small user-guide improvement and fixes (#23549)
- Correct note in `from_pandas` about data being cloned (#23552)
- Fix a few typos in the "Streaming" section (#23536)
- Update streaming page (#23535)
- Update structure of Polars Cloud documentation (#23496)
- Update example code in pandas migration guide (#23403)
- Correct plugins user guide to reflect that teaching `Expr.language` is in a different section (#23377)
- Add example of using OR in `join_where` (#23375)
- Update when_then in user guide (#23245)
📦 Build system
- Update all rand code (#23387)
🛠️ Other improvements
- Add hashes json (#23758)
- Add `AExpr::is_expr...
Rust Polars 0.49.1
🚀 Performance improvements
- Optimize `Bitmap::make_mut` (#23138)
✨ Enhancements
- Allow expression input for `length` parameter in `pad_start`, `pad_end`, and `zfill` (#23182)
🐞 Bug fixes
📖 Documentation
- Update when_then in user guide (#23245)
🛠️ Other improvements
- Connect Python `assert_dataframe_equal()` to Rust back-end (#23207)
- Fix time zone handling in `dt.iso_year` and `dt.is_leap_year` (#23125)
- Update Rust Polars versions (#23229)
Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @mcrumiller, @mrkn, @stijnherfst and @zyctree
Rust Polars 0.49.0
💥 Breaking changes
- Remove old streaming engine (#23103)
🚀 Performance improvements
- Improve streaming groupby CSE (#23092)
- Move row index materialization in post-apply to occur after slicing (#22995)
- Add first_(true|false)_idx to BooleanChunked and use in bool arg_(min|max) (#22907)
- Don't go through row encoding for most types on `index_of` (#22903)
- Optimise low-level `null` scans and `arg_max` for bools (when chunked) (#22897)
- Optimize multiscan performance (#22886)
✨ Enhancements
- Native implementation for Iceberg positional deletes (#23091)
- Remove old streaming engine (#23103)
- Make match_chunks public (#23101)
- Implement StructFunction expressions in into_py (#23022)
- Basic implementation of `DataTypeExpr` in Rust DSL (#23049)
- Add `required: bool` to `ParquetFieldOverwrites` (#23013)
- Support serializing `name.map_fields` (#22997)
- Support serializing `Expr::RenameAlias` (#22988)
- Remove duplicate verbose logging from `FetchedCredentialsCache` (#22973)
- Add `keys` column in `finish_callback` (#22968)
- Add `extra_columns` parameter to `scan_parquet` (#22699)
- Add CORR function to polars SQL (#22690)
- Add per partition sort and finish callback to sinks (#22789)
- Add and test DataFrame equality functionality (#22865)
- Support descendingly-sorted values in search_sorted() (#22825)
- Derive DSL schema (#22866)
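Supporting descendingly-sorted values in search_sorted() (#22825) amounts to flipping the comparison in the binary search. A small Python sketch of the idea using the standard library (illustrative only; the Polars kernel operates on chunked arrays):

```python
from bisect import bisect_left

def search_sorted_desc(values, item):
    # Negating a descending sequence yields an ascending one, so the
    # ordinary lower-bound binary search applies directly.
    return bisect_left([-v for v in values], -item)

desc = [9, 7, 4, 1]
assert search_sorted_desc(desc, 7) == 1  # exact match: position of 7
assert search_sorted_desc(desc, 5) == 2  # insertion point preserves order
```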
🐞 Bug fixes
- Restrict custom
aggregate_function
inpivot
topl.element()
(#23155) - Don't leak
SourceToken
in in-memory sink linearize (#23201) - Fix panic reading empty parquet with multiple boolean columns (#23159)
- Raise ComputeError instead of panicking in
truncate
when mixing month/week/day/sub-daily units (#23176) - Materialize
list.eval
with unknown type (#23186) - Only set sorting flag for 1st column with PQ SortingColumns (#23184)
- Typo in AExprBuilder (#23171)
- Null return from var/std on scalar column (#23158)
- Support Datetime broadcast in
list.concat
(#23137) - Ensure projection pushdown maintains right table schema (#22603)
- Don't create i128 scalars if dtype-128 is not set (#23118)
- Add Null dtype support to arg_sort_by (#23107)
- Raise error by default on invalid CSV quotes (#22876)
- Fix group_by mean and median returning all nulls for Decimal dtype (#23093)
- Fix hive partition pruning not filtering out `__HIVE_DEFAULT_PARTITION__` (#23074)
- Fix `AssertionError` when using `scan_delta()` on AWS with `storage_options` (#23076)
- Fix deadlock on `collect(background=True)` / `collect_concurrently()` (#23075)
- Incorrect null count in rolling_min/max (#23073)
- Preserve `file://` in LazyFrame node traverser (#23072)
- Respect column order in `register_io_source` schema (#23057)
- Incorrect output when using `sort` with `group_by` and `cum_sum` (#23001)
- Implement owned arithmetic for Int128 (#23055)
- Do not schema-match structs with different field counts (#23018)
- Fix confusing error message on duplicate row_index (#23043)
- Add `include_nulls` to `Agg::Count` CSE check (#23032)
- View buffer exceeding 2^32 - 1 bytes in concatenate_view (#23017)
- Fix incorrect `size_hint()` for `FlatIter` (#23010)
- Fix incorrect result selecting `pl.len()` from `scan_csv` with `skip_lines` (#22949)
- Allow for IO plugins with reordered columns in streaming (#22987)
- Method `str.zfill` was inconsistent with Python and pandas when string contained leading '+' (#22985)
- Integer underflow in `propagate_nulls` (#22986)
- Fix cum_min and cum_max not preserving inf or -inf values at series start (#22896)
- Setting `compat_level=0` for `sink_ipc` (#22960)
- Support arrow Decimal32 and Decimal64 types (#22954)
- Update arrow format (#22941)
- Fix filter pushdown to IO plugins (#22910)
- Improve numeric stability rolling_mean<f32> (#22944)
- Allow subclasses in type equality checking (#22915)
- Return early in `pl.Expr.__array_ufunc__` when only single input (#22913)
- Add inline implodes in type coercion (#22885)
- Correct `int_ranges` to raise error on invalid inputs (#22894)
- Set the sorted flag on Array after it is sorted (#22822)
- Don't silently overflow for temporal casts (#22901)
- Fix error using `write_csv` with `storage_options` (#22881)
- Schema resolution `.over(mapping_strategy="join")` with non-aggregations (#22875)
- Ensure rename behaves the same as select (#22852)
📖 Documentation
- Update when_then in user guide (#23245)
- Minor improvement to `cum_count` docstring example (#23099)
- Add missing entry for LazyFrame `__getitem__` (#22924)
📦 Build system
- Actually disable `ir_serde` by default (#23046)
- Add a feature flag for `serde_ignored` (#22957)
- Fix warnings, update DSL version and schema hash (#22953)
🛠️ Other improvements
- Update Rust Polars versions (#23229)
- Change flake to use venv (#23219)
- Add `default_alloc` feature to `py-polars` (#23202)
- Added more descriptive error message by replacing `FixedSizeList` with `Array` (#23168)
- Connect Python `assert_series_equal()` to Rust back-end (#23141)
- Refactor skip_batches to use AExprBuilder (#23147)
- Use `ir_serde` instead of `serde` for `IRFunctionExpr` (#23148)
- Separate `FunctionExpr` and `IRFunctionExpr` (#23140)
- Improve Series equality functionality and prepare for Python integration (#23136)
- Add PolarsPhysicalType and use it to dispatch into_series (#23080)
- Remove `AExpr::Alias` (#23070)
- Add components for Iceberg deletion file support (#23059)
- Feature gate `StructFunction::JsonEncode` (#23060)
- Propagate iceberg position delete information to IR (#23045)
- Add environment variable to get Parquet decoding metrics (#23052)
- Turn `pl.cumulative_eval` into its own `AExpr` (#22994)
- Add make test-streaming (#23044)
- Move scan parameter parsing for parquet to reusable function (#23019)
- Use a ref-counted `UniqueId` instead of `usize` for `cache_id` (#22984)
- Implement `Hash` and use `SpecialEq` for `RenameAliasFn` (#22989)
- Turn `list.eval` into an `AExpr` (#22911)
- Only check for unknown DSL fields if minor is higher (#22970)
- Don't enable `ir_serde` together with `serde` (#22969)
- Make dtype field on Logical non-optional (#22966)
- Add new (Frozen)Categories and CategoricalMapping (#22956)
- Add a CI check for DSL schema changes (#22898)
- Add schema parameters to `expr.meta` (#22906)
- Update rust toolchain in nix flake (#22905)
- Update toolchain (#22859)
Thank you to all our contributors for making this release possible!
@Athsus, @DahaoALG, @FabianWolff, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @SanjitBasker, @alexander-beedie, @borchero, @bschoenmaeckers, @cmdlineluser, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @florian-klein, @i1oveMyse1f, @ion-elgreco, @itamarst, @kdn36, @kutal10, @math-hiyoko, @mcrumiller, @mrkn, @mroeschke, @nameexhaustion, @nikaltipar, @orlp, @paskhaver, @ritchie46, @stijnherfst, @thomasfrederikhoeck and @zyctree
Python Polars 1.31.0
💥 Breaking changes
- Remove old streaming engine (#23103)
⚠️ Deprecations
- Deprecate `allow_missing_columns` in `scan_parquet` in favor of `missing_columns` (#22784)
🚀 Performance improvements
- Improve streaming groupby CSE (#23092)
- Move row index materialization in post-apply to occur after slicing (#22995)
- Add first_(true|false)_idx to BooleanChunked and use in bool arg_(min|max) (#22907)
- Don't go through row encoding for most types on `index_of` (#22903)
- Optimise low-level `null` scans and `arg_max` for bools (when chunked) (#22897)
- Optimize multiscan performance (#22886)
✨ Enhancements
- DataType expressions in Python (#23167)
- Native implementation for Iceberg positional deletes (#23091)
- Remove old streaming engine (#23103)
- Basic implementation of `DataTypeExpr` in Rust DSL (#23049)
- Add `required: bool` to `ParquetFieldOverwrites` (#23013)
- Support serializing `name.map_fields` (#22997)
- Support serializing `Expr::RenameAlias` (#22988)
- Remove duplicate verbose logging from `FetchedCredentialsCache` (#22973)
- Add `keys` column in `finish_callback` (#22968)
- Add `extra_columns` parameter to `scan_parquet` (#22699)
- Add CORR function to polars SQL (#22690)
- Add per partition sort and finish callback to sinks (#22789)
- Support descendingly-sorted values in search_sorted() (#22825)
- Derive DSL schema (#22866)
🐞 Bug fixes
- Remove axis in `show_graph` (#23218)
- Remove axis ticks in `show_graph` (#23210)
- Restrict custom `aggregate_function` in `pivot` to `pl.element()` (#23155)
- Don't leak `SourceToken` in in-memory sink linearize (#23201)
- Fix panic reading empty parquet with multiple boolean columns (#23159)
- Raise ComputeError instead of panicking in `truncate` when mixing month/week/day/sub-daily units (#23176)
- Materialize `list.eval` with unknown type (#23186)
- Only set sorting flag for 1st column with PQ SortingColumns (#23184)
- Typo in AExprBuilder (#23171)
- Null return from var/std on scalar column (#23158)
- Support Datetime broadcast in `list.concat` (#23137)
- Ensure projection pushdown maintains right table schema (#22603)
- Add Null dtype support to arg_sort_by (#23107)
- Raise error by default on invalid CSV quotes (#22876)
- Fix group_by mean and median returning all nulls for Decimal dtype (#23093)
- Fix hive partition pruning not filtering out `__HIVE_DEFAULT_PARTITION__` (#23074)
- Fix `AssertionError` when using `scan_delta()` on AWS with `storage_options` (#23076)
- Fix deadlock on `collect(background=True)` / `collect_concurrently()` (#23075)
- Incorrect null count in rolling_min/max (#23073)
- Preserve `file://` in LazyFrame node traverser (#23072)
- Respect column order in `register_io_source` schema (#23057)
- Don't call unnest for objects implementing `__arrow_c_array__` (#23069)
- Incorrect output when using `sort` with `group_by` and `cum_sum` (#23001)
- Implement owned arithmetic for Int128 (#23055)
- Do not schema-match structs with different field counts (#23018)
- Fix confusing error message on duplicate row_index (#23043)
- Add `include_nulls` to `Agg::Count` CSE check (#23032)
- View buffer exceeding 2^32 - 1 bytes in concatenate_view (#23017)
- Fix incorrect result selecting `pl.len()` from `scan_csv` with `skip_lines` (#22949)
- Allow for IO plugins with reordered columns in streaming (#22987)
- Method `str.zfill` was inconsistent with Python and pandas when string contained leading '+' (#22985)
- Integer underflow in `propagate_nulls` (#22986)
- Setting `compat_level=0` for `sink_ipc` (#22960)
- Narrow return type for `DataType.is_`, improve Pyright's type completeness from 69% to 95% (#22962)
- Support arrow Decimal32 and Decimal64 types (#22954)
- Guard against dictionaries being passed to projection keywords (#22928)
- Update arrow format (#22941)
- Fix filter pushdown to IO plugins (#22910)
- Improve numeric stability rolling_mean<f32> (#22944)
- Guard against invalid nested objects in 'map_elements' (#22932)
- Allow subclasses in type equality checking (#22915)
- Return early in `pl.Expr.__array_ufunc__` when only single input (#22913)
- Add inline implodes in type coercion (#22885)
- Add {top, bottom}_k_by to Series (#22902)
- Correct `int_ranges` to raise error on invalid inputs (#22894)
- Don't silently overflow for temporal casts (#22901)
- Fix error using `write_csv` with `storage_options` (#22881)
- Schema resolution `.over(mapping_strategy="join")` with non-aggregations (#22875)
- Ensure rename behaves the same as select (#22852)
📖 Documentation
- Document aggregations that return identity when there's no non-null values, suggest workaround for those who want SQL-standard behaviour (#23143)
- Fix reference to non-existent `Expr.replace_all` in `replace_strict` docs (#23144)
- Fix typo on pandas comparison page (#23123)
- Minor improvement to `cum_count` docstring example (#23099)
- Add missing `DataFrame.__setitem__` to API reference (#22938)
- Add missing entry for LazyFrame `__getitem__` (#22924)
- Add missing `top_k_by` and `bottom_k_by` to `Series` reference (#22917)
📦 Build system
- Update `pyo3` and `numpy` crates to version `0.25` (#22763)
- Actually disable `ir_serde` by default (#23046)
- Add a feature flag for `serde_ignored` (#22957)
- Fix warnings, update DSL version and schema hash (#22953)
🛠️ Other improvements
- Change flake to use venv (#23219)
- Add `default_alloc` feature to `py-polars` (#23202)
- Added more descriptive error message by replacing `FixedSizeList` with `Array` (#23168)
- Connect Python `assert_series_equal()` to Rust back-end (#23141)
- Refactor skip_batches to use AExprBuilder (#23147)
- Use `ir_serde` instead of `serde` for `IRFunctionExpr` (#23148)
- Separate `FunctionExpr` and `IRFunctionExpr` (#23140)
- Remove `AExpr::Alias` (#23070)
- Add components for Iceberg deletion file support (#23059)
- Feature gate `StructFunction::JsonEncode` (#23060)
- Propagate iceberg position delete information to IR (#23045)
- Add environment variable to get Parquet decoding metrics (#23052)
- Turn `pl.cumulative_eval` into its own `AExpr` (#22994)
- Add make test-streaming (#23044)
- Move scan parameter parsing for parquet to reusable function (#23019)
- Prepare deltalake 1.0 (#22931)
- Implement `Hash` and use `SpecialEq` for `RenameAliasFn` (#22989)
- Turn `list.eval` into an `AExpr` (#22911)
- Fix CI for latest pandas-stubs release (#22971)
- Add a CI check for DSL schema changes (#22898)
- Add schema parameters to `expr.meta` (#22906)
- Update rust toolchain in nix flake (#22905)
- Update toolchain (#22859)
Thank you to all our contributors for making this release possible!
@Athsus, @DahaoALG, @FabianWolff, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @SanjitBasker, @alexander-beedie, @bschoenmaeckers, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @florian-klein, @i1oveMyse1f, @ion-elgreco, @itamarst, @kdn36, @kutal10, @mcrumiller, @mroeschke, @nameexhaustion, @nikaltipar, @orlp, @paskhaver, @ritchie46, @stijnherfst and @thomasfrederikhoeck
Python Polars 1.31.0-beta.1
💥 Breaking changes
- Remove old streaming engine (#23103)
⚠️ Deprecations
- Deprecate `allow_missing_columns` in `scan_parquet` in favor of `missing_columns` (#22784)
🚀 Performance improvements
- Improve streaming groupby CSE (#23092)
- Move row index materialization in post-apply to occur after slicing (#22995)
- Add first_(true|false)_idx to BooleanChunked and use in bool arg_(min|max) (#22907)
- Don't go through row encoding for most types on `index_of` (#22903)
- Optimise low-level `null` scans and `arg_max` for bools (when chunked) (#22897)
- Optimize multiscan performance (#22886)
✨ Enhancements
- DataType expressions in Python (#23167)
- Native implementation for Iceberg positional deletes (#23091)
- Remove old streaming engine (#23103)
- Basic implementation of `DataTypeExpr` in Rust DSL (#23049)
- Add `required: bool` to `ParquetFieldOverwrites` (#23013)
- Support serializing `name.map_fields` (#22997)
- Support serializing `Expr::RenameAlias` (#22988)
- Remove duplicate verbose logging from `FetchedCredentialsCache` (#22973)
- Add `keys` column in `finish_callback` (#22968)
- Add `extra_columns` parameter to `scan_parquet` (#22699)
- Add CORR function to polars SQL (#22690)
- Add per partition sort and finish callback to sinks (#22789)
- Support descendingly-sorted values in search_sorted() (#22825)
- Derive DSL schema (#22866)
🐞 Bug fixes
- Fix panic reading empty parquet with multiple boolean columns (#23159)
- Raise ComputeError instead of panicking in `truncate` when mixing month/week/day/sub-daily units (#23176)
- Materialize `list.eval` with unknown type (#23186)
- Only set sorting flag for 1st column with PQ SortingColumns (#23184)
- Typo in AExprBuilder (#23171)
- Null return from var/std on scalar column (#23158)
- Support Datetime broadcast in `list.concat` (#23137)
- Ensure projection pushdown maintains right table schema (#22603)
- Add Null dtype support to arg_sort_by (#23107)
- Raise error by default on invalid CSV quotes (#22876)
- Fix group_by mean and median returning all nulls for Decimal dtype (#23093)
- Fix hive partition pruning not filtering out `__HIVE_DEFAULT_PARTITION__` (#23074)
- Fix `AssertionError` when using `scan_delta()` on AWS with `storage_options` (#23076)
- Fix deadlock on `collect(background=True)` / `collect_concurrently()` (#23075)
- Incorrect null count in rolling_min/max (#23073)
- Preserve `file://` in LazyFrame node traverser (#23072)
- Respect column order in `register_io_source` schema (#23057)
- Don't call unnest for objects implementing `__arrow_c_array__` (#23069)
- Incorrect output when using `sort` with `group_by` and `cum_sum` (#23001)
- Implement owned arithmetic for Int128 (#23055)
- Do not schema-match structs with different field counts (#23018)
- Fix confusing error message on duplicate row_index (#23043)
- Add `include_nulls` to `Agg::Count` CSE check (#23032)
- View buffer exceeding 2^32 - 1 bytes in concatenate_view (#23017)
- Fix incorrect result selecting `pl.len()` from `scan_csv` with `skip_lines` (#22949)
- Allow for IO plugins with reordered columns in streaming (#22987)
- Method `str.zfill` was inconsistent with Python and pandas when string contained leading '+' (#22985)
- Integer underflow in `propagate_nulls` (#22986)
- Setting `compat_level=0` for `sink_ipc` (#22960)
- Narrow return type for `DataType.is_`, improve Pyright's type completeness from 69% to 95% (#22962)
- Support arrow Decimal32 and Decimal64 types (#22954)
- Guard against dictionaries being passed to projection keywords (#22928)
- Update arrow format (#22941)
- Fix filter pushdown to IO plugins (#22910)
- Improve numeric stability rolling_mean<f32> (#22944)
- Guard against invalid nested objects in 'map_elements' (#22932)
- Allow subclasses in type equality checking (#22915)
- Return early in `pl.Expr.__array_ufunc__` when only single input (#22913)
- Add inline implodes in type coercion (#22885)
- Add {top, bottom}_k_by to Series (#22902)
- Correct `int_ranges` to raise error on invalid inputs (#22894)
- Don't silently overflow for temporal casts (#22901)
- Fix error using `write_csv` with `storage_options` (#22881)
- Schema resolution `.over(mapping_strategy="join")` with non-aggregations (#22875)
- Ensure rename behaves the same as select (#22852)
📖 Documentation
- Document aggregations that return identity when there's no non-null values, suggest workaround for those who want SQL-standard behaviour (#23143)
- Fix reference to non-existent `Expr.replace_all` in `replace_strict` docs (#23144)
- Fix typo on pandas comparison page (#23123)
- Minor improvement to `cum_count` docstring example (#23099)
- Add missing `DataFrame.__setitem__` to API reference (#22938)
- Add missing entry for LazyFrame `__getitem__` (#22924)
- Add missing `top_k_by` and `bottom_k_by` to `Series` reference (#22917)
📦 Build system
- Update `pyo3` and `numpy` crates to version `0.25` (#22763)
- Actually disable `ir_serde` by default (#23046)
- Add a feature flag for `serde_ignored` (#22957)
- Fix warnings, update DSL version and schema hash (#22953)
🛠️ Other improvements
- Added more descriptive error message by replacing `FixedSizeList` with `Array` (#23168)
- Connect Python `assert_series_equal()` to Rust back-end (#23141)
- Refactor skip_batches to use AExprBuilder (#23147)
- Use `ir_serde` instead of `serde` for `IRFunctionExpr` (#23148)
- Separate `FunctionExpr` and `IRFunctionExpr` (#23140)
- Remove `AExpr::Alias` (#23070)
- Add components for Iceberg deletion file support (#23059)
- Feature gate `StructFunction::JsonEncode` (#23060)
- Propagate iceberg position delete information to IR (#23045)
- Add environment variable to get Parquet decoding metrics (#23052)
- Turn `pl.cumulative_eval` into its own `AExpr` (#22994)
- Add make test-streaming (#23044)
- Move scan parameter parsing for parquet to reusable function (#23019)
- Prepare deltalake 1.0 (#22931)
- Implement `Hash` and use `SpecialEq` for `RenameAliasFn` (#22989)
- Turn `list.eval` into an `AExpr` (#22911)
- Fix CI for latest pandas-stubs release (#22971)
- Add a CI check for DSL schema changes (#22898)
- Add schema parameters to `expr.meta` (#22906)
- Update rust toolchain in nix flake (#22905)
- Update toolchain (#22859)
Thank you to all our contributors for making this release possible!
@Athsus, @DahaoALG, @FabianWolff, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @SanjitBasker, @alexander-beedie, @bschoenmaeckers, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @florian-klein, @i1oveMyse1f, @ion-elgreco, @itamarst, @kdn36, @kutal10, @mroeschke, @nameexhaustion, @nikaltipar, @orlp, @paskhaver, @ritchie46, @stijnherfst and @thomasfrederikhoeck
Rust Polars 0.48.1
🚀 Performance improvements
- Switch eligible casts to non-strict in optimizer (#22850)
🐞 Bug fixes
- Fix RuntimeError when serializing the same DataFrame from multiple threads (#22844)
🛠️ Other improvements
- Update Rust Polars versions (#22854)
Thank you to all our contributors for making this release possible!
@JakubValtar, @bschoenmaeckers, @nameexhaustion and @stijnherfst
Python Polars 1.30.0
🚀 Performance improvements
- Switch eligible casts to non-strict in optimizer (#22850)
- Allow predicate passing set_sorted (#22797)
- Increase default cross-file parallelism limit for new-streaming multiscan (#22700)
- Add elementwise execution mode for `list.eval` (#22715)
- Support optimised init from non-dict `Mapping` objects in `from_records` and frame/series constructors (#22638)
- Add streaming cross-join node (#22581)
- Switch off `maintain_order` in group-by followed by sort (#22492)
✨ Enhancements
- Load AWS `endpoint_url` using boto3 (#22851)
- Implemented `list.filter` (#22749)
- Support binaryoffset in search sorted (#22786)
- Add `nulls_equal` flag to `list/arr.contains` (#22773)
- Implement `LazyFrame.match_to_schema` (#22726)
- Improved time-string parsing and inference (generally, and via the SQL interface) (#22606)
- Allow for `.over` to be called without `partition_by` (#22712)
- Support `AnyValue` translation from `PyMapping` values (#22722)
- Support optimised init from non-dict `Mapping` objects in `from_records` and frame/series constructors (#22638)
- Support inference of `Int128` dtype from databases that support it (#22682)
- Add options to write Parquet field metadata (#22652)
- Add `cast_options` parameter to control type casting in `scan_parquet` (#22617)
- Allow casting `List<UInt8>` to `Binary` (#22611)
- Allow setting of regex size limit using `POLARS_REGEX_SIZE_LIMIT` (#22651)
- Support use of literal values as "other" when evaluating `Series.zip_with` (#22632)
- Allow to read and write custom file-level parquet metadata (#21806)
- Support PEP702 `@deprecated` decorator behaviour (#22594)
- Support grouping by `pl.Array` (#22575)
- Preserve exception type and traceback for errors raised from Python (#22561)
- Use fixed-width font in streaming phys plan graph (#22540)
🐞 Bug fixes
- Fix RuntimeError when serializing the same DataFrame from multiple threads (#22844)
- Fix map_elements predicate pushdown (#22833)
- Fix reverse list type (#22832)
- Don't require numpy for search_sorted (#22817)
- Add type equality checking for relevant methods (#22802)
- Invalid output for `fill_null` after `when.then` on structs (#22798)
- Don't panic for cross join with misaligned chunking (#22799)
- Panic on quantile over nulls in rolling window (#22792)
- Respect BinaryOffset metadata (#22785)
- Correct the output order of `PartitionByKey` and `PartitionParted` (#22778)
- Fallback to non-strict casting for deprecated casts (#22760)
- Clippy on new stable version (#22771)
- Handle sliced out remainder for bitmaps (#22759)
- Don't merge `Enum` categories on append (#22765)
- Fix unnest() not working on empty struct columns (#22391)
- Fix the default value type in `Schema` init (#22589)
- Correct name in `unnest` error message (#22740)
- Provide "schema" to `DataFrame`, even if empty JSON (#22739)
- Properly account for nulls in the `is_not_nan` check made in `drop_nans` (#22707)
- Incorrect result from SQL `count(*)` with `partition by` (#22728)
- Fix deadlock joining scanned tables with low thread count (#22672)
- Don't allow deserializing incompatible DSL (#22644)
- Incorrect null dtype from binary ops in empty group_by (#22721)
- Don't mark `str.replace_many` with Mapping as deprecated (#22697)
- Gzip has maximum compression of 9, not 10 (#22685)
- Fix predicate pushdown of fallible expressions (#22669)
- Fix `index out of bounds` panic when scanning hugging face (#22661)
- Panic on `group_by` with literal and empty rows (#22621)
- Return input instead of panicking if empty subset in `drop_nulls()` and `drop_nans()` (#22469)
- Bump argminmax to 0.6.3 (#22649)
- DSL version deserialization endianness (#22642)
- Allow Expr.round() to be called on integer dtypes (#22622)
- Fix panic when filtering based on row index column in parquet (#22616)
- WASM and PyOdide compile (#22613)
- Resolve `get()` SchemaMismatch panic (#22350)
- Panic in group_by_dynamic on single-row df with group_by (#22597)
- Add `new_streaming` feature to `polars` crate (#22601)
- Consistently use Unix epoch as origin for `dt.truncate` (except weekly buckets which start on Mondays) (#22592)
- Fix interpolate on dtype Decimal (#22541)
- CSV count rows skipped last line if file did not end with newline (#22577)
- Make nested strict casting actually strict (#22497)
- Make `replace` and `replace_strict` mapping use list literals (#22566)
- Allow pivot on `Time` column (#22550)
- Fix error when providing CSV schema with extra columns (#22544)
- Panic on bitwise op between Series and Expr (#22527)
- Multi-selector regex expansion (#22542)
📖 Documentation
- Add pre-release policy (#22808)
- Fix broken link to service account page in Polars Cloud docs (#22762)
- Add `match_to_schema` to API reference (#22777)
- Provide additional explanation and examples for the `value_counts` "normalize" parameter (#22756)
- Rework documentation for `drop`/`fill` for nulls/nans (#22657)
- Add documentation to new `RoundMode` parameter in `round` (#22555)
- Add missing `repeat_by` to API reference, fix up `list.get` (#22698)
- Fix non-rendering bullet points in `scan_iceberg` (#22694)
- Improve `insert_column` docstring (description and examples) (#22551)
- Improve `join` documentation (#22556)
📦 Build system
- Fix building `polars-lazy` with certain features (#22846)
- Add missing features (#22839)
- Patch pyo3 to disable recompilation (#22796)
🛠️ Other improvements
- Update Rust Polars versions (#22854)
- Add basic smoke test for free-threaded python (#22481)
- Update Polars Rust versions (#22834)
- Fix `nix build` (#22809)
- Fix flake.nix to work on macos (#22803)
- Unused variables on release build (#22800)
- Update cloud docs (#22624)
- Fix unstable `list.eval` performance test (#22729)
- Add proptest implementations for all Array types (#22711)
- Dispatch `.write_*` to `.lazy().sink_*(engine='in-memory')` (#22582)
- Move all optimization flags to `QueryOptFlags` (#22680)
- Add test for `str.replace_many` (#22615)
- Stabilize `sink_*` (#22643)
- Add proptest for row-encode (#22626)
- Update rust version in nix flake (#22627)
- Add a nix flake with a devShell and package (#22246)
- Use a wrapper struct to store time zone (#22523)
- Add `proptest` testing for parquet decoding kernels (#22608)
- Include equiprobable as valid quantile method (#22571)
- Remove confusing error context calling `.collect(_eager=True)` (#22602)
- Fix test_truncate_path test case (#22598)
- Unify function flags into 1 bitset (#22573)
- Display the operation behind `in-memory-map` (#22552)
Thank you to all our contributors for making this release possible!
@IvanIsCoding, @JakubValtar, @Julian-J-S, @LucioFranco, @MarcoGorelli, @WH-2099, @alexander-beedie, @borchero, @bschoenmaeckers, @cmdlineluser, @coastalwhite, @etiennebacher, @florian-klein, @itamarst, @kdn36, @mcrumiller, @nameexhaustion, @nikaltipar, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @teotwaki, @timkpaine and @wence-
Rust Polars 0.48.0
💥 Breaking changes
- Use a wrapper struct to store time zone (#22523)
🚀 Performance improvements
- Allow predicate passing set_sorted (#22797)
- Increase default cross-file parallelism limit for new-streaming multiscan (#22700)
- Add elementwise execution mode for `list.eval` (#22715)
- Support optimised init from non-dict `Mapping` objects in `from_records` and frame/series constructors (#22638)
- Add streaming cross-join node (#22581)
- Switch off `maintain_order` in group-by followed by sort (#22492)
✨ Enhancements
- Format named functions (#22831)
- Implemented
list.filter
(#22749) - Support binaryoffset in search sorted (#22786)
- Add `nulls_equal` flag to `list/arr.contains` (#22773)
- Allow named opaque functions for serde (#22734)
- Implement `LazyFrame.match_to_schema` (#22726)
- Improved time-string parsing and inference (generally, and via the SQL interface) (#22606)
- Allow for `.over` to be called without `partition_by` (#22712)
- Support `AnyValue` translation from `PyMapping` values (#22722)
- Support optimised init from non-dict `Mapping` objects in `from_records` and frame/series constructors (#22638)
- Add options to write Parquet field metadata (#22652)
- Allow casting `List<UInt8>` to `Binary` (#22611)
- Allow setting of regex size limit using `POLARS_REGEX_SIZE_LIMIT` (#22651)
🐞 Bug fixes
- Fix reverse list type (#22832)
- Add type equality checking for relevant methods (#22802)
- Invalid output for `fill_null` after `when.then` on structs (#22798)
- Don't panic for cross join with misaligned chunking (#22799)
- Panic on quantile over nulls in rolling window (#22792)
- Respect BinaryOffset metadata (#22785)
- Correct the output order of `PartitionByKey` and `PartitionParted` (#22778)
- Fallback to non-strict casting for deprecated casts (#22760)
- Clippy on new stable version (#22771)
- Handle sliced out remainder for bitmaps (#22759)
- Don't merge `Enum` categories on append (#22765)
- Fix unnest() not working on empty struct columns (#22391)
- Correct name in `unnest` error message (#22740)
- Properly account for nulls in the `is_not_nan` check made in `drop_nans` (#22707)
- Incorrect result from SQL `count(*)` with `partition by` (#22728)
- Fix deadlock joining scanned tables with low thread count (#22672)
- Don't allow deserializing incompatible DSL (#22644)
- Incorrect null dtype from binary ops in empty group_by (#22721)
- Don't mark `str.replace_many` with Mapping as deprecated (#22697)
- Gzip has maximum compression of 9, not 10 (#22685)
- Fix predicate pushdown of fallible expressions (#22669)
- Fix `index out of bounds` panic when scanning hugging face (#22661)
- Fix polars crate not compiling when lazy feature enabled (#22655)
- Panic on `group_by` with literal and empty rows (#22621)
- Return input instead of panicking if empty subset in `drop_nulls()` and `drop_nans()` (#22469)
- Bump argminmax to 0.6.3 (#22649)
- DSL version deserialization endianness (#22642)
- Fix nested dtype row encoding (#22557)
- Allow Expr.round() to be called on integer dtypes (#22622)
- Fix panic when filtering based on row index column in parquet (#22616)
- WASM and PyOdide compile (#22613)
- Resolve `get()` SchemaMismatch panic (#22350)
📖 Documentation
- Add pre-release policy (#22808)
- Fix broken link to service account page in Polars Cloud docs (#22762)
- Rework documentation for `drop`/`fill` for nulls/nans (#22657)
📦 Build system
- Patch pyo3 to disable recompilation (#22796)
🛠️ Other improvements
- Update Polars Rust versions (#22834)
- Cleanup `polars-python` lifetimes (#22548)
- Fix `nix build` (#22809)
- Fix flake.nix to work on macos (#22803)
- Remove unused dependencies in `polars-arrow` (#22806)
- Unused variables on release build (#22800)
- Update cloud docs (#22624)
- Add proptest implementations for all Array types (#22711)
- Dispatch `.write_*` to `.lazy().sink_*(engine='in-memory')` (#22582)
- Move all optimization flags to `QueryOptFlags` (#22680)
- Add test for `str.replace_many` (#22615)
- Stabilize `sink_*` (#22643)
- Add proptest for row-encode (#22626)
- Emphasize PolarsDataType::get_dtype is static-only (#22648)
- Use named fields for Logical (#22647)
- Update rust version in nix flake (#22627)
- Add a nix flake with a devShell and package (#22246)
- Use a wrapper struct to store time zone (#22523)
- Add `proptest` testing for parquet decoding kernels (#22608)
Thank you to all our contributors for making this release possible!
@IvanIsCoding, @JakubValtar, @Julian-J-S, @LucioFranco, @MarcoGorelli, @WH-2099, @alexander-beedie, @borchero, @bschoenmaeckers, @cmdlineluser, @coastalwhite, @etiennebacher, @florian-klein, @itamarst, @kdn36, @nameexhaustion, @nikaltipar, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @teotwaki, @timkpaine and @wence-