Releases: pola-rs/polars

Rust Polars 0.50.0

01 Aug 12:19
0478b35

🏆 Highlights

  • Make Selector a concrete part of the DSL (#23351)
  • Rework Categorical/Enum to use (Frozen)Categories (#23016)

🚀 Performance improvements

  • Lower Expr.slice to streaming engine (#23683)
  • Elide bound check (#23653)
  • Preserve Column repr in ColumnTransform operations (#23648)
  • Lower any() and all() to streaming engine (#23640)
  • Lower row-separable functions in streaming engine (#23633)
  • Lower int_range(len()) to with_row_index (#23576)
  • Avoid double field resolution in with_columns (#23530)
  • Reduce rolling quantile time complexity (#23443)
  • Use single-key optimization with Categorical (#23436)
  • Improve null-preserving identification for boolean functions (#23317)
  • Improve boolean bitwise aggregate performance (#23325)
  • Enable Parquet expressions and dedup is_in values in Parquet predicates (#23293)
  • Re-write join types during filter pushdown (#23275)
  • Generate PQ ZSTD decompression context once (#23200)
  • Trigger cache/cse optimizations when multiplexing (#23274)
  • Cache FileInfo upon DSL -> IR conversion (#23263)
  • Push more filters past joins (#23240)

✨ Enhancements

  • Expand on DataTypeExpr (#23249)
  • Lower row-separable functions in streaming engine (#23633)
  • Add scalar checks to range expressions (#23632)
  • Expose POLARS_DOT_SVG_VIEWER to automatically dispatch to SVG viewer (#23592)
  • Implement mean function in arr namespace (#23486)
  • Implement vec_hash for List and Array (#23578)
  • Add unstable pl.row_index() expression (#23556)
  • Add Categories on the Python side (#23543)
  • Implement partitioned sinks for the in-memory engine (#23522)
  • Expose IRFunctionExpr::Rank in the python visitor (#23512)
  • Raise and warn on UDFs without return_dtype set (#23353)
  • IR pruning (#23499)
  • Expose IRFunctionExpr::FillNullWithStrategy in the python visitor (#23479)
  • Support min/max reducer for null dtype in streaming engine (#23465)
  • Implement streaming Categorical/Enum min/max (#23440)
  • Allow cast to Categorical inside list.eval (#23432)
  • Support pathlib.Path as source for read/scan_delta() (#23411)
  • Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
  • Pass payload in ExprRegistry (#23412)
  • Support reading nanosecond/Int96 timestamps and schema evolved datasets in scan_delta() (#23398)
  • Support row group skipping with filters when cast_options is given (#23356)
  • Execute bitwise reductions in streaming engine (#23321)
  • Use scan_parquet().collect_schema() for read_parquet_schema (#23359)
  • Add dtype to str.to_integer() (#22239)
  • Add arr.slice, arr.head and arr.tail methods to arr namespace (#23150)
  • Add is_close method (#23273)
  • Drop superfluous casts from optimized plan (#23269)
  • Add drop_nulls option to to_dummies (#23215)
  • Support comma as decimal separator for CSV write (#23238)
  • Don't format keys if they're empty in dot (#23247)
  • Improve arity simplification (#23242)

🐞 Bug fixes

  • Fix credential refresh logic (#23730)
  • Fix to_datetime() fallible identification (#23735)
  • Correct output datatype for dt.with_time_unit (#23734)
  • Fix incorrect native Iceberg scan from tables with renamed/dropped columns/fields (#23713)
  • Allow DataType expressions with selectors (#23720)
  • Match output type to engine for interpolate on Decimal (#23706)
  • Fix remaining bugs in with_exprs_and_input and pruning (#23710)
  • Match output dtype to engine for cum_sum_horizontal (#23686)
  • Field names for pl.struct in group-by (#23703)
  • Fix output for str.extract_groups with empty string pattern (#23698)
  • Match output type to engine for rolling_map (#23702)
  • Fix incorrect join on single Int128 column for in-memory engine (#23694)
  • Match output field name to lhs for BusinessDaycount (#23679)
  • Correct the planner output datatype for strptime (#23676)
  • Sort and Scan with_exprs_and_input (#23675)
  • Revert to old behavior with name.keep (#23670)
  • Fix panic loading from arrow Map containing timestamps (#23662)
  • Selectors in self part of list.eval (#23668)
  • Fix output field dtype for ToInteger (#23664)
  • Allow decimal_comma with , separator in read_csv (#23657)
  • Fix handling of UTF-8 in write_csv to IO[str] (#23647)
  • Selectors in {Lazy,Data}Frame.filter (#23631)
  • Stop splitfields iterator at eol in simd branch (#23652)
  • Correct output datatype of dt.year and dt.mil (#23646)
  • Fix broadcast_rhs logic in binary functions to correct list.set_intersection for list[str] columns (#23584)
  • Order-preserving equi-join didn't always flush final matches (#23639)
  • Fix ColumnNotFound error when joining on col().cast() (#23622)
  • Fix agg groups on when/then in group_by context (#23628)
  • Output type for sign (#23572)
  • Apply agg_fn on null values in pivot (#23586)
  • Remove nonsensical duration variance (#23621)
  • Don't panic when sinking nested categorical to Parquet (#23610)
  • Correctly set value count output field name (#23611)
  • Casting unused columns in to_torch (#23606)
  • Allow inferring of hours-only timezone offset (#23605)
  • Bug in Categorical <-> str compare with nulls (#23609)
  • Honor n=0 in all cases of str.replace (#23598)
  • Remove arbitrary 25-item limit from implicit Python list -> Series inference (#23603)
  • Relabel duplicate sequence IDs in distributor (#23593)
  • Round-trip Enum and Categorical metadata in plugins (#23588)
  • Fix incorrect join_asof with by followed by head/slice (#23585)
  • Allow writing nested Int128 data to Parquet (#23580)
  • Enum serialization assert (#23574)
  • Output type for peak_min / peak_max (#23573)
  • Make Scalar Categorical, Enum and Struct values serializable (#23565)
  • Preserve row order within partition when sinking parquet (#23462)
  • Panic in create_multiple_physical_plans when branching from a single cache node (#23561)
  • Prevent in-mem partition sink deadlock (#23562)
  • Update AWS cloud documentation (#23563)
  • Correctly handle null values when comparing structs (#23560)
  • Make fold/reduce/cum_reduce/cum_fold serializable (#23524)
  • Make Expr.append serializable (#23515)
  • Float by float division dtype (#23529)
  • Division on empty DataFrame generating null row (#23516)
  • Partition sink copy_exprs and with_exprs_and_input (#23511)
  • Unreachable with pl.self_dtype (#23507)
  • Rolling median incorrect min_samples with nulls (#23481)
  • Make Int128 roundtrippable via Parquet (#23494)
  • Fix panic when common subplans contain IEJoins (#23487)
  • Properly handle non-finite floats in rolling_sum/mean (#23482)
  • Make read_csv_batched respect skip_rows and skip_lines (#23484)
  • Always use cloudpickle for the python objects in cloud plans (#23474)
  • Support string literals in index_of() on categoricals (#23458)
  • Don't panic for finish_callback with nested datatypes (#23464)
  • Support min/max aggregation for DataFrame/LazyFrame Categoricals (#23455)
  • Fix var/moment dtypes (#23453)
  • Fix agg_groups dtype (#23450)
  • Clear cached_schema when apply changes dtype (#23439)
  • Allow structured conversion to/from numpy with Array types, preserving shape (#23438)
  • Null handling in full-null group_by_dynamic mean/sum (#23435)
  • Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
  • Fix index calculation for nearest interpolation (#23418)
  • Fix compilation failure with --no-default-features and --features lazy,strings (#23384)
  • Parse parquet footer length into unsigned integer (#23357)
  • Fix incorrect results with group_by aggregation on empty groups (#23358)
  • Fix boolean min() in group_by aggregation (streaming) (#23344)
  • Respect data-model in map_elements (#23340)
  • Properly join URI paths in PlPath (#23350)
  • Ignore null values in bitwise aggregation on bools (#23324)
  • Fix panic filtering after left join (#23310)
  • Out-of-bounds index in hot hash table (#23311)
  • Fix scanning '?' from cloud with glob=False (#23304)
  • Fix filters on inserted columns not removing rows (#23303)
  • Don't ignore return_dtype (#23309)
  • Use safe parsing for get_normal_components (#23284)
  • Fix output column names/order of streaming coalesced right-join (#23278)
  • Restore concat_arr inputs expansion (#23271)

📖 Documentation

  • Point to the R Polars version on R-multiverse (#23660)
  • Update GPU docs for RAPIDS CUDA 11 deprecation (#23620)
  • Add page about billing to Polars Cloud user guide (#23564)
  • Small user-guide improvement and fixes (#23549)
  • Correct note in from_pandas about data being cloned (#23552)
  • Fix a few typos in the "Streaming" section (#23536)
  • Update streaming page (#23535)
  • Update structure of Polars Cloud documentation (#23496)
  • Update when_then in user guide (#23245)

📦 Build system

  • Update all rand code (#23387)
  • Bump up rand & rand_distr (#22619)

🛠️ Other improvements

  • Remove incorrect DeletionFilesList::slice (#23796)
  • Remove old schema file (#23798)
  • Remove Default for StreamingExecutionState (#23729)
  • Explicit match to smaller dtypes before cast to Int32 in asof join (#23776)
  • Expose PlPathRef via polars::prelude (#23754)
  • Add hashes json (#23758)
  • Add AExpr::is_expr_equal_to (#23740)
  • Fix rank test to respect maintain order (#23723)
  • IR inputs and exprs iterators (#23722)
  • Store more granular schema hashes to reduce merge conflicts (#23709)
  • Add assertions for unique ID (#23711)
  • Use RelaxedCell in multiscan (#23712)
  • Debug assert ColumnTransform cast is non-strict (#23717)
  • Use UUID for UniqueID (#23704)
  • Remove scan id (#23697)
  • Propagate Iceberg physical ID schema to IR (#23671)
  • Remove unused and confusing match arm (#23691)
  • Remove unused ALLOW_GROUP_AWARE flag (#23690)
  • Remove unused evaluate_inline (#23687)
  • Remove unused field from AggregationContext (#23685)
  • Remove `nod...

Python Polars 1.32.0

01 Aug 01:43
c57de4b

🏆 Highlights

  • Make Selector a concrete part of the DSL (#23351)
  • Rework Categorical/Enum to use (Frozen)Categories (#23016)

🚀 Performance improvements

  • Lower Expr.slice to streaming engine (#23683)
  • Elide bound check (#23653)
  • Preserve Column repr in ColumnTransform operations (#23648)
  • Lower any() and all() to streaming engine (#23640)
  • Lower row-separable functions in streaming engine (#23633)
  • Lower int_range(len()) to with_row_index (#23576)
  • Avoid double field resolution in with_columns (#23530)
  • Reduce rolling quantile time complexity (#23443)
  • Use single-key optimization with Categorical (#23436)
  • Improve null-preserving identification for boolean functions (#23317)
  • Improve boolean bitwise aggregate performance (#23325)
  • Enable Parquet expressions and dedup is_in values in Parquet predicates (#23293)
  • Re-write join types during filter pushdown (#23275)
  • Generate PQ ZSTD decompression context once (#23200)
  • Trigger cache/cse optimizations when multiplexing (#23274)
  • Cache FileInfo upon DSL -> IR conversion (#23263)
  • Push more filters past joins (#23240)
  • Optimize Bitmap::make_mut (#23138)

✨ Enhancements

  • Add Python-side caching for credentials and provider auto-initialization (#23736)
  • Expand on DataTypeExpr (#23249)
  • Lower row-separable functions in streaming engine (#23633)
  • Add scalar checks to range expressions (#23632)
  • Expose POLARS_DOT_SVG_VIEWER to automatically dispatch to SVG viewer (#23592)
  • Implement mean function in arr namespace (#23486)
  • Implement vec_hash for List and Array (#23578)
  • Add unstable pl.row_index() expression (#23556)
  • Add Categories on the Python side (#23543)
  • Implement partitioned sinks for the in-memory engine (#23522)
  • Raise and warn on UDFs without return_dtype set (#23353)
  • IR pruning (#23499)
  • Support min/max reducer for null dtype in streaming engine (#23465)
  • Implement streaming Categorical/Enum min/max (#23440)
  • Allow cast to Categorical inside list.eval (#23432)
  • Support pathlib.Path as source for read/scan_delta() (#23411)
  • Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
  • Pass payload in ExprRegistry (#23412)
  • Support reading nanosecond/Int96 timestamps and schema evolved datasets in scan_delta() (#23398)
  • Support row group skipping with filters when cast_options is given (#23356)
  • Execute bitwise reductions in streaming engine (#23321)
  • Use scan_parquet().collect_schema() for read_parquet_schema (#23359)
  • Add dtype to str.to_integer() (#22239)
  • Add arr.slice, arr.head and arr.tail methods to arr namespace (#23150)
  • Add is_close method (#23273)
  • Drop superfluous casts from optimized plan (#23269)
  • Add drop_nulls option to to_dummies (#23215)
  • Support comma as decimal separator for CSV write (#23238)
  • Don't format keys if they're empty in dot (#23247)
  • Improve arity simplification (#23242)
  • Allow expression input for length parameter in pad_start, pad_end, and zfill (#23182)

🐞 Bug fixes

  • Load _expiry_time from botocore Credentials in CredentialProviderAWS (#23753)
  • Fix credential refresh logic (#23730)
  • Fix to_datetime() fallible identification (#23735)
  • Correct output datatype for dt.with_time_unit (#23734)
  • Fix incorrect native Iceberg scan from tables with renamed/dropped columns/fields (#23713)
  • Allow DataType expressions with selectors (#23720)
  • Match output type to engine for interpolate on Decimal (#23706)
  • Fix remaining bugs in with_exprs_and_input and pruning (#23710)
  • Match output dtype to engine for cum_sum_horizontal (#23686)
  • Field names for pl.struct in group-by (#23703)
  • Fix output for str.extract_groups with empty string pattern (#23698)
  • Match output type to engine for rolling_map (#23702)
  • Move passing of DeltaTable._storage_options (#23673)
  • Fix incorrect join on single Int128 column for in-memory engine (#23694)
  • Match output field name to lhs for BusinessDaycount (#23679)
  • Correct the planner output datatype for strptime (#23676)
  • Sort and Scan with_exprs_and_input (#23675)
  • Revert to old behavior with name.keep (#23670)
  • Fix panic loading from arrow Map containing timestamps (#23662)
  • Selectors in self part of list.eval (#23668)
  • Fix output field dtype for ToInteger (#23664)
  • Allow decimal_comma with , separator in read_csv (#23657)
  • Fix handling of UTF-8 in write_csv to IO[str] (#23647)
  • Selectors in {Lazy,Data}Frame.filter (#23631)
  • Stop splitfields iterator at eol in simd branch (#23652)
  • Correct output datatype of dt.year and dt.mil (#23646)
  • Fix broadcast_rhs logic in binary functions to correct list.set_intersection for list[str] columns (#23584)
  • Order-preserving equi-join didn't always flush final matches (#23639)
  • Fix ColumnNotFound error when joining on col().cast() (#23622)
  • Fix agg groups on when/then in group_by context (#23628)
  • Output type for sign (#23572)
  • Apply agg_fn on null values in pivot (#23586)
  • Remove nonsensical duration variance (#23621)
  • Don't panic when sinking nested categorical to Parquet (#23610)
  • Correctly set value count output field name (#23611)
  • Casting unused columns in to_torch (#23606)
  • Allow inferring of hours-only timezone offset (#23605)
  • Bug in Categorical <-> str compare with nulls (#23609)
  • Honor n=0 in all cases of str.replace (#23598)
  • Remove arbitrary 25-item limit from implicit Python list -> Series inference (#23603)
  • Relabel duplicate sequence IDs in distributor (#23593)
  • Round-trip Enum and Categorical metadata in plugins (#23588)
  • Fix incorrect join_asof with by followed by head/slice (#23585)
  • Change return typing of get_index_type() from DataType to PolarsIntegerType (#23558)
  • Allow writing nested Int128 data to Parquet (#23580)
  • Enum serialization assert (#23574)
  • Output type for peak_min / peak_max (#23573)
  • Make Scalar Categorical, Enum and Struct values serializable (#23565)
  • Preserve row order within partition when sinking parquet (#23462)
  • Prevent in-mem partition sink deadlock (#23562)
  • Update AWS cloud documentation (#23563)
  • Correctly handle null values when comparing structs (#23560)
  • Make fold/reduce/cum_reduce/cum_fold serializable (#23524)
  • Make Expr.append serializable (#23515)
  • Float by float division dtype (#23529)
  • Division on empty DataFrame generating null row (#23516)
  • Partition sink copy_exprs and with_exprs_and_input (#23511)
  • Unreachable with pl.self_dtype (#23507)
  • Rolling median incorrect min_samples with nulls (#23481)
  • Make Int128 roundtrippable via Parquet (#23494)
  • Fix panic when common subplans contain IEJoins (#23487)
  • Properly handle non-finite floats in rolling_sum/mean (#23482)
  • Make read_csv_batched respect skip_rows and skip_lines (#23484)
  • Always use cloudpickle for the python objects in cloud plans (#23474)
  • Support string literals in index_of() on categoricals (#23458)
  • Don't panic for finish_callback with nested datatypes (#23464)
  • Pass DeltaTable._storage_options if no storage_options are provided (#23456)
  • Support min/max aggregation for DataFrame/LazyFrame Categoricals (#23455)
  • Fix var/moment dtypes (#23453)
  • Fix agg_groups dtype (#23450)
  • Fix incorrect _get_path_scheme (#23444)
  • Fix missing overload defaults in read_ods and tree_format (#23442)
  • Clear cached_schema when apply changes dtype (#23439)
  • Allow structured conversion to/from numpy with Array types, preserving shape (#23438)
  • Null handling in full-null group_by_dynamic mean/sum (#23435)
  • Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
  • Fix index calculation for nearest interpolation (#23418)
  • Overload for eager default in Schema.to_frame was False instead of True (#23413)
  • Fix read_excel overloads so that passing list[str] to sheet_name does not raise (#23388)
  • Remove special handling for bytes-like objects in read_ndjson (#23361)
  • Parse parquet footer length into unsigned integer (#23357)
  • Fix incorrect results with group_by aggregation on empty groups (#23358)
  • Fix boolean min() in group_by aggregation (streaming) (#23344)
  • Respect data-model in map_elements (#23340)
  • Properly join URI paths in PlPath (#23350)
  • Ignore null values in bitwise aggregation on bools (#23324)
  • Fix panic filtering after left join (#23310)
  • Out-of-bounds index in hot hash table (#23311)
  • Fix scanning '?' from cloud with glob=False (#23304)
  • Fix filters on inserted columns not removing rows (#23303)
  • Don't ignore return_dtype (#23309)
  • Raise error instead of return in Series class (#23301)
  • Use safe parsing for get_normal_components (#23284)
  • Fix output column names/order of streaming coalesced right-join (#23278)
  • Restore concat_arr inputs expansion (#23271)
  • Expose FieldsMapper (#23232)
  • Fix time zone handling in dt.iso_year and dt.is_leap_year (#23125)

📖 Documentation

  • Fix str.replace_many examples triggering a deprecation warning (#23695)
  • Point to the R Polars version on R-multiverse (#23660)
  • Update example for writing to cloud storage (#20265)
  • Update GPU docs for RAPIDS CUDA 11 deprecation (#23620)
  • Add docs of Expr.list.filter and Series.list.filter (#23589)
  • Add page about billing to Polars Cloud user guide (#23564)
  • Small user-guide improvement and fixes (#23549)
  • Correct note in from_pandas about data being cloned (#23552)
  • Fix a few typos in the "Streaming" section (#23536)
  • Update streaming page (#23535)
  • Update structure of Polars Cloud documentation (#23496)
  • Update example code in pandas migration guide (#23403)
  • Correct plugins user guide to reflect that teaching Expr.language is in a different section (#23377)
  • Add example of using OR in join_where (#23375)
  • Update when_then in user guide (#23245)

📦 Build system

  • Update all rand code (#23387)

🛠️ Other improvements

  • Remove unused functions from the rust side (#2...

Python Polars 1.32.0-beta.1

26 Jul 19:44
a7081b6
Pre-release

🏆 Highlights

  • Make Selector a concrete part of the DSL (#23351)
  • Rework Categorical/Enum to use (Frozen)Categories (#23016)

🚀 Performance improvements

  • Lower Expr.slice to streaming engine (#23683)
  • Elide bound check (#23653)
  • Preserve Column repr in ColumnTransform operations (#23648)
  • Lower any() and all() to streaming engine (#23640)
  • Lower row-separable functions in streaming engine (#23633)
  • Lower int_range(len()) to with_row_index (#23576)
  • Avoid double field resolution in with_columns (#23530)
  • Reduce rolling quantile time complexity (#23443)
  • Use single-key optimization with Categorical (#23436)
  • Improve null-preserving identification for boolean functions (#23317)
  • Improve boolean bitwise aggregate performance (#23325)
  • Enable Parquet expressions and dedup is_in values in Parquet predicates (#23293)
  • Re-write join types during filter pushdown (#23275)
  • Generate PQ ZSTD decompression context once (#23200)
  • Trigger cache/cse optimizations when multiplexing (#23274)
  • Cache FileInfo upon DSL -> IR conversion (#23263)
  • Push more filters past joins (#23240)
  • Optimize Bitmap::make_mut (#23138)

✨ Enhancements

  • Add Python-side caching for credentials and provider auto-initialization (#23736)
  • Expand on DataTypeExpr (#23249)
  • Lower row-separable functions in streaming engine (#23633)
  • Add scalar checks to range expressions (#23632)
  • Expose POLARS_DOT_SVG_VIEWER to automatically dispatch to SVG viewer (#23592)
  • Implement mean function in arr namespace (#23486)
  • Implement vec_hash for List and Array (#23578)
  • Add unstable pl.row_index() expression (#23556)
  • Add Categories on the Python side (#23543)
  • Implement partitioned sinks for the in-memory engine (#23522)
  • Raise and warn on UDFs without return_dtype set (#23353)
  • IR pruning (#23499)
  • Support min/max reducer for null dtype in streaming engine (#23465)
  • Implement streaming Categorical/Enum min/max (#23440)
  • Allow cast to Categorical inside list.eval (#23432)
  • Support pathlib.Path as source for read/scan_delta() (#23411)
  • Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
  • Pass payload in ExprRegistry (#23412)
  • Support reading nanosecond/Int96 timestamps and schema evolved datasets in scan_delta() (#23398)
  • Support row group skipping with filters when cast_options is given (#23356)
  • Execute bitwise reductions in streaming engine (#23321)
  • Use scan_parquet().collect_schema() for read_parquet_schema (#23359)
  • Add dtype to str.to_integer() (#22239)
  • Add arr.slice, arr.head and arr.tail methods to arr namespace (#23150)
  • Add is_close method (#23273)
  • Drop superfluous casts from optimized plan (#23269)
  • Add drop_nulls option to to_dummies (#23215)
  • Support comma as decimal separator for CSV write (#23238)
  • Don't format keys if they're empty in dot (#23247)
  • Improve arity simplification (#23242)
  • Allow expression input for length parameter in pad_start, pad_end, and zfill (#23182)

🐞 Bug fixes

  • Load _expiry_time from botocore Credentials in CredentialProviderAWS (#23753)
  • Fix credential refresh logic (#23730)
  • Fix to_datetime() fallible identification (#23735)
  • Correct output datatype for dt.with_time_unit (#23734)
  • Fix incorrect native Iceberg scan from tables with renamed/dropped columns/fields (#23713)
  • Allow DataType expressions with selectors (#23720)
  • Match output type to engine for interpolate on Decimal (#23706)
  • Fix remaining bugs in with_exprs_and_input and pruning (#23710)
  • Match output dtype to engine for cum_sum_horizontal (#23686)
  • Field names for pl.struct in group-by (#23703)
  • Fix output for str.extract_groups with empty string pattern (#23698)
  • Match output type to engine for rolling_map (#23702)
  • Move passing of DeltaTable._storage_options (#23673)
  • Fix incorrect join on single Int128 column for in-memory engine (#23694)
  • Match output field name to lhs for BusinessDaycount (#23679)
  • Correct the planner output datatype for strptime (#23676)
  • Sort and Scan with_exprs_and_input (#23675)
  • Revert to old behavior with name.keep (#23670)
  • Fix panic loading from arrow Map containing timestamps (#23662)
  • Selectors in self part of list.eval (#23668)
  • Fix output field dtype for ToInteger (#23664)
  • Allow decimal_comma with , separator in read_csv (#23657)
  • Fix handling of UTF-8 in write_csv to IO[str] (#23647)
  • Selectors in {Lazy,Data}Frame.filter (#23631)
  • Stop splitfields iterator at eol in simd branch (#23652)
  • Correct output datatype of dt.year and dt.mil (#23646)
  • Fix broadcast_rhs logic in binary functions to correct list.set_intersection for list[str] columns (#23584)
  • Order-preserving equi-join didn't always flush final matches (#23639)
  • Fix ColumnNotFound error when joining on col().cast() (#23622)
  • Fix agg groups on when/then in group_by context (#23628)
  • Output type for sign (#23572)
  • Apply agg_fn on null values in pivot (#23586)
  • Remove nonsensical duration variance (#23621)
  • Don't panic when sinking nested categorical to Parquet (#23610)
  • Correctly set value count output field name (#23611)
  • Casting unused columns in to_torch (#23606)
  • Allow inferring of hours-only timezone offset (#23605)
  • Bug in Categorical <-> str compare with nulls (#23609)
  • Honor n=0 in all cases of str.replace (#23598)
  • Remove arbitrary 25-item limit from implicit Python list -> Series inference (#23603)
  • Relabel duplicate sequence IDs in distributor (#23593)
  • Round-trip Enum and Categorical metadata in plugins (#23588)
  • Fix incorrect join_asof with by followed by head/slice (#23585)
  • Change return typing of get_index_type() from DataType to PolarsIntegerType (#23558)
  • Allow writing nested Int128 data to Parquet (#23580)
  • Enum serialization assert (#23574)
  • Output type for peak_min / peak_max (#23573)
  • Make Scalar Categorical, Enum and Struct values serializable (#23565)
  • Preserve row order within partition when sinking parquet (#23462)
  • Prevent in-mem partition sink deadlock (#23562)
  • Update AWS cloud documentation (#23563)
  • Correctly handle null values when comparing structs (#23560)
  • Make fold/reduce/cum_reduce/cum_fold serializable (#23524)
  • Make Expr.append serializable (#23515)
  • Float by float division dtype (#23529)
  • Division on empty DataFrame generating null row (#23516)
  • Partition sink copy_exprs and with_exprs_and_input (#23511)
  • Unreachable with pl.self_dtype (#23507)
  • Rolling median incorrect min_samples with nulls (#23481)
  • Make Int128 roundtrippable via Parquet (#23494)
  • Fix panic when common subplans contain IEJoins (#23487)
  • Properly handle non-finite floats in rolling_sum/mean (#23482)
  • Make read_csv_batched respect skip_rows and skip_lines (#23484)
  • Always use cloudpickle for the python objects in cloud plans (#23474)
  • Support string literals in index_of() on categoricals (#23458)
  • Don't panic for finish_callback with nested datatypes (#23464)
  • Pass DeltaTable._storage_options if no storage_options are provided (#23456)
  • Support min/max aggregation for DataFrame/LazyFrame Categoricals (#23455)
  • Fix var/moment dtypes (#23453)
  • Fix agg_groups dtype (#23450)
  • Fix incorrect _get_path_scheme (#23444)
  • Fix missing overload defaults in read_ods and tree_format (#23442)
  • Clear cached_schema when apply changes dtype (#23439)
  • Allow structured conversion to/from numpy with Array types, preserving shape (#23438)
  • Null handling in full-null group_by_dynamic mean/sum (#23435)
  • Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
  • Fix index calculation for nearest interpolation (#23418)
  • Overload for eager default in Schema.to_frame was False instead of True (#23413)
  • Fix read_excel overloads so that passing list[str] to sheet_name does not raise (#23388)
  • Remove special handling for bytes-like objects in read_ndjson (#23361)
  • Parse parquet footer length into unsigned integer (#23357)
  • Fix incorrect results with group_by aggregation on empty groups (#23358)
  • Fix boolean min() in group_by aggregation (streaming) (#23344)
  • Respect data-model in map_elements (#23340)
  • Properly join URI paths in PlPath (#23350)
  • Ignore null values in bitwise aggregation on bools (#23324)
  • Fix panic filtering after left join (#23310)
  • Out-of-bounds index in hot hash table (#23311)
  • Fix scanning '?' from cloud with glob=False (#23304)
  • Fix filters on inserted columns not removing rows (#23303)
  • Don't ignore return_dtype (#23309)
  • Raise error instead of return in Series class (#23301)
  • Use safe parsing for get_normal_components (#23284)
  • Fix output column names/order of streaming coalesced right-join (#23278)
  • Restore concat_arr inputs expansion (#23271)
  • Expose FieldsMapper (#23232)
  • Fix time zone handling in dt.iso_year and dt.is_leap_year (#23125)

📖 Documentation

  • Fix str.replace_many examples triggering a deprecation warning (#23695)
  • Point to the R Polars version on R-multiverse (#23660)
  • Update example for writing to cloud storage (#20265)
  • Update GPU docs for RAPIDS CUDA 11 deprecation (#23620)
  • Add docs of Expr.list.filter and Series.list.filter (#23589)
  • Add page about billing to Polars Cloud user guide (#23564)
  • Small user-guide improvement and fixes (#23549)
  • Correct note in from_pandas about data being cloned (#23552)
  • Fix a few typos in the "Streaming" section (#23536)
  • Update streaming page (#23535)
  • Update structure of Polars Cloud documentation (#23496)
  • Update example code in pandas migration guide (#23403)
  • Correct plugins user guide to reflect that teaching Expr.language is in a different section (#23377)
  • Add example of using OR in join_where (#23375)
  • Update when_then in user guide (#23245)

📦 Build system

  • Update all rand code (#23387)

🛠️ Other improvements

  • Add hashes json (#23758)
  • Add `AExpr::is_expr...

Rust Polars 0.49.1

30 Jun 14:42
99e94c9

🚀 Performance improvements

  • Optimize Bitmap::make_mut (#23138)

✨ Enhancements

  • Allow expression input for length parameter in pad_start, pad_end, and zfill (#23182)

🐞 Bug fixes

  • Expose FieldsMapper (#23232)
  • Fix time zone handling in dt.iso_year and dt.is_leap_year (#23125)

📖 Documentation

  • Update when_then in user guide (#23245)

🛠️ Other improvements

  • Connect Python assert_dataframe_equal() to Rust back-end (#23207)
  • Fix time zone handling in dt.iso_year and dt.is_leap_year (#23125)
  • Update Rust Polars versions (#23229)

Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @mcrumiller, @mrkn, @stijnherfst and @zyctree

Rust Polars 0.49.0

30 Jun 14:22
3e35098

💥 Breaking changes

  • Remove old streaming engine (#23103)

🚀 Performance improvements

  • Improve streaming groupby CSE (#23092)
  • Move row index materialization in post-apply to occur after slicing (#22995)
  • Add first_(true|false)_idx to BooleanChunked and use in bool arg_(min|max) (#22907)
  • Don't go through row encoding for most types on index_of (#22903)
  • Optimise low-level null scans and arg_max for bools (when chunked) (#22897)
  • Optimize multiscan performance (#22886)

✨ Enhancements

  • Native implementation for Iceberg positional deletes (#23091)
  • Remove old streaming engine (#23103)
  • Make match_chunks public (#23101)
  • Implement StructFunction expressions in into_py (#23022)
  • Basic implementation of DataTypeExpr in Rust DSL (#23049)
  • Add required: bool to ParquetFieldOverwrites (#23013)
  • Support serializing name.map_fields (#22997)
  • Support serializing Expr::RenameAlias (#22988)
  • Remove duplicate verbose logging from FetchedCredentialsCache (#22973)
  • Add keys column in finish_callback (#22968)
  • Add extra_columns parameter to scan_parquet (#22699)
  • Add CORR function to polars SQL (#22690)
  • Add per partition sort and finish callback to sinks (#22789)
  • Add and test DataFrame equality functionality (#22865)
  • Support descendingly-sorted values in search_sorted() (#22825)
  • Derive DSL schema (#22866)

🐞 Bug fixes

  • Restrict custom aggregate_function in pivot to pl.element() (#23155)
  • Don't leak SourceToken in in-memory sink linearize (#23201)
  • Fix panic reading empty parquet with multiple boolean columns (#23159)
  • Raise ComputeError instead of panicking in truncate when mixing month/week/day/sub-daily units (#23176)
  • Materialize list.eval with unknown type (#23186)
  • Only set sorting flag for 1st column with PQ SortingColumns (#23184)
  • Typo in AExprBuilder (#23171)
  • Null return from var/std on scalar column (#23158)
  • Support Datetime broadcast in list.concat (#23137)
  • Ensure projection pushdown maintains right table schema (#22603)
  • Don't create i128 scalars if dtype-128 is not set (#23118)
  • Add Null dtype support to arg_sort_by (#23107)
  • Raise error by default on invalid CSV quotes (#22876)
  • Fix group_by mean and median returning all nulls for Decimal dtype (#23093)
  • Fix hive partition pruning not filtering out __HIVE_DEFAULT_PARTITION__ (#23074)
  • Fix AssertionError when using scan_delta() on AWS with storage_options (#23076)
  • Fix deadlock on collect(background=True) / collect_concurrently() (#23075)
  • Incorrect null count in rolling_min/max (#23073)
  • Preserve file:// in LazyFrame node traverser (#23072)
  • Respect column order in register_io_source schema (#23057)
  • Incorrect output when using sort with group_by and cum_sum (#23001)
  • Implement owned arithmetic for Int128 (#23055)
  • Do not schema-match structs with different field counts (#23018)
  • Fix confusing error message on duplicate row_index (#23043)
  • Add include_nulls to Agg::Count CSE check (#23032)
  • View buffer exceeding 2^32 - 1 bytes in concatenate_view (#23017)
  • Fix incorrect size_hint() for FlatIter (#23010)
  • Fix incorrect result selecting pl.len() from scan_csv with skip_lines (#22949)
  • Allow for IO plugins with reordered columns in streaming (#22987)
  • Method str.zfill was inconsistent with Python and pandas when the string contained a leading '+' (#22985)
  • Integer underflow in propagate_nulls (#22986)
  • Fix cum_min and cum_max not preserving inf or -inf values at series start (#22896)
  • Setting compat_level=0 for sink_ipc (#22960)
  • Support arrow Decimal32 and Decimal64 types (#22954)
  • Update arrow format (#22941)
  • Fix filter pushdown to IO plugins (#22910)
  • Improve numeric stability of rolling_mean<f32> (#22944)
  • Allow subclasses in type equality checking (#22915)
  • Return early in pl.Expr.__array_ufunc__ when only single input (#22913)
  • Add inline implodes in type coercion (#22885)
  • Correct int_ranges to raise error on invalid inputs (#22894)
  • Set the sorted flag on Array after it is sorted (#22822)
  • Don't silently overflow for temporal casts (#22901)
  • Fix error using write_csv with storage_options (#22881)
  • Schema resolution .over(mapping_strategy="join") with non-aggregations (#22875)
  • Ensure rename behaves the same as select (#22852)
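
Several fixes above touch windowed extrema kernels (e.g. the rolling_min/max null count). For reference, the classic monotonic-deque formulation of a rolling minimum, as a generic textbook sketch rather than Polars' actual kernel:

```python
from collections import deque

def rolling_min(values, window):
    """O(n) rolling minimum via a deque of candidate indices."""
    out, dq = [], deque()
    for i, v in enumerate(values):
        while dq and values[dq[-1]] >= v:   # drop dominated candidates
            dq.pop()
        dq.append(i)
        if dq[0] <= i - window:             # evict indices outside the window
            dq.popleft()
        out.append(values[dq[0]] if i >= window - 1 else None)
    return out

print(rolling_min([3, 1, 4, 1, 5], 3))  # [None, None, 1, 1, 1]
```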

📖 Documentation

  • Update when_then in user guide (#23245)
  • Minor improvement to cum_count docstring example (#23099)
  • Add missing entry for LazyFrame __getitem__ (#22924)

📦 Build system

  • Actually disable ir_serde by default (#23046)
  • Add a feature flag for serde_ignored (#22957)
  • Fix warnings, update DSL version and schema hash (#22953)

🛠️ Other improvements

  • Update Rust Polars versions (#23229)
  • Change flake to use venv (#23219)
  • Add default_alloc feature to py-polars (#23202)
  • Add more descriptive error message by replacing FixedSizeList with Array (#23168)
  • Connect Python assert_series_equal() to Rust back-end (#23141)
  • Refactor skip_batches to use AExprBuilder (#23147)
  • Use ir_serde instead of serde for IRFunctionExpr (#23148)
  • Separate FunctionExpr and IRFunctionExpr (#23140)
  • Improve Series equality functionality and prepare for Python integration (#23136)
  • Add PolarsPhysicalType and use it to dispatch into_series (#23080)
  • Remove AExpr::Alias (#23070)
  • Add components for Iceberg deletion file support (#23059)
  • Feature gate StructFunction::JsonEncode (#23060)
  • Propagate iceberg position delete information to IR (#23045)
  • Add environment variable to get Parquet decoding metrics (#23052)
  • Turn pl.cumulative_eval into its own AExpr (#22994)
  • Add make test-streaming (#23044)
  • Move scan parameter parsing for parquet to reusable function (#23019)
  • Use a ref-counted UniqueId instead of usize for cache_id (#22984)
  • Implement Hash and use SpecialEq for RenameAliasFn (#22989)
  • Turn list.eval into an AExpr (#22911)
  • Only check for unknown DSL fields if minor is higher (#22970)
  • Don't enable ir_serde together with serde (#22969)
  • Make dtype field on Logical non-optional (#22966)
  • Add new (Frozen)Categories and CategoricalMapping (#22956)
  • Add a CI check for DSL schema changes (#22898)
  • Add schema parameters to expr.meta (#22906)
  • Update rust toolchain in nix flake (#22905)
  • Update toolchain (#22859)

Thank you to all our contributors for making this release possible!
@Athsus, @DahaoALG, @FabianWolff, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @SanjitBasker, @alexander-beedie, @borchero, @bschoenmaeckers, @cmdlineluser, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @florian-klein, @i1oveMyse1f, @ion-elgreco, @itamarst, @kdn36, @kutal10, @math-hiyoko, @mcrumiller, @mrkn, @mroeschke, @nameexhaustion, @nikaltipar, @orlp, @paskhaver, @ritchie46, @stijnherfst, @thomasfrederikhoeck and @zyctree

Python Polars 1.31.0

18 Jun 12:01
6e02c20

💥 Breaking changes

  • Remove old streaming engine (#23103)

⚠️ Deprecations

  • Deprecate allow_missing_columns in scan_parquet in favor of missing_columns (#22784)

🚀 Performance improvements

  • Improve streaming groupby CSE (#23092)
  • Move row index materialization in post-apply to occur after slicing (#22995)
  • Add first_(true|false)_idx to BooleanChunked and use in bool arg_(min|max) (#22907)
  • Don't go through row encoding for most types on index_of (#22903)
  • Optimise low-level null scans and arg_max for bools (when chunked) (#22897)
  • Optimize multiscan performance (#22886)

✨ Enhancements

  • DataType expressions in Python (#23167)
  • Native implementation for Iceberg positional deletes (#23091)
  • Remove old streaming engine (#23103)
  • Basic implementation of DataTypeExpr in Rust DSL (#23049)
  • Add required: bool to ParquetFieldOverwrites (#23013)
  • Support serializing name.map_fields (#22997)
  • Support serializing Expr::RenameAlias (#22988)
  • Remove duplicate verbose logging from FetchedCredentialsCache (#22973)
  • Add keys column in finish_callback (#22968)
  • Add extra_columns parameter to scan_parquet (#22699)
  • Add CORR function to polars SQL (#22690)
  • Add per partition sort and finish callback to sinks (#22789)
  • Support descendingly-sorted values in search_sorted() (#22825)
  • Derive DSL schema (#22866)

🐞 Bug fixes

  • Remove axis in show_graph (#23218)
  • Remove axis ticks in show_graph (#23210)
  • Restrict custom aggregate_function in pivot to pl.element() (#23155)
  • Don't leak SourceToken in in-memory sink linearize (#23201)
  • Fix panic reading empty parquet with multiple boolean columns (#23159)
  • Raise ComputeError instead of panicking in truncate when mixing month/week/day/sub-daily units (#23176)
  • Materialize list.eval with unknown type (#23186)
  • Only set sorting flag for 1st column with PQ SortingColumns (#23184)
  • Typo in AExprBuilder (#23171)
  • Null return from var/std on scalar column (#23158)
  • Support Datetime broadcast in list.concat (#23137)
  • Ensure projection pushdown maintains right table schema (#22603)
  • Add Null dtype support to arg_sort_by (#23107)
  • Raise error by default on invalid CSV quotes (#22876)
  • Fix group_by mean and median returning all nulls for Decimal dtype (#23093)
  • Fix hive partition pruning not filtering out __HIVE_DEFAULT_PARTITION__ (#23074)
  • Fix AssertionError when using scan_delta() on AWS with storage_options (#23076)
  • Fix deadlock on collect(background=True) / collect_concurrently() (#23075)
  • Incorrect null count in rolling_min/max (#23073)
  • Preserve file:// in LazyFrame node traverser (#23072)
  • Respect column order in register_io_source schema (#23057)
  • Don't call unnest for objects implementing __arrow_c_array__ (#23069)
  • Incorrect output when using sort with group_by and cum_sum (#23001)
  • Implement owned arithmetic for Int128 (#23055)
  • Do not schema-match structs with different field counts (#23018)
  • Fix confusing error message on duplicate row_index (#23043)
  • Add include_nulls to Agg::Count CSE check (#23032)
  • View buffer exceeding 2^32 - 1 bytes in concatenate_view (#23017)
  • Fix incorrect result selecting pl.len() from scan_csv with skip_lines (#22949)
  • Allow for IO plugins with reordered columns in streaming (#22987)
  • Method str.zfill was inconsistent with Python and pandas when the string contained a leading '+' (#22985)
  • Integer underflow in propagate_nulls (#22986)
  • Setting compat_level=0 for sink_ipc (#22960)
  • Narrow return type for DataType.is_, improve Pyright's type completeness from 69% to 95% (#22962)
  • Support arrow Decimal32 and Decimal64 types (#22954)
  • Guard against dictionaries being passed to projection keywords (#22928)
  • Update arrow format (#22941)
  • Fix filter pushdown to IO plugins (#22910)
  • Improve numeric stability of rolling_mean<f32> (#22944)
  • Guard against invalid nested objects in 'map_elements' (#22932)
  • Allow subclasses in type equality checking (#22915)
  • Return early in pl.Expr.__array_ufunc__ when only single input (#22913)
  • Add inline implodes in type coercion (#22885)
  • Add {top, bottom}_k_by to Series (#22902)
  • Correct int_ranges to raise error on invalid inputs (#22894)
  • Don't silently overflow for temporal casts (#22901)
  • Fix error using write_csv with storage_options (#22881)
  • Schema resolution .over(mapping_strategy="join") with non-aggregations (#22875)
  • Ensure rename behaves the same as select (#22852)
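
The str.zfill fix above brought Polars in line with Python's own zero-fill rule: padding goes after a leading sign, '+' included. Python's built-in behaviour, for reference:

```python
# str.zfill keeps a leading '+' or '-' in front of the inserted zeros.
assert "+12".zfill(5) == "+0012"
assert "-12".zfill(5) == "-0012"
assert "12".zfill(5) == "00012"
```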

📖 Documentation

  • Document aggregations that return identity when there are no non-null values, suggest a workaround for those who want SQL-standard behaviour (#23143)
  • Fix reference to non-existent Expr.replace_all in replace_strict docs (#23144)
  • Fix typo on pandas comparison page (#23123)
  • Minor improvement to cum_count docstring example (#23099)
  • Add missing DataFrame.__setitem__ to API reference (#22938)
  • Add missing entry for LazyFrame __getitem__ (#22924)
  • Add missing top_k_by and bottom_k_by to Series reference (#22917)

📦 Build system

  • Update pyo3 and numpy crates to version 0.25 (#22763)
  • Actually disable ir_serde by default (#23046)
  • Add a feature flag for serde_ignored (#22957)
  • Fix warnings, update DSL version and schema hash (#22953)

🛠️ Other improvements

  • Change flake to use venv (#23219)
  • Add default_alloc feature to py-polars (#23202)
  • Add more descriptive error message by replacing FixedSizeList with Array (#23168)
  • Connect Python assert_series_equal() to Rust back-end (#23141)
  • Refactor skip_batches to use AExprBuilder (#23147)
  • Use ir_serde instead of serde for IRFunctionExpr (#23148)
  • Separate FunctionExpr and IRFunctionExpr (#23140)
  • Remove AExpr::Alias (#23070)
  • Add components for Iceberg deletion file support (#23059)
  • Feature gate StructFunction::JsonEncode (#23060)
  • Propagate iceberg position delete information to IR (#23045)
  • Add environment variable to get Parquet decoding metrics (#23052)
  • Turn pl.cumulative_eval into its own AExpr (#22994)
  • Add make test-streaming (#23044)
  • Move scan parameter parsing for parquet to reusable function (#23019)
  • Prepare deltalake 1.0 (#22931)
  • Implement Hash and use SpecialEq for RenameAliasFn (#22989)
  • Turn list.eval into an AExpr (#22911)
  • Fix CI for latest pandas-stubs release (#22971)
  • Add a CI check for DSL schema changes (#22898)
  • Add schema parameters to expr.meta (#22906)
  • Update rust toolchain in nix flake (#22905)
  • Update toolchain (#22859)

Thank you to all our contributors for making this release possible!
@Athsus, @DahaoALG, @FabianWolff, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @SanjitBasker, @alexander-beedie, @bschoenmaeckers, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @florian-klein, @i1oveMyse1f, @ion-elgreco, @itamarst, @kdn36, @kutal10, @mcrumiller, @mroeschke, @nameexhaustion, @nikaltipar, @orlp, @paskhaver, @ritchie46, @stijnherfst and @thomasfrederikhoeck

Python Polars 1.31.0-beta.1

14 Jun 09:17
383f1b3
Pre-release

💥 Breaking changes

  • Remove old streaming engine (#23103)

⚠️ Deprecations

  • Deprecate allow_missing_columns in scan_parquet in favor of missing_columns (#22784)

🚀 Performance improvements

  • Improve streaming groupby CSE (#23092)
  • Move row index materialization in post-apply to occur after slicing (#22995)
  • Add first_(true|false)_idx to BooleanChunked and use in bool arg_(min|max) (#22907)
  • Don't go through row encoding for most types on index_of (#22903)
  • Optimise low-level null scans and arg_max for bools (when chunked) (#22897)
  • Optimize multiscan performance (#22886)

✨ Enhancements

  • DataType expressions in Python (#23167)
  • Native implementation for Iceberg positional deletes (#23091)
  • Remove old streaming engine (#23103)
  • Basic implementation of DataTypeExpr in Rust DSL (#23049)
  • Add required: bool to ParquetFieldOverwrites (#23013)
  • Support serializing name.map_fields (#22997)
  • Support serializing Expr::RenameAlias (#22988)
  • Remove duplicate verbose logging from FetchedCredentialsCache (#22973)
  • Add keys column in finish_callback (#22968)
  • Add extra_columns parameter to scan_parquet (#22699)
  • Add CORR function to polars SQL (#22690)
  • Add per partition sort and finish callback to sinks (#22789)
  • Support descendingly-sorted values in search_sorted() (#22825)
  • Derive DSL schema (#22866)

🐞 Bug fixes

  • Fix panic reading empty parquet with multiple boolean columns (#23159)
  • Raise ComputeError instead of panicking in truncate when mixing month/week/day/sub-daily units (#23176)
  • Materialize list.eval with unknown type (#23186)
  • Only set sorting flag for 1st column with PQ SortingColumns (#23184)
  • Typo in AExprBuilder (#23171)
  • Null return from var/std on scalar column (#23158)
  • Support Datetime broadcast in list.concat (#23137)
  • Ensure projection pushdown maintains right table schema (#22603)
  • Add Null dtype support to arg_sort_by (#23107)
  • Raise error by default on invalid CSV quotes (#22876)
  • Fix group_by mean and median returning all nulls for Decimal dtype (#23093)
  • Fix hive partition pruning not filtering out __HIVE_DEFAULT_PARTITION__ (#23074)
  • Fix AssertionError when using scan_delta() on AWS with storage_options (#23076)
  • Fix deadlock on collect(background=True) / collect_concurrently() (#23075)
  • Incorrect null count in rolling_min/max (#23073)
  • Preserve file:// in LazyFrame node traverser (#23072)
  • Respect column order in register_io_source schema (#23057)
  • Don't call unnest for objects implementing __arrow_c_array__ (#23069)
  • Incorrect output when using sort with group_by and cum_sum (#23001)
  • Implement owned arithmetic for Int128 (#23055)
  • Do not schema-match structs with different field counts (#23018)
  • Fix confusing error message on duplicate row_index (#23043)
  • Add include_nulls to Agg::Count CSE check (#23032)
  • View buffer exceeding 2^32 - 1 bytes in concatenate_view (#23017)
  • Fix incorrect result selecting pl.len() from scan_csv with skip_lines (#22949)
  • Allow for IO plugins with reordered columns in streaming (#22987)
  • Method str.zfill was inconsistent with Python and pandas when the string contained a leading '+' (#22985)
  • Integer underflow in propagate_nulls (#22986)
  • Setting compat_level=0 for sink_ipc (#22960)
  • Narrow return type for DataType.is_, improve Pyright's type completeness from 69% to 95% (#22962)
  • Support arrow Decimal32 and Decimal64 types (#22954)
  • Guard against dictionaries being passed to projection keywords (#22928)
  • Update arrow format (#22941)
  • Fix filter pushdown to IO plugins (#22910)
  • Improve numeric stability of rolling_mean<f32> (#22944)
  • Guard against invalid nested objects in 'map_elements' (#22932)
  • Allow subclasses in type equality checking (#22915)
  • Return early in pl.Expr.__array_ufunc__ when only single input (#22913)
  • Add inline implodes in type coercion (#22885)
  • Add {top, bottom}_k_by to Series (#22902)
  • Correct int_ranges to raise error on invalid inputs (#22894)
  • Don't silently overflow for temporal casts (#22901)
  • Fix error using write_csv with storage_options (#22881)
  • Schema resolution .over(mapping_strategy="join") with non-aggregations (#22875)
  • Ensure rename behaves the same as select (#22852)

📖 Documentation

  • Document aggregations that return identity when there are no non-null values, suggest a workaround for those who want SQL-standard behaviour (#23143)
  • Fix reference to non-existent Expr.replace_all in replace_strict docs (#23144)
  • Fix typo on pandas comparison page (#23123)
  • Minor improvement to cum_count docstring example (#23099)
  • Add missing DataFrame.__setitem__ to API reference (#22938)
  • Add missing entry for LazyFrame __getitem__ (#22924)
  • Add missing top_k_by and bottom_k_by to Series reference (#22917)

📦 Build system

  • Update pyo3 and numpy crates to version 0.25 (#22763)
  • Actually disable ir_serde by default (#23046)
  • Add a feature flag for serde_ignored (#22957)
  • Fix warnings, update DSL version and schema hash (#22953)

🛠️ Other improvements

  • Add more descriptive error message by replacing FixedSizeList with Array (#23168)
  • Connect Python assert_series_equal() to Rust back-end (#23141)
  • Refactor skip_batches to use AExprBuilder (#23147)
  • Use ir_serde instead of serde for IRFunctionExpr (#23148)
  • Separate FunctionExpr and IRFunctionExpr (#23140)
  • Remove AExpr::Alias (#23070)
  • Add components for Iceberg deletion file support (#23059)
  • Feature gate StructFunction::JsonEncode (#23060)
  • Propagate iceberg position delete information to IR (#23045)
  • Add environment variable to get Parquet decoding metrics (#23052)
  • Turn pl.cumulative_eval into its own AExpr (#22994)
  • Add make test-streaming (#23044)
  • Move scan parameter parsing for parquet to reusable function (#23019)
  • Prepare deltalake 1.0 (#22931)
  • Implement Hash and use SpecialEq for RenameAliasFn (#22989)
  • Turn list.eval into an AExpr (#22911)
  • Fix CI for latest pandas-stubs release (#22971)
  • Add a CI check for DSL schema changes (#22898)
  • Add schema parameters to expr.meta (#22906)
  • Update rust toolchain in nix flake (#22905)
  • Update toolchain (#22859)

Thank you to all our contributors for making this release possible!
@Athsus, @DahaoALG, @FabianWolff, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @SanjitBasker, @alexander-beedie, @bschoenmaeckers, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @florian-klein, @i1oveMyse1f, @ion-elgreco, @itamarst, @kdn36, @kutal10, @mroeschke, @nameexhaustion, @nikaltipar, @orlp, @paskhaver, @ritchie46, @stijnherfst and @thomasfrederikhoeck

Rust Polars 0.48.1

21 May 11:05
5e1b4b7

🚀 Performance improvements

  • Switch eligible casts to non-strict in optimizer (#22850)

🐞 Bug fixes

  • Fix RuntimeError when serializing the same DataFrame from multiple threads (#22844)

📦 Build system

  • Fix building polars-lazy with certain features (#22846)
  • Add missing features (#22839)

🛠️ Other improvements

  • Update Rust Polars versions (#22854)

Thank you to all our contributors for making this release possible!
@JakubValtar, @bschoenmaeckers, @nameexhaustion and @stijnherfst

Python Polars 1.30.0

21 May 13:33
ee0903b

🚀 Performance improvements

  • Switch eligible casts to non-strict in optimizer (#22850)
  • Allow predicate passing set_sorted (#22797)
  • Increase default cross-file parallelism limit for new-streaming multiscan (#22700)
  • Add elementwise execution mode for list.eval (#22715)
  • Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
  • Add streaming cross-join node (#22581)
  • Switch off maintain_order in group-by followed by sort (#22492)

✨ Enhancements

  • Load AWS endpoint_url using boto3 (#22851)
  • Implement list.filter (#22749)
  • Support binaryoffset in search sorted (#22786)
  • Add nulls_equal flag to list/arr.contains (#22773)
  • Implement LazyFrame.match_to_schema (#22726)
  • Improve time-string parsing and inference (generally, and via the SQL interface) (#22606)
  • Allow for .over to be called without partition_by (#22712)
  • Support AnyValue translation from PyMapping values (#22722)
  • Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
  • Support inference of Int128 dtype from databases that support it (#22682)
  • Add options to write Parquet field metadata (#22652)
  • Add cast_options parameter to control type casting in scan_parquet (#22617)
  • Allow casting List<UInt8> to Binary (#22611)
  • Allow setting of regex size limit using POLARS_REGEX_SIZE_LIMIT (#22651)
  • Support use of literal values as "other" when evaluating Series.zip_with (#22632)
  • Allow to read and write custom file-level parquet metadata (#21806)
  • Support PEP702 @deprecated decorator behaviour (#22594)
  • Support grouping by pl.Array (#22575)
  • Preserve exception type and traceback for errors raised from Python (#22561)
  • Use fixed-width font in streaming phys plan graph (#22540)
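
On the List<UInt8>-to-Binary cast enabled above: conceptually, each list row is packed into a byte string. In plain-Python terms (a sketch of the semantics, not Polars code):

```python
rows = [[104, 105], [33]]           # two List<UInt8> rows
binary = [bytes(r) for r in rows]   # their Binary counterparts
print(binary)  # [b'hi', b'!']
```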

🐞 Bug fixes

  • Fix RuntimeError when serializing the same DataFrame from multiple threads (#22844)
  • Fix map_elements predicate pushdown (#22833)
  • Fix reverse list type (#22832)
  • Don't require numpy for search_sorted (#22817)
  • Add type equality checking for relevant methods (#22802)
  • Invalid output for fill_null after when.then on structs (#22798)
  • Don't panic for cross join with misaligned chunking (#22799)
  • Panic on quantile over nulls in rolling window (#22792)
  • Respect BinaryOffset metadata (#22785)
  • Correct the output order of PartitionByKey and PartitionParted (#22778)
  • Fall back to non-strict casting for deprecated casts (#22760)
  • Clippy on new stable version (#22771)
  • Handle sliced out remainder for bitmaps (#22759)
  • Don't merge Enum categories on append (#22765)
  • Fix unnest() not working on empty struct columns (#22391)
  • Fix the default value type in Schema init (#22589)
  • Correct name in unnest error message (#22740)
  • Provide "schema" to DataFrame, even if empty JSON (#22739)
  • Properly account for nulls in the is_not_nan check made in drop_nans (#22707)
  • Incorrect result from SQL count(*) with partition by (#22728)
  • Fix deadlock joining scanned tables with low thread count (#22672)
  • Don't allow deserializing incompatible DSL (#22644)
  • Incorrect null dtype from binary ops in empty group_by (#22721)
  • Don't mark str.replace_many with Mapping as deprecated (#22697)
  • Gzip has maximum compression of 9, not 10 (#22685)
  • Fix predicate pushdown of fallible expressions (#22669)
  • Fix index out of bounds panic when scanning hugging face (#22661)
  • Panic on group_by with literal and empty rows (#22621)
  • Return input instead of panicking if empty subset in drop_nulls() and drop_nans() (#22469)
  • Bump argminmax to 0.6.3 (#22649)
  • DSL version deserialization endianness (#22642)
  • Allow Expr.round() to be called on integer dtypes (#22622)
  • Fix panic when filtering based on row index column in parquet (#22616)
  • WASM and PyOdide compile (#22613)
  • Resolve get() SchemaMismatch panic (#22350)
  • Panic in group_by_dynamic on single-row df with group_by (#22597)
  • Add new_streaming feature to polars crate (#22601)
  • Consistently use Unix epoch as origin for dt.truncate (except weekly buckets which start on Mondays) (#22592)
  • Fix interpolate on dtype Decimal (#22541)
  • Fix CSV row count skipping the last line when the file did not end with a newline (#22577)
  • Make nested strict casting actually strict (#22497)
  • Make replace and replace_strict mapping use list literals (#22566)
  • Allow pivot on Time column (#22550)
  • Fix error when providing CSV schema with extra columns (#22544)
  • Panic on bitwise op between Series and Expr (#22527)
  • Multi-selector regex expansion (#22542)
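
The gzip fix above is exactly what the title says: valid gzip/zlib compression levels run 0 through 9, and requesting level 10 is an error. Demonstrated with Python's standard library:

```python
import gzip

# Level 9 is the maximum and round-trips fine; level 10 is rejected.
assert gzip.decompress(gzip.compress(b"hello", compresslevel=9)) == b"hello"
try:
    gzip.compress(b"hello", compresslevel=10)  # out of range
    level_10_ok = True
except Exception:
    level_10_ok = False
assert not level_10_ok
```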

📖 Documentation

  • Add pre-release policy (#22808)
  • Fix broken link to service account page in Polars Cloud docs (#22762)
  • Add match_to_schema to API reference (#22777)
  • Provide additional explanation and examples for the value_counts "normalize" parameter (#22756)
  • Rework documentation for drop/fill for nulls/nans (#22657)
  • Add documentation to new RoundMode parameter in round (#22555)
  • Add missing repeat_by to API reference, fixup list.get (#22698)
  • Fix non-rendering bullet points in scan_iceberg (#22694)
  • Improve insert_column docstring (description and examples) (#22551)
  • Improve join documentation (#22556)

📦 Build system

  • Fix building polars-lazy with certain features (#22846)
  • Add missing features (#22839)
  • Patch pyo3 to disable recompilation (#22796)

🛠️ Other improvements

  • Update Rust Polars versions (#22854)
  • Add basic smoke test for free-threaded python (#22481)
  • Update Polars Rust versions (#22834)
  • Fix nix build (#22809)
  • Fix flake.nix to work on macos (#22803)
  • Unused variables on release build (#22800)
  • Update cloud docs (#22624)
  • Fix unstable list.eval performance test (#22729)
  • Add proptest implementations for all Array types (#22711)
  • Dispatch .write_* to .lazy().sink_*(engine='in-memory') (#22582)
  • Move all optimization flags to QueryOptFlags (#22680)
  • Add test for str.replace_many (#22615)
  • Stabilize sink_* (#22643)
  • Add proptest for row-encode (#22626)
  • Update rust version in nix flake (#22627)
  • Add a nix flake with a devShell and package (#22246)
  • Use a wrapper struct to store time zone (#22523)
  • Add proptest testing for parquet decoding kernels (#22608)
  • Include equiprobable as valid quantile method (#22571)
  • Remove confusing error context calling .collect(_eager=True) (#22602)
  • Fix test_truncate_path test case (#22598)
  • Unify function flags into 1 bitset (#22573)
  • Display the operation behind in-memory-map (#22552)

Thank you to all our contributors for making this release possible!
@IvanIsCoding, @JakubValtar, @Julian-J-S, @LucioFranco, @MarcoGorelli, @WH-2099, @alexander-beedie, @borchero, @bschoenmaeckers, @cmdlineluser, @coastalwhite, @etiennebacher, @florian-klein, @itamarst, @kdn36, @mcrumiller, @nameexhaustion, @nikaltipar, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @teotwaki, @timkpaine and @wence-

Rust Polars 0.48.0

20 May 11:07
bfa5e96

💥 Breaking changes

  • Use a wrapper struct to store time zone (#22523)

🚀 Performance improvements

  • Allow predicate passing set_sorted (#22797)
  • Increase default cross-file parallelism limit for new-streaming multiscan (#22700)
  • Add elementwise execution mode for list.eval (#22715)
  • Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
  • Add streaming cross-join node (#22581)
  • Switch off maintain_order in group-by followed by sort (#22492)

✨ Enhancements

  • Format named functions (#22831)
  • Implement list.filter (#22749)
  • Support binaryoffset in search sorted (#22786)
  • Add nulls_equal flag to list/arr.contains (#22773)
  • Allow named opaque functions for serde (#22734)
  • Implement LazyFrame.match_to_schema (#22726)
  • Improve time-string parsing and inference (generally, and via the SQL interface) (#22606)
  • Allow for .over to be called without partition_by (#22712)
  • Support AnyValue translation from PyMapping values (#22722)
  • Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
  • Add options to write Parquet field metadata (#22652)
  • Allow casting List<UInt8> to Binary (#22611)
  • Allow setting of regex size limit using POLARS_REGEX_SIZE_LIMIT (#22651)

🐞 Bug fixes

  • Fix reverse list type (#22832)
  • Add type equality checking for relevant methods (#22802)
  • Invalid output for fill_null after when.then on structs (#22798)
  • Don't panic for cross join with misaligned chunking (#22799)
  • Panic on quantile over nulls in rolling window (#22792)
  • Respect BinaryOffset metadata (#22785)
  • Correct the output order of PartitionByKey and PartitionParted (#22778)
  • Fall back to non-strict casting for deprecated casts (#22760)
  • Clippy on new stable version (#22771)
  • Handle sliced out remainder for bitmaps (#22759)
  • Don't merge Enum categories on append (#22765)
  • Fix unnest() not working on empty struct columns (#22391)
  • Correct name in unnest error message (#22740)
  • Properly account for nulls in the is_not_nan check made in drop_nans (#22707)
  • Incorrect result from SQL count(*) with partition by (#22728)
  • Fix deadlock joining scanned tables with low thread count (#22672)
  • Don't allow deserializing incompatible DSL (#22644)
  • Incorrect null dtype from binary ops in empty group_by (#22721)
  • Don't mark str.replace_many with Mapping as deprecated (#22697)
  • Gzip has maximum compression of 9, not 10 (#22685)
  • Fix predicate pushdown of fallible expressions (#22669)
  • Fix index out of bounds panic when scanning hugging face (#22661)
  • Fix polars crate not compiling when lazy feature enabled (#22655)
  • Panic on group_by with literal and empty rows (#22621)
  • Return input instead of panicking if empty subset in drop_nulls() and drop_nans() (#22469)
  • Bump argminmax to 0.6.3 (#22649)
  • DSL version deserialization endianness (#22642)
  • Fix nested dtype row encoding (#22557)
  • Allow Expr.round() to be called on integer dtypes (#22622)
  • Fix panic when filtering based on row index column in parquet (#22616)
  • WASM and PyOdide compile (#22613)
  • Resolve get() SchemaMismatch panic (#22350)

📖 Documentation

  • Add pre-release policy (#22808)
  • Fix broken link to service account page in Polars Cloud docs (#22762)
  • Rework documentation for drop/fill for nulls/nans (#22657)

📦 Build system

  • Patch pyo3 to disable recompilation (#22796)

🛠️ Other improvements

  • Update Polars Rust versions (#22834)
  • Cleanup polars-python lifetimes (#22548)
  • Fix nix build (#22809)
  • Fix flake.nix to work on macos (#22803)
  • Remove unused dependencies in polars-arrow (#22806)
  • Unused variables on release build (#22800)
  • Update cloud docs (#22624)
  • Add proptest implementations for all Array types (#22711)
  • Dispatch .write_* to .lazy().sink_*(engine='in-memory') (#22582)
  • Move all optimization flags to QueryOptFlags (#22680)
  • Add test for str.replace_many (#22615)
  • Stabilize sink_* (#22643)
  • Add proptest for row-encode (#22626)
  • Emphasize PolarsDataType::get_dtype is static-only (#22648)
  • Use named fields for Logical (#22647)
  • Update rust version in nix flake (#22627)
  • Add a nix flake with a devShell and package (#22246)
  • Use a wrapper struct to store time zone (#22523)
  • Add proptest testing for parquet decoding kernels (#22608)

Thank you to all our contributors for making this release possible!
@IvanIsCoding, @JakubValtar, @Julian-J-S, @LucioFranco, @MarcoGorelli, @WH-2099, @alexander-beedie, @borchero, @bschoenmaeckers, @cmdlineluser, @coastalwhite, @etiennebacher, @florian-klein, @itamarst, @kdn36, @nameexhaustion, @nikaltipar, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @teotwaki, @timkpaine and @wence-