
[SPARK-52904][PYTHON] Enable convertToArrowArraySafely by default #51596


Status: Open. Wants to merge 1 commit into base: master.

Conversation

@benrobby benrobby commented Jul 21, 2025

What changes were proposed in this pull request?

  • this enables spark.sql.execution.pandas.convertToArrowArraySafely by default
  • This also adjusts unit tests that previously relied on implicit conversions (nanosecond timestamps silently truncated to microseconds with loss of precision, integer overflows) and that start to fail with the new default.
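For users who depend on the previous lenient behavior, the flag can be turned back off per session. A minimal sketch, assuming an active `SparkSession` named `spark` (the config key is the one this PR flips; the snippet itself is illustrative, not part of the PR):

```python
# Hedged sketch: restoring the previous lenient conversion behavior in a
# PySpark session. Assumes `spark` is an existing SparkSession.
spark.conf.set("spark.sql.execution.pandas.convertToArrowArraySafely", "false")
```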

Why are the changes needed?

  • This change aligns PySpark UDF behavior with ANSI SQL behavior in the rest of Spark: on integer overflow, the standard behavior is to throw an error. Users can and should handle such overflow or truncation cases explicitly.
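To illustrate the semantic difference, here is a minimal pure-Python sketch (hypothetical helper functions, not Spark's actual implementation) of silent wraparound versus the ANSI-style raise-on-overflow conversion:

```python
# Hypothetical helpers illustrating unsafe vs. safe int64 -> int32 conversion.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def to_int32_unsafe(value: int) -> int:
    # Wraps around like a C-style narrowing cast (the old lenient behavior).
    return ((value - INT32_MIN) % 2**32) + INT32_MIN

def to_int32_safe(value: int) -> int:
    # Raises instead of silently overflowing (ANSI-style behavior).
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"value {value} out of int32 range")
    return value

print(to_int32_unsafe(2**31))  # -2147483648 (silent wraparound)
```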

Does this PR introduce any user-facing change?

  • Yes. Errors are now raised on integer overflow, floating point truncation, and loss of precision when truncating timestamps. Citing PySpark's upgrade docs:
    =======================================  ===========================  =========================
    PyArrow version                          Integer overflow             Floating point truncation
    =======================================  ===========================  =========================
    0.11.0 and below                         Raise error                  Silently allows
    > 0.11.0, arrowSafeTypeConversion=false  Silent overflow (returns 0)  Silently allows
    > 0.11.0, arrowSafeTypeConversion=true   Raise error                  Raise error
    =======================================  ===========================  =========================

How was this patch tested?

  • adjusted unit tests

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Jul 21, 2025
@HyukjinKwon HyukjinKwon changed the title [WIP][SPARK-52904][PYTHON] enable convertToArrowArraySafely by default [WIP][SPARK-52904][PYTHON] Enable convertToArrowArraySafely by default Jul 21, 2025
@benrobby benrobby changed the title [WIP][SPARK-52904][PYTHON] Enable convertToArrowArraySafely by default [SPARK-52904][PYTHON] Enable convertToArrowArraySafely by default Jul 23, 2025
        "Test in '%s' function was failed." % np_name
    ) from e
    finally:
        reset_option("compute.ops_on_diff_frames")
benrobby (Author) commented:
The diff above looks unfortunate; this block is merely indented by one more level.

@benrobby

@HyukjinKwon @zhengruifeng @asl3 could you take a look?

@asl3 (Contributor) left a comment:


Should we add a note to our migration guide docs for the user-facing change?

3 participants