[WIP][SPARK-53938][PYTHON][CONNECT] Fix decimal rescaling in createDataFrame #52637
base: master
raise PySparkValueError(f"input for {dataType} must not be None")
return None
return value
return round(value, dataType.scale).normalize()
Just a question. Does this mean the result of Classic is changed by this PR too?
I don't think so. IIUC, this is only used in Spark Connect.
Oh wait, this is also used in some places in workers:
spark/python/pyspark/sql/pandas/serializers.py
Lines 924 to 928 in b3748d8
    conv = LocalDataToArrowConversion._create_converter(
        spark_type,
        none_on_identity=True,
        int_to_decimal_coercion_enabled=self._int_to_decimal_coercion_enabled,
    )
spark/python/pyspark/worker.py
Lines 2258 to 2260 in b3748d8
    table = LocalDataToArrowConversion.convert(
        data, return_type, prefers_large_var_types
    )
    LocalDataToArrowConversion._create_converter(field.dataType) for field in return_type.fields
Are these safe?
I see some other tests failed; let me double check.
What changes were proposed in this pull request?
Fix decimal rescaling in createDataFrame
Why are the changes needed?
This query works in Classic but fails in Connect:
classic
connect
The root cause is data loss during the Arrow conversion.
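For reference, a minimal sketch of what the expression from the diff, `round(value, dataType.scale).normalize()`, does to a plain Python `Decimal`. It uses only the standard-library `decimal` module; `rescale` is a hypothetical helper name, and the snippet does not reproduce the Connect failure or the Arrow conversion itself.

```python
from decimal import Decimal


def rescale(value: Decimal, scale: int) -> Decimal:
    """Hypothetical helper mirroring the expression in the diff.

    round(Decimal, n) quantizes to n fractional digits using the current
    decimal context's rounding mode; normalize() then strips trailing zeros.
    """
    return round(value, scale).normalize()


# Extra fractional digits are rounded away to fit the target scale.
print(rescale(Decimal("1.23456"), 2))

# A value with fewer digits is first padded to the scale (1.500),
# then normalize() removes the trailing zeros again.
print(rescale(Decimal("1.5"), 3))
```

Note that `normalize()` can also produce a positive exponent for integral values (for example, `Decimal("100").normalize()` is `Decimal("1E+2")`), which may be worth keeping in mind when checking the worker call sites mentioned in the review.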
Does this PR introduce any user-facing change?
Yes, the query works in Connect after this PR.
How was this patch tested?
Added a test.
Was this patch authored or co-authored using generative AI tooling?
No.