[SPARK-50815][PYTHON] Fix Variant Local Data to Arrow Conversion
### What changes were proposed in this pull request?
This PR removes unnecessary code for converting Variants in PySpark from their local representation to their Arrow representation. This allows `createDataFrame` and Python Datasources to work with Variants.
### Why are the changes needed?
[This PR](apache#45826) introduced code to convert Variants from their local representation to their Arrow representation (`LocalDataToArrowConversion`). However, that code assumes the local representation is a `dict` and the Arrow representation is a `VariantVal`, when it should be the other way around. This code path does not appear to be exercised by any tests.
This caused `createDataFrame` to fail with Variants, and the [attempted fix](apache#49487) added a special case (`variants_as_dicts`) to this code, even though the special case was actually the only use case. This PR removes the old, unnecessary code and keeps only the "special case" code as the main path for converting Variants from local (`VariantVal`) to Arrow (`dict`).
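The conversion direction this PR settles on can be illustrated with a minimal, self-contained sketch. It is not the actual `LocalDataToArrowConversion` code: `VariantVal` below is a hypothetical stand-in dataclass that only mirrors the `value`/`metadata` byte fields of PySpark's class, `make_variant_converter` is a hypothetical name, and the dict keys `"value"` and `"metadata"` are assumed to match the fields of the Arrow struct used for Variants.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class VariantVal:
    """Hypothetical stand-in for pyspark.sql.types.VariantVal: two binary buffers."""
    value: bytes
    metadata: bytes


def make_variant_converter(
    nullable: bool = True,
) -> Callable[[Any], Optional[Dict[str, bytes]]]:
    """Build a converter from the local representation (VariantVal) to an
    Arrow-friendly representation (a dict shaped like struct<value, metadata>).
    Field names are an assumption for illustration."""

    def convert_variant(obj: Any) -> Optional[Dict[str, bytes]]:
        if obj is None:
            # Nulls pass through unless the column is declared non-nullable.
            if not nullable:
                raise ValueError("input for a non-nullable Variant column must not be None")
            return None
        # Local side is a VariantVal; Arrow side is a plain dict.
        # The original code assumed the opposite (dict in, VariantVal out).
        if isinstance(obj, VariantVal):
            return {"value": obj.value, "metadata": obj.metadata}
        raise TypeError(f"expected VariantVal, got {type(obj).__name__}")

    return convert_variant


if __name__ == "__main__":
    convert = make_variant_converter()
    # Arbitrary placeholder bytes, used only to show the shape of the output.
    v = VariantVal(value=b"\x0c\x01", metadata=b"\x01\x00\x00")
    print(convert(v))
    print(convert(None))
```

Under these assumptions, passing a `dict` into the converter raises a `TypeError`, which is the kind of input the reversed assumption in the original code would have expected on the local side.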
### Does this PR introduce _any_ user-facing change?
This will allow users to use Python datasources with Variants.
### How was this patch tested?
Existing tests should pass, and a new unit test for Python Datasources was added.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes apache#51082 from harshmotw-db/harsh-motwani_data/experimental_variant_fix.
Authored-by: Harsh Motwani <harsh.motwani@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>