
Commit 3e30ba9

austinrwarner authored and yhuang-db committed
[SPARK-52352][PYTHON][DOCS] Update pyspark.sql.functions.to_json docstring to include VariantType as valid input
### What changes were proposed in this pull request?

Updated the `pyspark.sql.functions.to_json` docstring to include `VariantType` as a valid input. This includes updates to the summary line, the `col` parameter description, and a new example.

### Why are the changes needed?

With the release of Spark 4.0, users of the new Variant type will sometimes need to save out the JSON string representation when using PySpark. Before this change, the API docs falsely implied that `to_json` cannot be used for `VariantType` columns.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No tests added (docs-only change).

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#51064 from austinrwarner/SPARK-52352.

Authored-by: Austin Warner <austin.richard.warner@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
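For context, a minimal sketch of the usage this change documents, mirroring the doctest example added in the diff below (it assumes an active `spark` session; the `key`/`value` column names are only illustrative):

>>> import pyspark.sql.functions as sf
>>> # parse_json produces a VariantType column; to_json renders it back as a JSON string
>>> df = spark.createDataFrame([(1, '{"name": "Alice"}')], ("key", "value"))
>>> df.select(sf.to_json(sf.parse_json(df.value)).alias("json")).show(truncate=False)
+----------------+
|json            |
+----------------+
|{"name":"Alice"}|
+----------------+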
1 parent b923243 · commit 3e30ba9

File tree

1 file changed: +17 −6 lines


python/pyspark/sql/functions/builtin.py

Lines changed: 17 additions & 6 deletions
@@ -20712,8 +20712,8 @@ def schema_of_variant_agg(v: "ColumnOrName") -> Column:
 @_try_remote_functions
 def to_json(col: "ColumnOrName", options: Optional[Mapping[str, str]] = None) -> Column:
     """
-    Converts a column containing a :class:`StructType`, :class:`ArrayType` or a :class:`MapType`
-    into a JSON string. Throws an exception, in the case of an unsupported type.
+    Converts a column containing a :class:`StructType`, :class:`ArrayType`, :class:`MapType`
+    or a :class:`VariantType` into a JSON string. Throws an exception, in the case of an unsupported type.

     .. versionadded:: 2.1.0

@@ -20723,7 +20723,7 @@ def to_json(col: "ColumnOrName", options: Optional[Mapping[str, str]] = None) ->
     Parameters
     ----------
     col : :class:`~pyspark.sql.Column` or str
-        name of column containing a struct, an array or a map.
+        name of column containing a struct, an array, a map, or a variant object.
     options : dict, optional
         options to control converting. accepts the same options as the JSON datasource.
         See `Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option>`_
@@ -20777,7 +20777,18 @@ def to_json(col: "ColumnOrName", options: Optional[Mapping[str, str]] = None) ->
     |{"name":"Alice"}|
     +----------------+

-    Example 4: Converting a nested MapType column to JSON
+    Example 4: Converting a VariantType column to JSON
+
+    >>> import pyspark.sql.functions as sf
+    >>> df = spark.createDataFrame([(1, '{"name": "Alice"}')], ("key", "value"))
+    >>> df.select(sf.to_json(sf.parse_json(df.value)).alias("json")).show(truncate=False)
+    +----------------+
+    |json            |
+    +----------------+
+    |{"name":"Alice"}|
+    +----------------+
+
+    Example 5: Converting a nested MapType column to JSON

     >>> import pyspark.sql.functions as sf
     >>> df = spark.createDataFrame([(1, [{"name": "Alice"}, {"name": "Bob"}])], ("key", "value"))
@@ -20788,7 +20799,7 @@ def to_json(col: "ColumnOrName", options: Optional[Mapping[str, str]] = None) ->
     |[{"name":"Alice"},{"name":"Bob"}]|
     +---------------------------------+

-    Example 5: Converting a simple ArrayType column to JSON
+    Example 6: Converting a simple ArrayType column to JSON

     >>> import pyspark.sql.functions as sf
     >>> df = spark.createDataFrame([(1, ["Alice", "Bob"])], ("key", "value"))
@@ -20799,7 +20810,7 @@ def to_json(col: "ColumnOrName", options: Optional[Mapping[str, str]] = None) ->
     |["Alice","Bob"]|
     +---------------+

-    Example 6: Converting to JSON with specified options
+    Example 7: Converting to JSON with specified options

     >>> import pyspark.sql.functions as sf
     >>> df = spark.sql("SELECT (DATE('2022-02-22'), 1) AS date")
