Skip to content

[SPARK-52243][CONNECT] Add NERF support for schema-related InvalidPlanInput errors #50997

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

heyihong
Copy link
Contributor

@heyihong heyihong commented May 23, 2025

What changes were proposed in this pull request?

This PR adds NERF (New Error Framework) support for schema-related InvalidPlanInput errors in Spark Connect. The changes include:

  1. Added a new error condition in error-conditions.json for schema validation:

    • INVALID_SCHEMA_TYPE_NON_STRUCT
  2. Refactored error handling in InvalidInputErrors.scala to use the new NERF framework:

    • Added helper function invalidPlanInput for consistent error message generation
    • Updated schema validation error methods to use NERF error conditions
    • Made quoteByDefault method accessible to other packages
  3. Added a test suite InvalidInputErrorsSuite.scala to verify error handling

Why are the changes needed?

These changes are needed to:

  1. Standardize error reporting across Spark Connect using the NERF framework
  2. Improve error messages with better parameterization and consistency
  3. Ensure proper SQL state codes are associated with schema-related errors
  4. Provide clearer error messages for users when schema validation fails

Does this PR introduce any user-facing change?

No

How was this patch tested?

build/sbt "connect/testOnly *InvalidInputErrorsSuite"

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 0.50.5 (Universal)

@heyihong heyihong force-pushed the SPARK-52243 branch 12 times, most recently from f2d5abb to 01fddeb Compare May 25, 2025 15:57
@HyukjinKwon HyukjinKwon changed the title [SPARK-52243] Add NERF support for schema-related InvalidPlanInput errors [SPARK-52243][CONNECT] Add NERF support for schema-related InvalidPlanInput errors May 26, 2025
@heyihong
Copy link
Contributor Author

@@ -3432,6 +3444,12 @@
],
"sqlState" : "42602"
},
"INVALID_SCHEMA_TYPE_NON_STRUCT" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we use this error condition everywhere? All of those features need to specify a schema, and INVALID_SCHEMA_TYPE_NON_STRUCT is general enough that fit all of them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To indicate where the error happened, error context is a better place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point

@heyihong heyihong requested a review from cloud-fan May 28, 2025 08:53
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in f2079d8 May 28, 2025
HyukjinKwon added a commit that referenced this pull request Jun 9, 2025
…compatible with Python only client

### What changes were proposed in this pull request?

This PR is a followup of #50997 that backports the test changes into branch-3.5.

### Why are the changes needed?

To make the build pass (https://github.com/apache/spark/actions/runs/15522509263/job/43697654408).

```
======================================================================
FAIL [0.015s]: test_udtf_with_invalid_return_type (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_with_invalid_return_type)
----------------------------------------------------------------------
pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.sql.connect.common.InvalidPlanInput) [INVALID_SCHEMA_TYPE_NON_STRUCT] Invalid schema type. Expect a struct type, but got "INT". SQLSTATE: 42K09

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/connect/test_parity_udtf.py", line 57, in test_udtf_with_invalid_return_type
    with self.assertRaisesRegex(
AssertionError: "Invalid Python user-defined table function return type." does not match "(org.apache.spark.sql.connect.common.InvalidPlanInput) [INVALID_SCHEMA_TYPE_NON_STRUCT] Invalid schema type. Expect a struct type, but got "INT". SQLSTATE: 42K09"

----------------------------------------------------------------------
```

Now that the error message has changed, the tests in branch-3.5 fails.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51121 from HyukjinKwon/SPARK-52420.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon added a commit that referenced this pull request Jun 9, 2025
…compatible with Python only client

This PR is a followup of #50997 that backports the test changes into branch-3.5.

To make the build pass (https://github.com/apache/spark/actions/runs/15522509263/job/43697654408).

```
======================================================================
FAIL [0.015s]: test_udtf_with_invalid_return_type (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_with_invalid_return_type)
----------------------------------------------------------------------
pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.sql.connect.common.InvalidPlanInput) [INVALID_SCHEMA_TYPE_NON_STRUCT] Invalid schema type. Expect a struct type, but got "INT". SQLSTATE: 42K09

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/connect/test_parity_udtf.py", line 57, in test_udtf_with_invalid_return_type
    with self.assertRaisesRegex(
AssertionError: "Invalid Python user-defined table function return type." does not match "(org.apache.spark.sql.connect.common.InvalidPlanInput) [INVALID_SCHEMA_TYPE_NON_STRUCT] Invalid schema type. Expect a struct type, but got "INT". SQLSTATE: 42K09"

----------------------------------------------------------------------
```

Now that the error message has changed, the tests in branch-3.5 fails.

No, test-only.

Manually.

No.

Closes #51121 from HyukjinKwon/SPARK-52420.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit c75e7c7)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon added a commit that referenced this pull request Jun 9, 2025
…compatible with Python only client

This PR is a followup of #50997 that backports the test changes into branch-3.5.

To make the build pass (https://github.com/apache/spark/actions/runs/15522509263/job/43697654408).

```
======================================================================
FAIL [0.015s]: test_udtf_with_invalid_return_type (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_with_invalid_return_type)
----------------------------------------------------------------------
pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.sql.connect.common.InvalidPlanInput) [INVALID_SCHEMA_TYPE_NON_STRUCT] Invalid schema type. Expect a struct type, but got "INT". SQLSTATE: 42K09

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/connect/test_parity_udtf.py", line 57, in test_udtf_with_invalid_return_type
    with self.assertRaisesRegex(
AssertionError: "Invalid Python user-defined table function return type." does not match "(org.apache.spark.sql.connect.common.InvalidPlanInput) [INVALID_SCHEMA_TYPE_NON_STRUCT] Invalid schema type. Expect a struct type, but got "INT". SQLSTATE: 42K09"

----------------------------------------------------------------------
```

Now that the error message has changed, the tests in branch-3.5 fails.

No, test-only.

Manually.

No.

Closes #51121 from HyukjinKwon/SPARK-52420.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit c75e7c7)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
yhuang-db pushed a commit to yhuang-db/spark that referenced this pull request Jun 9, 2025
…nInput errors

### What changes were proposed in this pull request?

This PR adds NERF (New Error Framework) support for schema-related InvalidPlanInput errors in Spark Connect. The changes include:

1. Added a new error condition in error-conditions.json for schema validation:
   - INVALID_SCHEMA_TYPE_NON_STRUCT

2. Refactored error handling in InvalidInputErrors.scala to use the new NERF framework:
   - Added helper function invalidPlanInput for consistent error message generation
   - Updated schema validation error methods to use NERF error conditions
   - Made quoteByDefault method accessible to other packages

3. Added a test suite InvalidInputErrorsSuite.scala to verify error handling

### Why are the changes needed?

These changes are needed to:
1. Standardize error reporting across Spark Connect using the NERF framework
2. Improve error messages with better parameterization and consistency
3. Ensure proper SQL state codes are associated with schema-related errors
4. Provide clearer error messages for users when schema validation fails

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

`build/sbt "connect/testOnly *InvalidInputErrorsSuite"`

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 0.50.5 (Universal)

Closes apache#50997 from heyihong/SPARK-52243.

Authored-by: Yihong He <heyihong.cn@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
yhuang-db pushed a commit to yhuang-db/spark that referenced this pull request Jun 9, 2025
…compatible with Python only client

### What changes were proposed in this pull request?

This PR is a followup of apache#50997 that backports the test changes into branch-3.5.

### Why are the changes needed?

To make the build pass (https://github.com/apache/spark/actions/runs/15522509263/job/43697654408).

```
======================================================================
FAIL [0.015s]: test_udtf_with_invalid_return_type (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_with_invalid_return_type)
----------------------------------------------------------------------
pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.sql.connect.common.InvalidPlanInput) [INVALID_SCHEMA_TYPE_NON_STRUCT] Invalid schema type. Expect a struct type, but got "INT". SQLSTATE: 42K09

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/connect/test_parity_udtf.py", line 57, in test_udtf_with_invalid_return_type
    with self.assertRaisesRegex(
AssertionError: "Invalid Python user-defined table function return type." does not match "(org.apache.spark.sql.connect.common.InvalidPlanInput) [INVALID_SCHEMA_TYPE_NON_STRUCT] Invalid schema type. Expect a struct type, but got "INT". SQLSTATE: 42K09"

----------------------------------------------------------------------
```

Now that the error message has changed, the tests in branch-3.5 fails.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#51121 from HyukjinKwon/SPARK-52420.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants