-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-52243][CONNECT] Add NERF support for schema-related InvalidPlanInput errors #50997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f2d5abb
to
01fddeb
Compare
@@ -3432,6 +3444,12 @@ | |||
], | |||
"sqlState" : "42602" | |||
}, | |||
"INVALID_SCHEMA_TYPE_NON_STRUCT" : { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we use this error condition everywhere? All of those features need to specify a schema, and INVALID_SCHEMA_TYPE_NON_STRUCT
is general enough that fit all of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To indicate where the error happened, error context is a better place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point
thanks, merging to master! |
…compatible with Python only client ### What changes were proposed in this pull request? This PR is a followup of #50997 that backports the test changes into branch-3.5. ### Why are the changes needed? To make the build pass (https://github.com/apache/spark/actions/runs/15522509263/job/43697654408). ``` ====================================================================== FAIL [0.015s]: test_udtf_with_invalid_return_type (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_with_invalid_return_type) ---------------------------------------------------------------------- pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.sql.connect.common.InvalidPlanInput) [INVALID_SCHEMA_TYPE_NON_STRUCT] Invalid schema type. Expect a struct type, but got "INT". SQLSTATE: 42K09 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/connect/test_parity_udtf.py", line 57, in test_udtf_with_invalid_return_type with self.assertRaisesRegex( AssertionError: "Invalid Python user-defined table function return type." does not match "(org.apache.spark.sql.connect.common.InvalidPlanInput) [INVALID_SCHEMA_TYPE_NON_STRUCT] Invalid schema type. Expect a struct type, but got "INT". SQLSTATE: 42K09" ---------------------------------------------------------------------- ``` Now that the error message has changed, the tests in branch-3.5 fails. ### Does this PR introduce _any_ user-facing change? No, test-only. ### How was this patch tested? Manually. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #51121 from HyukjinKwon/SPARK-52420. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…compatible with Python only client This PR is a followup of #50997 that backports the test changes into branch-3.5. To make the build pass (https://github.com/apache/spark/actions/runs/15522509263/job/43697654408). ``` ====================================================================== FAIL [0.015s]: test_udtf_with_invalid_return_type (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_with_invalid_return_type) ---------------------------------------------------------------------- pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.sql.connect.common.InvalidPlanInput) [INVALID_SCHEMA_TYPE_NON_STRUCT] Invalid schema type. Expect a struct type, but got "INT". SQLSTATE: 42K09 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/connect/test_parity_udtf.py", line 57, in test_udtf_with_invalid_return_type with self.assertRaisesRegex( AssertionError: "Invalid Python user-defined table function return type." does not match "(org.apache.spark.sql.connect.common.InvalidPlanInput) [INVALID_SCHEMA_TYPE_NON_STRUCT] Invalid schema type. Expect a struct type, but got "INT". SQLSTATE: 42K09" ---------------------------------------------------------------------- ``` Now that the error message has changed, the tests in branch-3.5 fails. No, test-only. Manually. No. Closes #51121 from HyukjinKwon/SPARK-52420. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit c75e7c7) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…compatible with Python only client This PR is a followup of #50997 that backports the test changes into branch-3.5. To make the build pass (https://github.com/apache/spark/actions/runs/15522509263/job/43697654408). ``` ====================================================================== FAIL [0.015s]: test_udtf_with_invalid_return_type (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_with_invalid_return_type) ---------------------------------------------------------------------- pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.sql.connect.common.InvalidPlanInput) [INVALID_SCHEMA_TYPE_NON_STRUCT] Invalid schema type. Expect a struct type, but got "INT". SQLSTATE: 42K09 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/connect/test_parity_udtf.py", line 57, in test_udtf_with_invalid_return_type with self.assertRaisesRegex( AssertionError: "Invalid Python user-defined table function return type." does not match "(org.apache.spark.sql.connect.common.InvalidPlanInput) [INVALID_SCHEMA_TYPE_NON_STRUCT] Invalid schema type. Expect a struct type, but got "INT". SQLSTATE: 42K09" ---------------------------------------------------------------------- ``` Now that the error message has changed, the tests in branch-3.5 fails. No, test-only. Manually. No. Closes #51121 from HyukjinKwon/SPARK-52420. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit c75e7c7) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…nInput errors ### What changes were proposed in this pull request? This PR adds NERF (New Error Framework) support for schema-related InvalidPlanInput errors in Spark Connect. The changes include: 1. Added a new error condition in error-conditions.json for schema validation: - INVALID_SCHEMA_TYPE_NON_STRUCT 2. Refactored error handling in InvalidInputErrors.scala to use the new NERF framework: - Added helper function invalidPlanInput for consistent error message generation - Updated schema validation error methods to use NERF error conditions - Made quoteByDefault method accessible to other packages 3. Added a test suite InvalidInputErrorsSuite.scala to verify error handling ### Why are the changes needed? These changes are needed to: 1. Standardize error reporting across Spark Connect using the NERF framework 2. Improve error messages with better parameterization and consistency 3. Ensure proper SQL state codes are associated with schema-related errors 4. Provide clearer error messages for users when schema validation fails ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? `build/sbt "connect/testOnly *InvalidInputErrorsSuite"` ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor 0.50.5 (Universal) Closes apache#50997 from heyihong/SPARK-52243. Authored-by: Yihong He <heyihong.cn@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…compatible with Python only client ### What changes were proposed in this pull request? This PR is a followup of apache#50997 that backports the test changes into branch-3.5. ### Why are the changes needed? To make the build pass (https://github.com/apache/spark/actions/runs/15522509263/job/43697654408). ``` ====================================================================== FAIL [0.015s]: test_udtf_with_invalid_return_type (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_with_invalid_return_type) ---------------------------------------------------------------------- pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.sql.connect.common.InvalidPlanInput) [INVALID_SCHEMA_TYPE_NON_STRUCT] Invalid schema type. Expect a struct type, but got "INT". SQLSTATE: 42K09 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/connect/test_parity_udtf.py", line 57, in test_udtf_with_invalid_return_type with self.assertRaisesRegex( AssertionError: "Invalid Python user-defined table function return type." does not match "(org.apache.spark.sql.connect.common.InvalidPlanInput) [INVALID_SCHEMA_TYPE_NON_STRUCT] Invalid schema type. Expect a struct type, but got "INT". SQLSTATE: 42K09" ---------------------------------------------------------------------- ``` Now that the error message has changed, the tests in branch-3.5 fails. ### Does this PR introduce _any_ user-facing change? No, test-only. ### How was this patch tested? Manually. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#51121 from HyukjinKwon/SPARK-52420. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
This PR adds NERF (New Error Framework) support for schema-related InvalidPlanInput errors in Spark Connect. The changes include:
Added a new error condition in error-conditions.json for schema validation:
Refactored error handling in InvalidInputErrors.scala to use the new NERF framework:
Added a test suite InvalidInputErrorsSuite.scala to verify error handling
Why are the changes needed?
These changes are needed to:
Does this PR introduce any user-facing change?
No
How was this patch tested?
build/sbt "connect/testOnly *InvalidInputErrorsSuite"
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 0.50.5 (Universal)