[SPARK-53882][CONNECT][DOCS] Add documentation comparing behavioral differences between Spark Connect and Spark Classic #52585
base: master
Conversation
docs/spark-connect-gotchas.md
Outdated
limitations under the License.
---

The comparison highlights key differences between Spark Connect and Spark Classic in terms of execution and analysis behavior. While both utilize lazy execution for transformations, Spark Connect emphasizes deferred schema analysis, introducing unique considerations like temporary view handling and UDF evaluation. The guide outlines common gotchas and provides strategies for mitigation.
", Spark Connect emphasizes deferred schema analysis"
-> ", Spark Connect also defers analysis"
or ", Spark Connect analyzes lazily"
Try to avoid too much indirection.
Yes, done, and updated several other places as well.
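The eager-versus-deferred analysis contrast discussed in this thread can be mimicked in plain Python without a Spark installation. The sketch below is a toy model for illustration only: `EagerFrame` and `DeferredFrame` are hypothetical names, not Spark APIs. `EagerFrame` validates column names at call time, the way Spark Classic analyzes plans as they are built; `DeferredFrame` only records the plan and validates at execution time, the way Spark Connect defers analysis to the server.

```python
class EagerFrame:
    """Toy model of Spark Classic: analysis happens as the plan is built."""

    def __init__(self, columns):
        self.columns = set(columns)
        self.selected = []

    def select(self, name):
        # A bad column name fails immediately, at plan-building time.
        if name not in self.columns:
            raise ValueError(f"column not found: {name}")
        self.selected.append(name)
        return self


class DeferredFrame:
    """Toy model of Spark Connect: analysis is deferred to execution."""

    def __init__(self, columns):
        self.columns = set(columns)
        self.selected = []

    def select(self, name):
        # No validation yet: the plan is only recorded on the client side.
        self.selected.append(name)
        return self

    def collect(self):
        # Analysis happens here, when the plan is finally executed.
        for name in self.selected:
            if name not in self.columns:
                raise ValueError(f"column not found: {name}")
        return list(self.selected)


# An invalid column fails immediately in the eager model...
try:
    EagerFrame(["id", "name"]).select("agee")
    eager_failed_at = None
except ValueError:
    eager_failed_at = "select"

# ...but only at collect() time in the deferred model.
df = DeferredFrame(["id", "name"]).select("agee")  # no error yet
try:
    df.collect()
    deferred_failed_at = None
except ValueError:
    deferred_failed_at = "collect"

print(eager_failed_at, deferred_failed_at)  # select collect
```

This is exactly the migration gotcha the guide describes: code that relied on an error surfacing at `select` time in Spark Classic will see that error surface later, at action time, under Spark Connect.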
docs/spark-connect-gotchas.md
Outdated

**When does this matter?** These differences are particularly important when migrating existing code from Spark Classic to Spark Connect, or when writing code that needs to work with both modes. Understanding these distinctions helps avoid unexpected behavior and performance issues.

**Note:** The examples in this guide use Python, but the same principles apply to Scala and Java.
Please be a champ and also add Scala/Java
Good point, I've just added Scala examples.
What changes were proposed in this pull request?
Spark Connect is a decoupled client-server architecture that allows remote connectivity to Spark clusters using the DataFrame API and unresolved logical plans as the protocol, which is well documented in https://spark.apache.org/docs/latest/spark-connect-overview.html.
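As context, the client-server split means a session is created by pointing the client at a remote endpoint rather than starting a local JVM. The following is a minimal connection sketch, assuming PySpark 3.4+ with the Spark Connect client installed and a Connect server already running; the endpoint `sc://localhost:15002` is a placeholder for your own server:

```python
from pyspark.sql import SparkSession

# Connect to a remote Spark Connect server instead of an in-process driver.
# "sc://localhost:15002" is a placeholder endpoint; substitute your server.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

df = spark.range(5)  # builds an unresolved logical plan on the client
df.show()            # the plan is sent to the server, analyzed, and executed there
```

Because the client only holds an unresolved plan, analysis and name resolution happen on the server, which is the root of the behavioral differences this document covers.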
However, there is a lack of guidance to help users understand the behavioral differences between Spark Classic and Spark Connect and to avoid unexpected behavior.
In this PR, a document is added that details the behavioral differences between Spark Connect and Spark Classic, in particular lazy schema analysis and name resolution, and their implications.
Why are the changes needed?
This doc helps users migrating from Spark Classic to Spark Connect understand the behavioral differences.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
N/A.
Was this patch authored or co-authored using generative AI tooling?
No.