Skip to content

[SPARK-51472] Add gRPC SparkConnectClient actor #7

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed

[SPARK-51472] Add gRPC SparkConnectClient actor #7

wants to merge 6 commits into from

Conversation

dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Mar 11, 2025

What changes were proposed in this pull request?

This PR aims to add a Swift SparkConnectClient actor encapsulating gRPC connections which is similar to other language clients.

  • Swift (this PR)
public actor SparkConnectClient {
private[sql] class SparkConnectClient(
class SparkConnectClient(object):

This is a part of the following.

Why are the changes needed?

To use gRPC in the upper SparkSession and DataFrame layers easily.

Does this PR introduce any user-facing change?

No, this is not released.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-51472] Add gRPC Client actor [SPARK-51472] Add gRPC SparkConnectClient actor Mar 11, 2025
@dongjoon-hyun
Copy link
Member Author

Could you review this PR when you have some time, @HyukjinKwon ?

@dongjoon-hyun
Copy link
Member Author

Could you review this too when you have some time, @viirya , please?

@viirya
Copy link
Member

viirya commented Mar 12, 2025

Looking into this.

@dongjoon-hyun
Copy link
Member Author

Thank you so much, @viirya !

Comment on lines +38 to +43
init(remote: String, user: String) {
self.url = URL(string: remote)!
self.host = url.host() ?? "localhost"
self.port = self.url.port ?? 15002
self.userContext = user.toUserContext
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Scala Configuration, there are a lot configs other than remote and user, e.g., userAgent, isSslEnabled, etc. Are you going to add more configurations later?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, this project aims to achieve the feature parity eventually. But, each language clients have different development speeds like we did in SparkR/PySpark/Scala Spark. For Spark Connect example, Spark Connect Go doesn't support isSslEnabled.

Comment on lines 68 to 74
var request = AnalyzePlanRequest()
request.clientType = clientType
request.userContext = userContext
request.sessionID = sessionID
request.analyze = .sparkVersion(version)
let response = try await service.analyzePlan(request)
return response
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this AnalyzePlanRequest only used for test for now?

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes in this PR. #1 has SparkSession and DataFrame implementation and I'll make PRs soon after this.

For example, this is the implementation for SparkSession.version API.

/// Request the server to set a map of configurations for this session.
/// - Parameter map: A map of key-value pairs to set.
/// - Returns: Always return true.
func setConf(map: [String: String]) async throws -> Bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why don't return ConfigResponse like Scala but a Bool?

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a different implementation choice to encapsulate those details.

Technically, each language client maintains its SparkConnectClient in the private scope like private[sql].

var request = ExecutePlanRequest()
request.clientType = clientType
request.userContext = userContext
request.sessionID = sessionID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be self.sessionID?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, @viirya . Let me fix this.

/// - sessionID: A string for the existing session ID.
/// - plan: A plan to analyze.
/// - Returns: An ``AnalyzePlanRequest`` instance
func getAnalyzePlanRequest(_ sessionID: String, _ plan: Plan) async
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getAnalyzePlanRequest and getExecutePlanRequest are not used in this PR, right? Will they be used in later PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's going to used in SparkSession and DataFrame in the next PRs.

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments. Looks good for the initial import of SparkConnectClient.

dongjoon-hyun and others added 3 commits March 11, 2025 20:03
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
@dongjoon-hyun
Copy link
Member Author

Thank you so much for your review, Liang-Chi! Merged to main.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-51472 branch March 12, 2025 03:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants