-
Notifications
You must be signed in to change notification settings - Fork 5
[SPARK-51472] Add gRPC SparkConnectClient
actor
#7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Client
actorSparkConnectClient
actor
Could you review this PR when you have some time, @HyukjinKwon ? |
Could you review this too when you have some time, @viirya , please? |
Looking into this. |
Thank you so much, @viirya ! |
init(remote: String, user: String) { | ||
self.url = URL(string: remote)! | ||
self.host = url.host() ?? "localhost" | ||
self.port = self.url.port ?? 15002 | ||
self.userContext = user.toUserContext | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Scala Configuration
, there are a lot configs other than remote and user, e.g., userAgent
, isSslEnabled
, etc. Are you going to add more configurations later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, this project aims to achieve the feature parity eventually. But, each language clients have different development speeds like we did in SparkR/PySpark/Scala Spark. For Spark Connect
example, Spark Connect Go
doesn't support isSslEnabled
.
var request = AnalyzePlanRequest() | ||
request.clientType = clientType | ||
request.userContext = userContext | ||
request.sessionID = sessionID | ||
request.analyze = .sparkVersion(version) | ||
let response = try await service.analyzePlan(request) | ||
return response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this AnalyzePlanRequest
only used for test for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes in this PR. #1 has SparkSession
and DataFrame
implementation and I'll make PRs soon after this.
For example, this is the implementation for SparkSession.version
API.
/// Request the server to set a map of configurations for this session. | ||
/// - Parameter map: A map of key-value pairs to set. | ||
/// - Returns: Always return true. | ||
func setConf(map: [String: String]) async throws -> Bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why don't return ConfigResponse
like Scala but a Bool
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a different implementation choice to encapsulate those details.
Technically, each language client maintains its SparkConnectClient
in the private
scope like private[sql]
.
var request = ExecutePlanRequest() | ||
request.clientType = clientType | ||
request.userContext = userContext | ||
request.sessionID = sessionID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't be self.sessionID
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, @viirya . Let me fix this.
/// - sessionID: A string for the existing session ID. | ||
/// - plan: A plan to analyze. | ||
/// - Returns: An ``AnalyzePlanRequest`` instance | ||
func getAnalyzePlanRequest(_ sessionID: String, _ plan: Plan) async |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getAnalyzePlanRequest
and getExecutePlanRequest
are not used in this PR, right? Will they be used in later PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's going to used in SparkSession
and DataFrame
in the next PRs.
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments. Looks good for the initial import of SparkConnectClient
.
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Thank you so much for your review, Liang-Chi! Merged to main. |
What changes were proposed in this pull request?
This PR aims to add a Swift
SparkConnectClient
actor encapsulatinggRPC
connections which is similar to other language clients.This is a part of the following.
Why are the changes needed?
To use
gRPC
in the upperSparkSession
andDataFrame
layers easily.Does this PR introduce any user-facing change?
No, this is not released.
How was this patch tested?
Pass the CIs.
Was this patch authored or co-authored using generative AI tooling?
No.