[SPARK-51483] Add SparkSession and DataFrame actors #10
Conversation
Sources/SparkConnect/DataFrame.swift
Outdated
}

/// Add `Apache Arrow`'s `RecordBatch`s to the intenal array.
/// - Parameter batches: A ``RecordBatch`` instance.
Shouldn't it be an array?
Oops. typo. Let me fix it.
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
@Test
func range() async throws {
  let spark = try await SparkSession.builder.getOrCreate()
  #expect(try await spark.range(10).count() == 10)
Hmm, so the test already connects to a real server to execute a range plan?
Yes, it works with the real server from now on. Please see here.
Oh, I mean: in the current GitHub Actions, do we already run a Connect server for these tests?
Improving CIs by enabling the Spark Connect server via Docker in GitHub Action CIs.
Ah, I see that it is in later steps. Thanks.
Thank you for helping with this effort so far, @viirya. This is the last of the initial implementation. After this, I'm moving forward to
According to the review comment, I didn't mention For now, this is supposed to support Apache Spark 4.0.0+ only.
Thank you. For the record, the first MVP (Minimum Viable Product) is focusing on
Merged to main~
What changes were proposed in this pull request?
This PR aims to add SparkSession and DataFrame actors.
SparkSession.SparkContext is defined as an empty struct just as a type.
SparkSession.Builder is defined to match with the builder pattern.
Why are the changes needed?
To allow users to start to use this library. After this PR, we can run the test against the real Spark Connect servers.
Does this PR introduce any user-facing change?
No, this is not released yet.
How was this patch tested?
Pass the CIs.
Was this patch authored or co-authored using generative AI tooling?
No.
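For readers unfamiliar with the shape described under "What changes were proposed", here is a rough, self-contained sketch of a session actor with a nested empty struct and a builder, plus a frame actor that keeps batches in an internal array. It is not the code from this PR: SketchSession, SketchFrame, FakeBatch, config, append, and the "example.key" option are all hypothetical names chosen for illustration, and nothing here talks to a Spark Connect server.

```swift
// Hypothetical sketch only; these types do not come from the library under review.

/// Stand-in for an Arrow RecordBatch; only the row count matters here.
struct FakeBatch: Sendable {
  let rowCount: Int
}

/// Actor that serializes access to its internal batch array.
actor SketchFrame {
  private var batches: [FakeBatch] = []

  /// Append an array of batches to the internal array.
  func append(_ newBatches: [FakeBatch]) {
    batches.append(contentsOf: newBatches)
  }

  /// Total number of rows across all stored batches.
  func count() -> Int {
    batches.reduce(0) { $0 + $1.rowCount }
  }
}

actor SketchSession {
  /// Empty struct that exists only as a type.
  struct Context: Sendable {}

  /// Value-type builder that accumulates options before creating a session.
  struct Builder: Sendable {
    var options: [String: String] = [:]

    /// Return a copy of the builder with one more option set.
    func config(_ key: String, _ value: String) -> Builder {
      var copy = self
      copy.options[key] = value
      return copy
    }

    /// In this sketch a new session is always created; there is no cache.
    func getOrCreate() async throws -> SketchSession {
      SketchSession(options: options)
    }
  }

  /// Entry point for the builder pattern.
  static var builder: Builder { Builder() }

  let options: [String: String]
  let context = Context()

  init(options: [String: String]) {
    self.options = options
  }

  /// Build a frame holding `end` rows locally (assumption: no server round trip).
  func range(_ end: Int) async -> SketchFrame {
    let frame = SketchFrame()
    await frame.append([FakeBatch(rowCount: end)])
    return frame
  }
}

// Usage, mirroring the call shape of the test quoted in the review.
let session = try await SketchSession.builder
  .config("example.key", "value")
  .getOrCreate()
let frame = await session.range(10)
print(await frame.count())  // prints 10
```

Making both types actors means callers reach mutable state such as the batch array only through `await`, which is one plausible reason for the actor-based design; the PR itself does not spell out its rationale.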