[SPARK-51465] Use Apache Arrow Swift 19.0.1 #6

Closed

Member

@dongjoon-hyun dongjoon-hyun commented Mar 11, 2025

What changes were proposed in this pull request?

This PR imports the Apache Arrow Swift 19.0.1 source code. The copied source will be replaced by a package dependency once the Apache Arrow Swift package is released.

This is a part of the initial implementation.

Why are the changes needed?

Apache Arrow 19.0.1 is the latest version.

Although Apache Arrow 19.0.1 includes Swift source code, it needs a few changes before it can be used here:

  • For the Arrow package, we need to change two places to compile with Swift 6.0, and we need to exclude ArrowCExporter.swift and ArrowCImporter.swift.
$ git clone -b apache-arrow-19.0.1 https://github.com/apache/arrow.git
$ cd arrow/swift/Arrow/Sources/Arrow/
$ rm ArrowC*
$ cp * ~/spark-connect-swift/Sources/SparkConnect/
- public enum ArrowTypeId {
+ public enum ArrowTypeId: Sendable {
- public enum Info {
+ public enum Info: Sendable {
  • For the ArrowFlight package, we need to update two places to compile with Swift 6, and we use only three files:
    • Flight.pb.swift
    • FlightData.swift
    • FlightDescriptor.swift
$ git clone -b apache-arrow-19.0.1 https://github.com/apache/arrow.git
$ cd arrow/swift/ArrowFlight/Sources/ArrowFlight
$ cp Flight.pb.swift FlightData.swift FlightDescriptor.swift ~/spark-connect-swift/Sources/SparkConnect
-  static var allCases: [Arrow_Flight_Protocol_CancelStatus] = [
+  static let allCases: [Arrow_Flight_Protocol_CancelStatus] = [
-  static var allCases: [Arrow_Flight_Protocol_FlightDescriptor.DescriptorType] = [
+  static let allCases: [Arrow_Flight_Protocol_FlightDescriptor.DescriptorType] = [
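
As a rough illustration of why these two kinds of change are needed (a sketch, not the Arrow source): in the Swift 6 language mode, shared static state must be immutable and of a Sendable type, and public enums do not get an inferred Sendable conformance, so it has to be declared. The names TypeId, Info, Registry, and CancelStatus below are hypothetical stand-ins for illustration, not the actual Arrow declarations.

```swift
// Hypothetical stand-in for ArrowTypeId: a public enum must declare
// Sendable explicitly; it is not inferred for public types.
public enum TypeId: Sendable {
  case int32
  case date32
}

// Hypothetical stand-in for Info: it can only be Sendable if every
// associated value (here TypeId) is Sendable as well.
public enum Info: Sendable {
  case primitive(TypeId)
  case variable(TypeId)
}

// A shared constant. Under Swift 6 strict concurrency checking, a
// `static let` is accepted only when its type is Sendable.
public enum Registry {
  public static let int32 = Info.primitive(.int32)
}

// Hypothetical stand-in for the generated Flight enums: `static var`
// would be shared mutable state, so the list becomes `static let`.
public enum CancelStatus: CaseIterable, Sendable {
  case unspecified
  case cancelled

  public static let allCases: [CancelStatus] = [.unspecified, .cancelled]
}
```

Dropping either Sendable conformance, or switching allCases back to static var, is expected to bring back a concurrency-safety error when this is compiled in the Swift 6 language mode.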

Lastly, swift format is applied.

$ swift format -i *.swift

Does this PR introduce any user-facing change?

No, this is not released yet.

How was this patch tested?

Pass the CI.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun
Member Author

Could you review this PR, which uses the Apache Arrow Swift source code in Apache Spark Connect for Swift, when you have some time, @MaxGekk? This is a temporary measure until Apache Arrow publishes a Swift package.

@dongjoon-hyun
Member Author

Could you review this PR when you have some time, @huaxingao ?

let byteOffset = self.arrowData.stride * Int(index)
let milliseconds = self.arrowData.buffers[1].rawPointer.advanced(by: byteOffset).load(
as: UInt32.self)
return Date(timeIntervalSince1970: TimeInterval(milliseconds * 86400))
Member

This is slightly confusing: you multiply milliseconds by 86400 = 24 * 60 * 60, which is the number of seconds in a day. IMHO, the name should be seconds, or the expression should be (24 * 60 * 60 * milliseconds) / 1000.
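
To make the unit question concrete, here is a minimal sketch, assuming the stored value is a count of days since the Unix epoch (as in Arrow's Date32 type), which is what multiplying by 86400 implies. The function name date(fromDaysSinceEpoch:) and the constant secondsPerDay are hypothetical and are not part of the Arrow Swift API.

```swift
import Foundation

// Number of seconds in one day (24 * 60 * 60).
let secondsPerDay: TimeInterval = 86_400

// Hypothetical helper: converts a day count since 1970-01-01 to a Date.
// The value is widened to TimeInterval before multiplying, so the
// arithmetic is done in floating point rather than in a 32-bit integer.
func date(fromDaysSinceEpoch days: Int32) -> Date {
  Date(timeIntervalSince1970: TimeInterval(days) * secondsPerDay)
}

// One day after the epoch is 86400 seconds after the epoch.
let nextDay = date(fromDaysSinceEpoch: 1)
```

Naming the value days keeps the unit visible at the call site, and converting before the multiplication avoids a 32-bit integer overflow for large day counts.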

Member Author

Thank you for the review, @MaxGekk.

Of course, we will report this back upstream to stay in sync with them. I guess we need to help the Apache Arrow Swift community because they haven't made a release yet. I expect more instances like this. We won't find them unless we try to use the code.

https://github.com/apache/arrow/blob/9df280bc974a7070176e4466b599e36061f28887/swift/Arrow/Sources/Arrow/ArrowArray.swift#L217

For now, this PR is only a stepping stone to use Apache Arrow Swift, @MaxGekk , and make a Spark Connect client framework work for Swift users.

Member Author

As mentioned in the PR description, this PR is an import of Apache Arrow code with the minimal compilation fixes.

@viirya
Member

viirya commented Mar 11, 2025

Do you plan to propose the following changes to the Arrow Swift repo?

- public enum ArrowTypeId {
+ public enum ArrowTypeId: Sendable {

- public enum Info {
+ public enum Info: Sendable {

we need to exclude ArrowCExporter.swift and ArrowCImporter.swift

Is it also because they cannot be compiled in Swift 6.0?

@dongjoon-hyun
Member Author

Thank you for the review, @viirya. Yes, these changes will be required once Apache Arrow starts to support Swift 6.0.

@dongjoon-hyun
Member Author

Just to give the reviewers some background, the Swift 6 compiler has become more like the Rust compiler in terms of concurrency and data-safety checks.

@dongjoon-hyun
Member Author

After building the initial working code, we are going to enter a QA period by adding more unit test coverage. All issues we find will be reported upstream because we don't want to keep a clone of Apache Arrow Swift.

@huaxingao huaxingao left a comment

LGTM

@dongjoon-hyun
Member Author

Thank you all so much for your interest in this new codebase and for taking a look to help, @MaxGekk, @viirya and @huaxingao.

Merged to main.
