[SPARK-51461] Setup SparkConnect Swift package structure and CI to test build #4

Closed
wants to merge 3 commits into from

dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Mar 11, 2025

What changes were proposed in this pull request?

This PR aims to set up the SparkConnect Swift package structure and a CI workflow that tests the build.

Note that this is a subset of the initial implementation.

Why are the changes needed?

To set up the initial package structure with CI build-test coverage before adding the actual code. Currently, the following two OSes are tested.

  • macOS 15
  • Ubuntu 24.04

Following the standard Swift package structure, this PR adds the following layout for the SparkConnect package. SparkConnectError.swift and BuilderTests.swift are added to fill the otherwise empty directories.

$ tree .
.
├── dev
│   └── merge_spark_pr.py
├── LICENSE
├── Package.swift
├── README.md
├── Sources
│   └── SparkConnect
│       └── SparkConnectError.swift
└── Tests
    └── SparkConnectTests
        └── BuilderTests.swift
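
A manifest consistent with the tree above might look like the following minimal sketch. The PR's actual Package.swift is not shown in this conversation, so the single library product and the absence of external dependencies are assumptions made for illustration. SwiftPM picks up Sources/SparkConnect and Tests/SparkConnectTests from the target names alone.

```swift
// swift-tools-version: 6.0
// A license header, if required, would follow the tools-version line above.
import PackageDescription

let package = Package(
  name: "SparkConnect",
  products: [
    // Assumed: one library product exposing the SparkConnect target.
    .library(name: "SparkConnect", targets: ["SparkConnect"])
  ],
  targets: [
    // Resolves to Sources/SparkConnect under SwiftPM's default layout.
    .target(name: "SparkConnect"),
    // Resolves to Tests/SparkConnectTests and links against the library target.
    .testTarget(name: "SparkConnectTests", dependencies: ["SparkConnect"]),
  ]
)
```

With a layout like this, running `swift build` and `swift test` from the repository root would compile the library target and run the placeholder test.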

Does this PR introduce any user-facing change?

No. This is not released yet.

How was this patch tested?

Pass the CI.

Was this patch authored or co-authored using generative AI tooling?

No.

@@ -14,5 +14,6 @@ header:
- 'NOTICE'
- '.asf.yaml'
- '.nojekyll'
- 'Package.swift'
Member Author

This file has the required ASF license header. However, its first line must be the swift-tools-version: 6.0 declaration, so the header cannot come first. That left me no choice but to add the file here.

@@ -199,68 +199,3 @@
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Member Author

The rest of this file was copied from the Apache Spark repository and does not apply to this Swift repository.

@dongjoon-hyun
Member Author

Could you review and help with this bootstrapping PR when you have some time, @yaooqinn?

@@ -0,0 +1,66 @@
# Xcode
Member Author

The following is a standard Git ignore pattern for Xcode IDE.

@@ -199,68 +199,3 @@
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
Member

Let's also create a NOTICE file. Here is an example: https://www.apache.org/licenses/example-NOTICE.txt

If we have required third-party notices or licenses, we should also record those portions.

Member Author

Sure. Thank you, @yaooqinn.

So far, this library project is tracking the upstream changes like the [Apache Spark](https://spark.apache.org) 4.0.0 RC2 release and [Apache Arrow](https://arrow.apache.org) project's Swift-support.

## Requirement
- [Apache Spark 4.0.0 RC2 (March 2025)](https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc2-bin/)
Member

Member Author

In that case, let me remove this section.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Mar 11, 2025

Just FYI, @yaooqinn, this needs at least 4.0.0-rc2 because Spark Connect in 4.0.0-preview2 is insufficient. The protobuf definitions changed considerably between the two, as the diff size below shows.

$ git checkout v4.0.0-rc2
$ git diff v4.0.0-preview2 sql/connect/common/src/main/protobuf/ | wc -l
     672


[![GitHub Actions Build](https://github.com/apache/spark-connect-swift/actions/workflows/build_and_test.yml/badge.svg)](https://github.com/apache/spark-connect-swift/blob/main/.github/workflows/build_and_test.yml)

This is an experimental Swift library to show how to connect to a remote Apache Spark Connect Server and run SQL statements to manipulate remote data.
Member

Given that the project is still in an experimental phase, it would be good to add a DISCLAIMER file:

Apache Spark Connect Client for Swift is an effort undergoing incubation at The Apache
Software Foundation (ASF), sponsored by the Apache Spark PMC. Incubation is required of
all newly accepted projects until a further review indicates that the infrastructure,
communications, and decision making process have stabilized in a manner consistent with
other successful ASF projects. While incubation status is not necessarily a reflection
of the completeness or stability of the code, it does indicate that the project has yet
to be fully endorsed by the ASF.

Member Author

However, do we have it in our sister repository, Spark Connect Go?

Member

I'm afraid not. The Apache Spark PMC doesn't quite follow the ASF podling incubation process for subprojects.

Member

Especially, the IP clearance part for huge donations :)

https://incubator.apache.org/ip-clearance/

@yaooqinn
Member

> Just FYI, this needs at least 4.0.0-rc2 because Spark Connect of 4.0.0-preview2 is insufficient like the following, @yaooqinn .
>
> $ git checkout v4.0.0-rc2
> $ git diff v4.0.0-preview2 sql/connect/common/src/main/protobuf/ | wc -l
>      672

Thank you for the explanation.

Member

@yaooqinn yaooqinn left a comment

LGTM

@dongjoon-hyun
Member Author

dongjoon-hyun commented Mar 11, 2025

Thank you. Regarding the Incubating status, I understand your point. As for the word Experimental, I copied it from the Spark Connect Go README. However, I'm planning to release the initial version in the Apache Spark 4.1.0 timeframe, after QA. Although that doesn't guarantee full feature parity with PySpark Connect, I believe we can consider the concrete subset a product of the TLP rather than an Incubating one.

I'll reevaluate during the Spark 4.1 RC period and mark it as Incubating if it's unstable. Thank you for your thoughtful advice!

@dongjoon-hyun
Member Author

Thank you for your time again, @yaooqinn ! 🙇🏻

- [gRPC Swift 2.1 (March 2025)](https://github.com/grpc/grpc-swift/releases/tag/2.1.0)
- [gRPC Swift Protobuf 1.0 (March 2025)](https://github.com/grpc/grpc-swift-protobuf/releases/tag/1.1.0)
- [gRPC Swift NIO Transport 1.0 (March 2025)](https://github.com/grpc/grpc-swift-nio-transport/releases/tag/1.0.1)
- [Apache Arrow Swift](https://github.com/apache/arrow/tree/main/swift)
Member

I don't see Arrow in the dependencies, is it a transitive dependency?

Member Author

@dongjoon-hyun dongjoon-hyun Mar 11, 2025

For now, Apache Arrow hasn't made a Swift release, and its Swift code doesn't compile successfully with the latest Swift 6.0.

So, I borrowed and edited some files from Apache Arrow, as Apache Spark did for the Apache Hive Thrift Server module.

Member Author

It will be replaced with the official Apache Arrow Swift artifacts once they start being released.
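
The gRPC requirements listed in the README diff could be declared in Package.swift roughly as follows. This is a sketch under stated assumptions, not the manifest from this PR: the version floors are taken from the release tags linked in the list, the product names (GRPCCore, GRPCProtobuf, GRPCNIOTransportHTTP2) should be verified against each package's own manifest, and the borrowed Arrow sources are assumed to sit inside the SparkConnect target, which is why Arrow does not appear as a dependency.

```swift
// swift-tools-version: 6.0
import PackageDescription

let package = Package(
  name: "SparkConnect",
  // Assumption: gRPC Swift 2.x needs a recent macOS deployment target.
  platforms: [.macOS(.v15)],
  dependencies: [
    // Version floors mirror the release tags linked in the README list.
    .package(url: "https://github.com/grpc/grpc-swift.git", from: "2.1.0"),
    .package(url: "https://github.com/grpc/grpc-swift-protobuf.git", from: "1.1.0"),
    .package(url: "https://github.com/grpc/grpc-swift-nio-transport.git", from: "1.0.1"),
  ],
  targets: [
    .target(
      name: "SparkConnect",
      dependencies: [
        // Product names are assumptions; check each package's manifest.
        .product(name: "GRPCCore", package: "grpc-swift"),
        .product(name: "GRPCProtobuf", package: "grpc-swift-protobuf"),
        .product(name: "GRPCNIOTransportHTTP2", package: "grpc-swift-nio-transport"),
      ]
    ),
    .testTarget(name: "SparkConnectTests", dependencies: ["SparkConnect"]),
  ]
)
```

Once Apache Arrow publishes Swift artifacts, the vendored files could give way to one more `.package` entry and a matching `.product` reference.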

@dongjoon-hyun
Member Author

Thank you for the review, too, @viirya!

@dongjoon-hyun
Member Author

Merged to main~

@dongjoon-hyun dongjoon-hyun deleted the SPARK-51461 branch March 11, 2025 03:21

3 participants