diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index d822511..fd9f957 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -160,3 +160,17 @@ jobs:
           cd -
           swift test --filter DataFrameWriterV2Tests -c release
           swift test --filter IcebergTest -c release
+
+  linter:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - name: Super-Linter
+        uses: super-linter/super-linter@12150456a73e248bdc94d0794898f94e23127c88
+        env:
+          DEFAULT_BRANCH: main
+          VALIDATE_MARKDOWN: true
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
diff --git a/.markdownlint.yaml b/.markdownlint.yaml
new file mode 100644
index 0000000..11c7a48
--- /dev/null
+++ b/.markdownlint.yaml
@@ -0,0 +1,18 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+MD013: false
diff --git a/.markdownlintignore b/.markdownlintignore
new file mode 100644
index 0000000..320d663
--- /dev/null
+++ b/.markdownlintignore
@@ -0,0 +1,18 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+Sources/SparkConnect/Documentation.docc/Examples.md
diff --git a/Examples/app/README.md b/Examples/app/README.md
index fd98481..1a4b82f 100644
--- a/Examples/app/README.md
+++ b/Examples/app/README.md
@@ -6,13 +6,13 @@ This is an example Swift application to show how to use Apache Spark Connect Swi
 
 Prepare `Spark Connect Server` via running Docker image.
 
-```
+```bash
 docker run --rm -p 15002:15002 apache/spark:4.0.0 bash -c "/opt/spark/sbin/start-connect-server.sh --wait"
 ```
 
 Build an application Docker image.
 
-```
+```bash
 $ docker build -t apache/spark-connect-swift:app .
 $ docker images apache/spark-connect-swift:app
 REPOSITORY                   TAG    IMAGE ID       CREATED         SIZE
@@ -21,7 +21,7 @@ apache/spark-connect-swift   app    e132e1b38348   5 seconds ago   368MB
 
 Run `app` docker image.
 
-```
+```bash
 $ docker run --rm -e SPARK_REMOTE=sc://host.docker.internal:15002 apache/spark-connect-swift:app
 Connected to Apache Spark 4.0.0 Server
 EXECUTE: DROP TABLE IF EXISTS t
@@ -49,7 +49,7 @@ SELECT * FROM t
 
 Run from source code.
 
-```
+```bash
 $ swift run
 ...
 Connected to Apache Spark 4.0.0 Server
diff --git a/Examples/pi/README.md b/Examples/pi/README.md
index 88d3bf9..d1e203e 100644
--- a/Examples/pi/README.md
+++ b/Examples/pi/README.md
@@ -5,13 +5,14 @@ This is an example Swift application to show how to use Apache Spark Connect Swi
 ## How to run
 
 Prepare `Spark Connect Server` via running Docker image.
-```
+
+```bash
 docker run --rm -p 15002:15002 apache/spark:4.0.0 bash -c "/opt/spark/sbin/start-connect-server.sh --wait"
 ```
 
 Build an application Docker image.
 
-```
+```bash
 $ docker build -t apache/spark-connect-swift:pi .
 $ docker images apache/spark-connect-swift:pi
 REPOSITORY                   TAG   IMAGE ID       CREATED         SIZE
@@ -20,14 +21,14 @@ apache/spark-connect-swift   pi    d03952577564   4 seconds ago   369MB
 
 Run `pi` docker image.
 
-```
+```bash
 $ docker run --rm -e SPARK_REMOTE=sc://host.docker.internal:15002 apache/spark-connect-swift:pi
 Pi is roughly 3.1412831412831412
 ```
 
 Run from source code.
 
-```
+```bash
 $ swift run
 ...
 Pi is roughly 3.1423711423711422
diff --git a/Examples/spark-sql/README.md b/Examples/spark-sql/README.md
index 5457ba4..015079d 100644
--- a/Examples/spark-sql/README.md
+++ b/Examples/spark-sql/README.md
@@ -6,13 +6,13 @@ This is an example Swift application to show how to develop a Spark SQL REPL(Rea
 
 Prepare `Spark Connect Server` via running Docker image.
 
-```
+```bash
 docker run -it --rm -p 15002:15002 apache/spark:4.0.0 bash -c "/opt/spark/sbin/start-connect-server.sh --wait"
 ```
 
 Build an application Docker image.
 
-```
+```bash
 $ docker build -t apache/spark-connect-swift:spark-sql .
 $ docker images apache/spark-connect-swift:spark-sql
 REPOSITORY                   TAG         IMAGE ID       CREATED         SIZE
@@ -21,7 +21,7 @@ apache/spark-connect-swift   spark-sql   265ddfec650d   7 seconds ago   390MB
 
 Run `spark-sql` docker image.
 
-```
+```bash
 $ docker run -it --rm -e SPARK_REMOTE=sc://host.docker.internal:15002 apache/spark-connect-swift:spark-sql
 Connected to Apache Spark 4.0.0 Server
 spark-sql (default)> SHOW DATABASES;
@@ -87,7 +87,7 @@ spark-sql (default)> exit;
 
 Apache Spark 4 supports [SQL Pipe Syntax](https://spark.apache.org/docs/4.0.0/sql-pipe-syntax.html).
 
-```
+```bash
 $ swift run
 ...
 Build of product 'SparkSQLRepl' complete! (2.33s)
@@ -110,7 +110,7 @@ Time taken: 159 ms
 
 Run from source code.
 
-```
+```bash
 $ swift run
 ...
 Connected to Apache Spark 4.0.0 Server
diff --git a/Examples/stream/README.md b/Examples/stream/README.md
index 2924358..f4eb2fc 100644
--- a/Examples/stream/README.md
+++ b/Examples/stream/README.md
@@ -20,7 +20,7 @@ nc -lk 9999
 
 Build an application Docker image.
 
-```
+```bash
 $ docker build -t apache/spark-connect-swift:stream .
 $ docker images apache/spark-connect-swift:stream
 REPOSITORY                   TAG      IMAGE ID       CREATED         SIZE
@@ -29,8 +29,8 @@ apache/spark-connect-swift   stream   a4daa10ad9c5   7 seconds ago   369MB
 
 Run `stream` docker image.
 
-```
-$ docker run --rm -e SPARK_REMOTE=sc://host.docker.internal:15002 -e TARGET_HOST=host.docker.internal apache/spark-connect-swift:stream
+```bash
+docker run --rm -e SPARK_REMOTE=sc://host.docker.internal:15002 -e TARGET_HOST=host.docker.internal apache/spark-connect-swift:stream
 ```
 
 ## Send input and check output
diff --git a/Examples/web/README.md b/Examples/web/README.md
index 788b504..5819589 100644
--- a/Examples/web/README.md
+++ b/Examples/web/README.md
@@ -2,18 +2,18 @@
 
 This project is designed to illustrate a Swift-based HTTP WebServer with Apache Spark Connect.
 
-- https://swiftpackageindex.com/vapor/vapor
+- <https://swiftpackageindex.com/vapor/vapor>
 
 ## Create a Swift project
 
-```
+```bash
 brew install vapor
 vapor new spark-connect-swift-web -n
 ```
 
-## Use `Apache Spark Connect Swift Client` package.
+## Use `Apache Spark Connect Swift Client` package
 
-```
+```bash
 $ git diff HEAD
 diff --git a/Package.swift b/Package.swift
 index 477bcbd..3e7bb06 100644
@@ -76,13 +76,13 @@ index 2edcc8f..22313c8 100644
 
 Prepare `Spark Connect Server` via running Docker image.
 
-```
+```bash
 docker run --rm -p 15002:15002 apache/spark:4.0.0 bash -c "/opt/spark/sbin/start-connect-server.sh --wait"
 ```
 
 Build an application Docker image.
 
-```
+```bash
 $ docker build -t apache/spark-connect-swift:web .
 $ docker images apache/spark-connect-swift:web
 REPOSITORY                   TAG    IMAGE ID       CREATED          SIZE
@@ -91,14 +91,14 @@ apache/spark-connect-swift   web    3fd2422fdbee   27 seconds ago   417MB
 
 Run `web` docker image
 
-```
+```bash
 $ docker run -it --rm -p 8080:8080 -e SPARK_REMOTE=sc://host.docker.internal:15002 apache/spark-connect-swift:web
 [ NOTICE ] Server started on http://127.0.0.1:8080
 ```
 
 Connect to the Swift Web Server to talk with `Apache Spark`.
 
-```
+```bash
 $ curl http://127.0.0.1:8080/
 Welcome to the Swift world. Say hello!%
 
@@ -108,6 +108,6 @@ Hi, this is powered by the Apache Spark 4.0.0.%
 
 Run from source code.
 
-```
-$ swift run
+```bash
+swift run
 ```
diff --git a/README.md b/README.md
index 6768a5b..4602c4f 100644
--- a/README.md
+++ b/README.md
@@ -117,4 +117,3 @@ SELECT * FROM t
 You can find more complete examples including `Spark SQL REPL`, `Web Server` and `Streaming` applications in the [Examples](https://github.com/apache/spark-connect-swift/tree/main/Examples) directory.
 
 This library also supports `SPARK_REMOTE` environment variable to specify the [Spark Connect connection string](https://spark.apache.org/docs/latest/spark-connect-overview.html#set-sparkremote-environment-variable) in order to provide more options.
-
diff --git a/Sources/SparkConnect/Documentation.docc/Examples.md b/Sources/SparkConnect/Documentation.docc/Examples.md
index 864e702..5858d18 100644
--- a/Sources/SparkConnect/Documentation.docc/Examples.md
+++ b/Sources/SparkConnect/Documentation.docc/Examples.md
@@ -13,12 +13,14 @@ docker run -it --rm -p 15002:15002 apache/spark:4.0.0 bash -c "/opt/spark/sbin/s
 ## Basic Application Example
 
 The basic application example demonstrates fundamental operations with Apache Spark Connect, including:
+
 - Connecting to a Spark server
 - Creating and manipulating tables with SQL
 - Using DataFrame operations
 - Reading and writing data in the ORC format
 
 ### Key Features
+
 - SQL execution for table operations
 - DataFrame transformations with filter operations
 - Data persistence with ORC format
@@ -40,11 +42,13 @@ swift run
 ## Spark SQL REPL(Read-Eval-Print Loop) Example
 
 The Spark SQL REPL application example demonstrates interactive operations with ad-hoc Spark SQL queries with Apache Spark Connect, including:
+
 - Connecting to a Spark server
 - Receiving ad-hoc Spark SQL queries from users
 - Show the SQL results interactively
 
 ### Key Features
+
 - Spark SQL execution for table operations
 - User interactions
 
@@ -66,6 +70,7 @@ swift run
 The Pi calculation example shows how to use Spark Connect Swift for computational tasks by calculating an approximation of π (pi) using the Monte Carlo method.
 
 ### Key Features
+
 - Command-line argument handling
 - Mathematical computations with Spark
 - Random number generation
@@ -89,6 +94,7 @@ swift run
 The streaming example demonstrates how to process streaming data using Spark Connect Swift client, specifically for counting words from a network socket stream.
 
 ### Key Features
+
 - Stream processing with Spark Connect
 - Network socket data source
 - Word counting with string operations
@@ -120,6 +126,7 @@ Type text into the Netcat terminal to see real-time word counting from `Spark Co
 The web application example showcases how to integrate Spark Connect Swift with a web server using the Vapor framework.
 
 ### Key Features
+
 - HTTP server integration with Vapor
 - REST API endpoints
 - Spark session management within web requests
@@ -153,6 +160,7 @@ Hi, this is powered by the Apache Spark 4.0.0.%
 ## Development Environment
 
 All examples include:
+
 - A Dockerfile for containerized execution
 - A Package.swift file for Swift Package Manager configuration
 - A README.md with detailed instructions
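For local verification, the same Markdown checks that the new `linter` job runs in CI can be reproduced against a working copy with Super-Linter's documented `RUN_LOCAL` mode. This is a minimal sketch and not part of the patch: the `ghcr.io/super-linter/super-linter:latest` image tag is an assumption, as is the expectation that the root-level `.markdownlint.yaml` and `.markdownlintignore` added above are resolved the same way they are in the CI job.

```bash
# Run Super-Linter against the current working tree (mounted at /tmp/lint),
# validating only Markdown so the run mirrors the new `linter` CI job.
docker run --rm \
  -e RUN_LOCAL=true \
  -e DEFAULT_BRANCH=main \
  -e VALIDATE_MARKDOWN=true \
  -v "$PWD":/tmp/lint \
  ghcr.io/super-linter/super-linter:latest
```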