Conversation

@pan3793 (Member) commented Oct 15, 2025

What changes were proposed in this pull request?

Developer-oriented stuff:

  • the maven module artifactId is spark-connect-client-jdbc_2.13
  • the scala project name is connect-client-jdbc
  • the module is located at sql/connect/client/jdbc
  • the packaged jar goes to <DIST_DIR>/jars/connect-repl/, colocated with the connect-client-jvm jar

User-facing points:

  • The JDBC URL reuses the existing Spark Connect client URL format, with an additional jdbc: prefix, e.g., jdbc:sc://localhost:15002

  • JDBC Driver class name is: org.apache.spark.sql.connect.client.jdbc.SparkConnectDriver
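The jdbc:sc:// prefix above implies a simple acceptance rule. A minimal standalone sketch (the class name UrlSketch is hypothetical; in the real driver this logic lives in acceptsURL):

```java
// Hypothetical sketch of the URL acceptance rule: the driver handles URLs with
// the "jdbc:sc://" prefix, e.g. "jdbc:sc://localhost:15002".
public class UrlSketch {
    static final String PREFIX = "jdbc:sc://";

    static boolean acceptsURL(String url) {
        return url != null && url.startsWith(PREFIX);
    }

    public static void main(String[] args) {
        System.out.println(acceptsURL("jdbc:sc://localhost:15002"));   // true
        System.out.println(acceptsURL("jdbc:hive2://localhost:10000")); // false
    }
}
```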

Why are the changes needed?

Kick off SPIP: JDBC Driver for Spark Connect

Does this PR introduce any user-facing change?

New feature.

How was this patch tested?

UT added.

Was this patch authored or co-authored using generative AI tooling?

No.

public class SparkConnectDriver extends NonRegisteringSparkConnectDriver {
  static {
    try {
      DriverManager.registerDriver(new SparkConnectDriver());
    } catch (SQLException e) {
      throw new ExceptionInInitializerError(e);
    }
  }
}
@pan3793 (Member Author), Oct 15, 2025:

I plan to write the JDBC module in Scala, but I have to write this class in Java because there seems to be no equivalent of a Java static initializer block in Scala.
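For context, the reason JDBC drivers use a static block: it runs exactly once, when the class is first loaded, which happens as a side effect of Class.forName or ServiceLoader instantiation. A minimal demonstration, unrelated to Spark itself (StaticInitDemo and FakeDriver are made-up names):

```java
// Demonstrates that a Java static initializer runs once, at class-load time,
// not at construction time -- which is why JDBC drivers register themselves there.
public class StaticInitDemo {
    static final java.util.List<String> EVENTS = new java.util.ArrayList<>();

    static class FakeDriver {
        // Runs once, when the JVM first initializes this class.
        static { EVENTS.add("registered"); }
    }

    // Loads (and thereby initializes) FakeDriver, like DriverManager/ServiceLoader would.
    static int loadAndCount() throws ClassNotFoundException {
        Class.forName(StaticInitDemo.class.getName() + "$FakeDriver");
        return EVENTS.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(EVENTS.size());  // 0: FakeDriver not loaded yet
        System.out.println(loadAndCount()); // 1: static block ran on class load
        new FakeDriver();                   // constructing more instances...
        System.out.println(EVENTS.size());  // 1: ...does not re-run the block
    }
}
```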

Member:

Java sounds good to me.

),

(assembly / assemblyMergeStrategy) := {
case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
pan3793 (Member Author):

This has the same effect as Maven's

org.apache.maven.plugins.shade.resource.ServicesResourceTransformer

Maybe I should fix it independently?
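For context, filterDistinctLines concatenates the META-INF/services provider lists from all jars and drops duplicate lines, which is the behavior ServicesResourceTransformer provides on the Maven side. A hypothetical standalone illustration (ServiceMerge and the provider names are made up):

```java
import java.util.*;

// Illustrates merging two META-INF/services files: concatenate the provider
// lists and drop duplicate lines, preserving first-seen order.
public class ServiceMerge {
    static List<String> mergeDistinct(List<String> a, List<String> b) {
        LinkedHashSet<String> merged = new LinkedHashSet<>(a);
        merged.addAll(b); // duplicates are silently ignored
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<String> fromJarA = List.of("com.example.DriverA", "com.example.Shared");
        List<String> fromJarB = List.of("com.example.Shared", "com.example.DriverB");
        System.out.println(mergeDistinct(fromJarA, fromJarB));
        // [com.example.DriverA, com.example.Shared, com.example.DriverB]
    }
}
```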

Member:

Yes, +1 for spinning off this topic.

pan3793 (Member Author):

Opened SPARK-53935 (#52636)

@pan3793 (Member Author) commented Oct 15, 2025

cc @LuciferYang @hvanhovell

override def acceptsURL(url: String): Boolean = url.startsWith("jdbc:sc://")

override def connect(url: String, info: Properties): Connection = {
  throw new UnsupportedOperationException("TODO")
}
Member:

Please always use an IDed TODO.

pan3793 (Member Author):

Created SPARK-53934 and updated the comments.

<name>Spark Project Connect JDBC Driver</name>
<url>https://spark.apache.org/</url>
<properties>
<sbt.project.name>connect-jdbc</sbt.project.name>
Member:

According to the directory structure, sql/connect/client/jdbc, connect-client-jdbc might be the correct name, matching connect-client-jvm at the same level.

pan3793 (Member Author):

I agree consistency is more important; renamed it to connect-client-jdbc.

@dongjoon-hyun (Member) left a comment:

Thank you. This looks like a good start, @pan3793 . I left a few minor comments, but we can revisit them later. I believe we need more time to get broader reviews from multiple committers for this PR.

@pan3793 (Member Author) commented Oct 16, 2025

> I believe we need more time to get broader reviews from multiple committers for this PR.

@dongjoon-hyun yeah, I understand, since this introduces user-facing changes.

cc @martin-g @zhengruifeng @sarutak @wangyum @peter-toth

@pan3793 changed the title from "[SPARK-53914][BUILD][CONNECT] Add connect-jdbc module" to "[SPARK-53914][BUILD][CONNECT] Add connect-client-jdbc module" on Oct 16, 2025
@peter-toth (Contributor) commented:

Thanks @pan3793 for working on this! This PR looks like a good start.

sbt_test_goals=[
"connect/test",
"connect-client-jvm/test",
"connect-client-jdbc/test",
Contributor:

Please confirm whether maven_test.yml needs to be modified.

pan3793 (Member Author):

Thanks for the tip; it indeed needs to be updated.

import org.apache.spark.SparkBuildInfo.{spark_version => SPARK_VERSION}
import org.apache.spark.util.VersionUtils

class NonRegisteringSparkConnectDriver extends Driver {
Contributor:

I'm not sure whether we should implement everything using Java.

Contributor:

+1 using Java for new modules

pan3793 (Member Author):

@LuciferYang I would prefer to only write the public API in Java, similar to what we did for DSv2.

Let me share my thoughts:

  1. The proposed implementation is based on the existing connect-client-jvm, which is written in Scala; that means users must eventually have the Scala runtime on the classpath to use this JDBC driver.

  2. I agree we should define the public API as Java interfaces or classes. In this case, the public JDBC APIs (e.g., java.sql.Connection, Statement, ResultSet) are defined in the JDK, and I would not treat the concrete implementation classes as public API, so it does not matter whether they are written in Java or Scala.

  3. I can imagine a few sets of additional public APIs exposed by this JDBC driver, e.g., VariantVal and CalendarInterval, which will be returned by Object getObject(int columnIndex); those classes are written in Java.
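Point 2 can be illustrated with a minimal sketch (StandardApi, InternalImpl, and ApiSurfaceDemo are hypothetical stand-ins, not Spark code): callers compile only against the standard interface, so the implementation class's language never appears in the public API surface.

```java
// A stand-in for a JDK-defined JDBC interface such as java.sql.Connection.
interface StandardApi {
    String describe();
}

public class ApiSurfaceDemo {
    // Concrete implementation; whether it is written in Java or Scala is an
    // internal detail invisible to callers of the interface.
    static class InternalImpl implements StandardApi {
        public String describe() { return "implementation detail"; }
    }

    public static void main(String[] args) {
        StandardApi api = new InternalImpl(); // callers only see StandardApi
        System.out.println(api.describe());
    }
}
```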

@github-actions github-actions bot added the INFRA label Oct 16, 2025