Conversation

@pan3793 (Member) commented Oct 15, 2025

What changes were proposed in this pull request?

Developer-oriented stuff:

  • the maven module artifactId is spark-connect-client-jdbc_2.13
  • the scala project name is connect-client-jdbc
  • the module is located at sql/connect/client/jdbc
  • the packaged jar goes to <DIST_DIR>/jars/connect-repl/, colocated with the connect-client-jvm jar

User-facing points:

  • The JDBC URL reuses the existing Spark Connect client URL format, with an additional jdbc: prefix, e.g., jdbc:sc://localhost:15002

  • JDBC Driver class name is: org.apache.spark.sql.connect.client.jdbc.SparkConnectDriver
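The jdbc:sc:// prefix above implies a simple acceptance rule. A minimal standalone sketch (the class name UrlSketch is hypothetical; in the real driver this logic lives in acceptsURL):

```java
// Hypothetical sketch of the URL acceptance rule: the driver handles URLs with
// the "jdbc:sc://" prefix, e.g. "jdbc:sc://localhost:15002".
public class UrlSketch {
    static final String PREFIX = "jdbc:sc://";

    static boolean acceptsURL(String url) {
        return url != null && url.startsWith(PREFIX);
    }

    public static void main(String[] args) {
        System.out.println(acceptsURL("jdbc:sc://localhost:15002"));   // true
        System.out.println(acceptsURL("jdbc:hive2://localhost:10000")); // false
    }
}
```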

Why are the changes needed?

Kick off SPIP: JDBC Driver for Spark Connect

Does this PR introduce any user-facing change?

New feature.

How was this patch tested?

UT added.

Was this patch authored or co-authored using generative AI tooling?

No.

public class SparkConnectDriver extends NonRegisteringSparkConnectDriver {
  static {
    try {
      DriverManager.registerDriver(new SparkConnectDriver());
    } catch (SQLException e) {
      throw new ExceptionInInitializerError(e);
    }
  }
}
@pan3793 (Member Author), Oct 15, 2025:

I plan to write the JDBC module in Scala, but I have to write this class in Java because there seems to be no equivalent of a Java static initializer block in Scala.
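For context, the reason JDBC drivers use a static block: it runs exactly once, when the class is first loaded, which happens as a side effect of Class.forName or ServiceLoader instantiation. A minimal demonstration, unrelated to Spark itself (StaticInitDemo and FakeDriver are made-up names):

```java
// Demonstrates that a Java static initializer runs once, at class-load time,
// not at construction time -- which is why JDBC drivers register themselves there.
public class StaticInitDemo {
    static final java.util.List<String> EVENTS = new java.util.ArrayList<>();

    static class FakeDriver {
        // Runs once, when the JVM first initializes this class.
        static { EVENTS.add("registered"); }
    }

    // Loads (and thereby initializes) FakeDriver, like DriverManager/ServiceLoader would.
    static int loadAndCount() throws ClassNotFoundException {
        Class.forName(StaticInitDemo.class.getName() + "$FakeDriver");
        return EVENTS.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(EVENTS.size());  // 0: FakeDriver not loaded yet
        System.out.println(loadAndCount()); // 1: static block ran on class load
        new FakeDriver();                   // constructing more instances...
        System.out.println(EVENTS.size());  // 1: ...does not re-run the block
    }
}
```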

Member:

Java sounds good to me.

),

(assembly / assemblyMergeStrategy) := {
case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
pan3793 (Member Author):

This has the same effect as Maven's

org.apache.maven.plugins.shade.resource.ServicesResourceTransformer

Maybe I should fix it independently?
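For context, filterDistinctLines concatenates the META-INF/services provider lists from all jars and drops duplicate lines, which is the behavior ServicesResourceTransformer provides on the Maven side. A hypothetical standalone illustration (ServiceMerge and the provider names are made up):

```java
import java.util.*;

// Illustrates merging two META-INF/services files: concatenate the provider
// lists and drop duplicate lines, preserving first-seen order.
public class ServiceMerge {
    static List<String> mergeDistinct(List<String> a, List<String> b) {
        LinkedHashSet<String> merged = new LinkedHashSet<>(a);
        merged.addAll(b); // duplicates are silently ignored
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<String> fromJarA = List.of("com.example.DriverA", "com.example.Shared");
        List<String> fromJarB = List.of("com.example.Shared", "com.example.DriverB");
        System.out.println(mergeDistinct(fromJarA, fromJarB));
        // [com.example.DriverA, com.example.Shared, com.example.DriverB]
    }
}
```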

Member:

Yes, +1 for spinning off this topic.

pan3793 (Member Author):

Opened SPARK-53935 (#52636)

@pan3793 (Member Author) commented Oct 15, 2025

cc @LuciferYang @hvanhovell

override def acceptsURL(url: String): Boolean = url.startsWith("jdbc:sc://")

override def connect(url: String, info: Properties): Connection = {
  throw new UnsupportedOperationException("TODO")
}
Member:

Please always use an IDed TODO.

pan3793 (Member Author):

Created SPARK-53934 and updated the comments.

<name>Spark Project Connect JDBC Driver</name>
<url>https://spark.apache.org/</url>
<properties>
<sbt.project.name>connect-jdbc</sbt.project.name>
Member:

According to the directory structure, sql/connect/client/jdbc, connect-client-jdbc might be the correct name, matching connect-client-jvm at the same level.

pan3793 (Member Author):

I agree consistency is more important; renamed it to connect-client-jdbc.

@dongjoon-hyun (Member) left a comment:

Thank you. This looks like a good start, @pan3793 . I left a few minor comments, but we can revisit them later. I believe we need more time to get broader reviews from multiple committers for this PR.

@pan3793 (Member Author) commented Oct 16, 2025

> I believe we need more time to get broader reviews from multiple committers for this PR.

@dongjoon-hyun yeah, I understand, since this introduces user-facing changes.

cc @martin-g @zhengruifeng @sarutak @wangyum @peter-toth

@pan3793 changed the title from "[SPARK-53914][BUILD][CONNECT] Add connect-jdbc module" to "[SPARK-53914][BUILD][CONNECT] Add connect-client-jdbc module" on Oct 16, 2025
@peter-toth (Contributor) commented:

Thanks @pan3793 for working on this! This PR looks like a good start.

sbt_test_goals=[
"connect/test",
"connect-client-jvm/test",
"connect-client-jdbc/test",
Contributor:

Please confirm whether maven_test.yml needs to be modified.

pan3793 (Member Author):

Thanks for the tip; it indeed needs to be updated.

import org.apache.spark.SparkBuildInfo.{spark_version => SPARK_VERSION}
import org.apache.spark.util.VersionUtils

class NonRegisteringSparkConnectDriver extends Driver {
Contributor:

I'm not sure whether we should implement everything using Java.

Contributor:

+1 using Java for new modules

pan3793 (Member Author):

@LuciferYang I would prefer to only write the public API in Java, similar to what we did for DSv2.

Let me share my thoughts:

  1. The proposed implementation is based on the existing connect-client-jvm, which is written in Scala; that means users must eventually have the Scala runtime on the classpath to use this JDBC driver.

  2. I agree we should define the public API as Java interfaces or classes. In this case, the public JDBC APIs (e.g., java.sql.Connection, Statement, ResultSet) are defined in the JDK, and I would not treat the concrete implementation classes as public API, so it does not matter whether they are written in Java or Scala.

  3. I can imagine a few sets of additional public APIs exposed by this JDBC driver, e.g., VariantVal and CalendarInterval, which will be returned by Object getObject(int columnIndex); those classes are written in Java.
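Point 2 can be illustrated with a minimal sketch (StandardApi, InternalImpl, and ApiSurfaceDemo are hypothetical stand-ins, not Spark code): callers compile only against the standard interface, so the implementation class's language never appears in the public API surface.

```java
// A stand-in for a JDK-defined JDBC interface such as java.sql.Connection.
interface StandardApi {
    String describe();
}

public class ApiSurfaceDemo {
    // Concrete implementation; whether it is written in Java or Scala is an
    // internal detail invisible to callers of the interface.
    static class InternalImpl implements StandardApi {
        public String describe() { return "implementation detail"; }
    }

    public static void main(String[] args) {
        StandardApi api = new InternalImpl(); // callers only see StandardApi
        System.out.println(api.describe());
    }
}
```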

@github-actions github-actions bot added the INFRA label Oct 16, 2025