[SPARK-53914][BUILD][CONNECT] Add connect-client-jdbc module #52619
base: master
Conversation
public class SparkConnectDriver extends NonRegisteringSparkConnectDriver {
  static {
    try {
      DriverManager.registerDriver(new SparkConnectDriver());
I plan to write the JDBC module in Scala, but I have to write this class in Java because there seems to be no equivalent of a Java static block in Scala.
Java sounds good to me.
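To illustrate the point about the static block: DriverManager loads driver classes by name (via `Class.forName` or the `META-INF/services` service loader), and the JDBC convention is that class loading itself registers the driver, which is exactly what a Java static initializer does, once, on first load. The sketch below is a hypothetical minimal driver (not the Spark code) showing that pattern; `DemoDriver` and the `jdbc:demo://` prefix are made up for illustration.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.util.Properties;
import java.util.logging.Logger;

// Hypothetical minimal JDBC driver: the static block runs exactly once,
// when the class is first loaded, and registers the driver instance.
// Scala has no direct per-class equivalent of this initializer, hence the
// single Java file in an otherwise Scala module.
public class DemoDriver implements Driver {
    static {
        try {
            DriverManager.registerDriver(new DemoDriver());
        } catch (SQLException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    @Override public boolean acceptsURL(String url) {
        return url != null && url.startsWith("jdbc:demo://");
    }

    @Override public Connection connect(String url, Properties info) throws SQLException {
        // Per the JDBC spec, a driver must return null for URLs it does not own.
        if (!acceptsURL(url)) return null;
        throw new SQLException("connect() not implemented in this sketch");
    }

    @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
        return new DriverPropertyInfo[0];
    }
    @Override public int getMajorVersion() { return 0; }
    @Override public int getMinorVersion() { return 1; }
    @Override public boolean jdbcCompliant() { return false; }
    @Override public Logger getParentLogger() { return Logger.getGlobal(); }
}
```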
project/SparkBuild.scala
),

(assembly / assemblyMergeStrategy) := {
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
This has the same effect as Maven's org.apache.maven.plugins.shade.resource.ServicesResourceTransformer. Maybe I should fix it independently?
Yes, +1 for spinning off this topic.
Opened SPARK-53935 (#52636)
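For context on the merge strategy discussed above: both sbt-assembly's `MergeStrategy.filterDistinctLines` and Maven's `ServicesResourceTransformer` resolve conflicting `META-INF/services` files by concatenating the provider entries from every jar and keeping each distinct line once. The snippet below is a hypothetical illustration of that semantics (not the sbt-assembly or Maven code); `ServiceFileMerge` and `mergeDistinct` are made-up names.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch of service-file merging: concatenate the provider
// entries from every jar's META-INF/services file and drop duplicate
// lines, preserving first-seen order.
public class ServiceFileMerge {
    static List<String> mergeDistinct(List<List<String>> serviceFiles) {
        return serviceFiles.stream()
                .flatMap(List::stream)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Two jars both declare a java.sql.Driver provider; the merged
        // assembly must list each provider exactly once.
        List<String> merged = mergeDistinct(List.of(
                List.of("org.apache.spark.sql.connect.client.jdbc.SparkConnectDriver"),
                List.of("org.apache.spark.sql.connect.client.jdbc.SparkConnectDriver",
                        "com.example.OtherDriver")));
        System.out.println(merged);
    }
}
```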
override def acceptsURL(url: String): Boolean = url.startsWith("jdbc:sc://")

override def connect(url: String, info: Properties): Connection = {
  throw new UnsupportedOperationException("TODO")
Please always use an IDed TODO.
Created SPARK-53934 and updated the comments.
sql/connect/client/jdbc/pom.xml
<name>Spark Project Connect JDBC Driver</name>
<url>https://spark.apache.org/</url>
<properties>
  <sbt.project.name>connect-jdbc</sbt.project.name>
According to the directory structure, sql/connect/client/jdbc, connect-client-jdbc might be more correct, matching connect-client-jvm at the same level.
I agree that consistency is more important; renamed it to connect-client-jdbc.
Thank you. This looks like a good start, @pan3793 . I left a few minor comments, but we can revisit them later. I believe we need more time to get broader reviews from multiple committers for this PR.
@dongjoon-hyun yeah, I understand, since this introduces user-facing changes.
Thanks @pan3793 for working on this! This PR looks like a good start.
sbt_test_goals=[
    "connect/test",
    "connect-client-jvm/test",
    "connect-client-jdbc/test",
Please confirm whether maven_test.yml needs to be modified.
Thanks for the tip; it indeed needs to be updated.
import org.apache.spark.SparkBuildInfo.{spark_version => SPARK_VERSION}
import org.apache.spark.util.VersionUtils

class NonRegisteringSparkConnectDriver extends Driver {
I'm not sure whether we should implement everything using Java.
+1 for using Java for new modules.
@LuciferYang I would prefer to write only the public API in Java, similar to what we did for DSv2.
Let me share my thoughts:
- The proposed implementation is based on the existing connect-client-jvm, which is written in Scala; that means users must eventually have the Scala runtime on the classpath to use this JDBC driver.
- I agree we should define the public API as Java interfaces or classes. In this case, the public JDBC APIs (e.g. java.sql.Connection, Statement, ResultSet, etc.) are defined in the JDK, and I would not treat the concrete implementation classes as public API, so it does not matter whether they are written in Java or Scala.
- I can imagine a few sets of additional public APIs exposed by this JDBC driver, e.g. VariantVal and CalendarInterval, which will be returned by Object getObject(int columnIndex); those classes are written in Java.
What changes were proposed in this pull request?
Developer-oriented stuff:
- artifactId is spark-connect-client-jdbc_2.13
- sbt project name is connect-client-jdbc
- module directory is sql/connect/client/jdbc
- the jar is shipped under <DIST_DIR>/jars/connect-repl/, colocated with the connect-client-jvm jar

User-facing points:
- The JDBC URL reuses the current URL used by the Spark Connect client, with an additional prefix jdbc:, e.g., jdbc:sc://localhost:15002
- The JDBC Driver class name is org.apache.spark.sql.connect.client.jdbc.SparkConnectDriver
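Since the PR description says the JDBC URL is just the Spark Connect client URL with an additional `jdbc:` prefix, stripping that prefix should recover the underlying connect URL. The sketch below illustrates that mapping; `JdbcUrl` and `toConnectUrl` are hypothetical helper names, not part of the PR.

```java
// Hedged sketch of the URL scheme described above: "jdbc:sc://host:port"
// is accepted by the driver, and removing the "jdbc:" prefix yields the
// plain Spark Connect URL "sc://host:port".
public class JdbcUrl {
    static final String PREFIX = "jdbc:sc://";

    static boolean acceptsURL(String url) {
        // Mirrors the acceptsURL check shown in the diff.
        return url != null && url.startsWith(PREFIX);
    }

    static String toConnectUrl(String jdbcUrl) {
        if (!acceptsURL(jdbcUrl)) {
            throw new IllegalArgumentException("Not a Spark Connect JDBC URL: " + jdbcUrl);
        }
        return jdbcUrl.substring("jdbc:".length());
    }
}
```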
Why are the changes needed?
Kick off SPIP: JDBC Driver for Spark Connect
Does this PR introduce any user-facing change?
New feature.
How was this patch tested?
UT added.
Was this patch authored or co-authored using generative AI tooling?
No.