-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SQL][TESTS][SPARK-52918] Batch JDBC database statements in JDBC suites #51616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alekjarmov I am interest in performance when we do addBatch, do you maybe know the numbers?
The diff would be minimal and it would be cleaner to read.
@PetarVasiljevic-DB It should be around the same as we would have the same amount of network roundtrips which is where the most time is wasted, just the rewrite/diff would be more different as they use some different syntax with question marks. |
You wouldn't use question marks. You would create statement at the beginning and then instead of executing the statement, you just call addBatch instead of prepareStatement: https://www.baeldung.com/jdbc-batch-processing#1-batch-processing-using-statement |
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
Outdated
Show resolved
Hide resolved
batchStmt.addBatch( | ||
"CREATE TABLE \"test\".\"strings_with_nulls\" (str TEXT(32))") | ||
|
||
batchStmt.addBatch("INSERT INTO \"test\".\"people\" VALUES ('fred', 1)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this suite, CREATE TABLE and INSERT are in one batch. Why can't we do the same for JDBCSuite.scala
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did it in JDBCSuite as well.
What changes were proposed in this pull request?
To modify the before all to reduce the amount of roundtrips with the database. Per my benchmarks this led to decreased time in
beforeAll
from ~850ms to ~690ms inJDBCV2Suite
and 3s to 1.8s inJDBCSuite
. It also clears tech debt where people in the future won't unknowingly add more roundtrips than needed.Why are the changes needed?
Improve test performance, it is not drastic improvement when running whole suite but is great when running just a single test.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Test-only change.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claiude Sonnet