
[SQL][TESTS][SPARK-52918] Batch JDBC database statements in JDBC suites #51616

Open
wants to merge 14 commits into
base: master

Conversation

alekjarmov
Contributor

@alekjarmov alekjarmov commented Jul 22, 2025

What changes were proposed in this pull request?

Modify beforeAll in the JDBC suites to reduce the number of round trips to the database. Per my benchmarks, this reduced the time spent in beforeAll from ~850 ms to ~690 ms in JDBCV2Suite and from 3 s to 1.8 s in JDBCSuite. It also clears tech debt: with the setup statements batched, future contributors are less likely to unknowingly add more round trips than needed.
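As a rough illustration of the idea (not the code in this PR), the sketch below contrasts submitting setup statements one at a time with queuing them via addBatch and submitting once. To stay runnable without a database driver it uses a hypothetical recording stub in place of a real java.sql.Statement; BatchSketch, RecordingHandler, setupOneByOne and setupBatched are made-up names, and counting each executeUpdate or executeBatch call as one round trip is a simplifying assumption, since how a batch is actually sent depends on the driver.

```scala
import java.lang.reflect.{InvocationHandler, Method, Proxy}
import java.sql.Statement
import scala.collection.mutable.ArrayBuffer

object BatchSketch {

  // Hypothetical recording stub standing in for a driver's Statement, so the
  // sketch runs without a database. It counts how often SQL is submitted.
  final class RecordingHandler extends InvocationHandler {
    private val pending = ArrayBuffer.empty[String] // statements queued by addBatch
    var submissions = 0                             // executeUpdate / executeBatch calls

    override def invoke(proxy: AnyRef, method: Method, args: Array[AnyRef]): AnyRef =
      method.getName match {
        case "addBatch" =>
          pending += args(0).asInstanceOf[String]
          null
        case "executeBatch" =>
          submissions += 1
          val counts = new Array[Int](pending.size) // one update count per queued statement
          pending.clear()
          counts
        case "executeUpdate" =>
          submissions += 1
          Int.box(0)
        case other =>
          throw new UnsupportedOperationException(other)
      }
  }

  // Wraps the handler in a dynamic proxy that implements java.sql.Statement.
  def newStatement(handler: RecordingHandler): Statement =
    Proxy
      .newProxyInstance(getClass.getClassLoader, Array[Class[_]](classOf[Statement]), handler)
      .asInstanceOf[Statement]

  // Setup statements taken from the snippets quoted later in this conversation.
  val setupSql: Seq[String] = Seq(
    """CREATE TABLE "test"."strings_with_nulls" (str TEXT(32))""",
    """INSERT INTO "test"."people" VALUES ('fred', 1)"""
  )

  // One submission per statement.
  def setupOneByOne(stmt: Statement, sql: Seq[String]): Unit =
    sql.foreach(s => stmt.executeUpdate(s))

  // Queue every statement, then submit the whole batch once.
  def setupBatched(stmt: Statement, sql: Seq[String]): Unit = {
    sql.foreach(s => stmt.addBatch(s))
    stmt.executeBatch()
    ()
  }

  def main(args: Array[String]): Unit = {
    val oneByOne = new RecordingHandler
    setupOneByOne(newStatement(oneByOne), setupSql)
    val batched = new RecordingHandler
    setupBatched(newStatement(batched), setupSql)
    println(s"one by one: ${oneByOne.submissions} submissions, batched: ${batched.submissions}")
  }
}
```

With the stub, the one-by-one path records one submission per statement while the batched path records a single submission regardless of how many statements are queued. Against a real database the saving depends on the driver and on how expensive each submission is.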

Why are the changes needed?

To improve test performance. The improvement is not drastic when running a whole suite, but it is noticeable when running just a single test.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Test-only change.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet

@github-actions github-actions bot added the SQL label Jul 22, 2025
@alekjarmov alekjarmov changed the title Batch JDBC database statements in JDBC suites [SQL][SPARK-52918] Batch JDBC database statements in JDBC suites Jul 22, 2025
Contributor

@PetarVasiljevic-DB PetarVasiljevic-DB left a comment

@alekjarmov I am interested in the performance when we use addBatch. Do you happen to know the numbers?

The diff would be minimal and it would be cleaner to read.

@alekjarmov
Contributor Author

alekjarmov commented Jul 22, 2025

@PetarVasiljevic-DB It should be about the same, since we would have the same number of network round trips, which is where most of the time is spent. The rewrite and its diff would just be larger, since that API uses a different syntax with question marks.

@alekjarmov alekjarmov changed the title [SQL][SPARK-52918] Batch JDBC database statements in JDBC suites [SQL][TESTS}[SPARK-52918] Batch JDBC database statements in JDBC suites Jul 22, 2025
@alekjarmov alekjarmov changed the title [SQL][TESTS}[SPARK-52918] Batch JDBC database statements in JDBC suites [SQL][TESTS][SPARK-52918] Batch JDBC database statements in JDBC suites Jul 22, 2025
@PetarVasiljevic-DB
Contributor

@PetarVasiljevic-DB It should be about the same, since we would have the same number of network round trips, which is where most of the time is spent. The rewrite and its diff would just be larger, since that API uses a different syntax with question marks.

You wouldn't use question marks. You would create a Statement at the beginning and then, instead of calling prepareStatement and executing each statement, you would just call addBatch: https://www.baeldung.com/jdbc-batch-processing#1-batch-processing-using-statement
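To make the distinction concrete, here is a hedged sketch of the two JDBC batching styles under discussion. PreparedStatement batching uses one SQL template with ? placeholders and binds values per row; plain Statement batching queues complete SQL strings, so existing literal statements can be passed through unchanged. BatchStyles, insertPeoplePrepared and runLiteralBatch are hypothetical names, not code from this PR, and the two-column shape of "test"."people" is inferred from the INSERT quoted in this thread.

```scala
import java.sql.Connection

object BatchStyles {

  // Parameterized batching: a single template with ? placeholders, one
  // addBatch() call per bound row. Requires rewriting literal SQL.
  def insertPeoplePrepared(conn: Connection, rows: Seq[(String, Int)]): Array[Int] = {
    val ps = conn.prepareStatement("""INSERT INTO "test"."people" VALUES (?, ?)""")
    try {
      rows.foreach { case (name, id) =>
        ps.setString(1, name)
        ps.setInt(2, id)
        ps.addBatch()
      }
      ps.executeBatch()
    } finally {
      ps.close()
    }
  }

  // Plain Statement batching: each addBatch(sql) queues a complete statement,
  // so DDL and DML can be mixed and no placeholders are involved.
  def runLiteralBatch(conn: Connection, sql: Seq[String]): Array[Int] = {
    val stmt = conn.createStatement()
    try {
      sql.foreach(s => stmt.addBatch(s))
      stmt.executeBatch()
    } finally {
      stmt.close()
    }
  }
}
```

Both functions expect an already open Connection supplied by the caller. The second style keeps the diff small because the SQL strings stay exactly as they were and only the call that submits them changes.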

batchStmt.addBatch(
"CREATE TABLE \"test\".\"strings_with_nulls\" (str TEXT(32))")

batchStmt.addBatch("INSERT INTO \"test\".\"people\" VALUES ('fred', 1)")
Contributor

In this suite, CREATE TABLE and INSERT are in one batch. Why can't we do the same for JDBCSuite.scala?

Contributor Author

Did it in JDBCSuite as well.


3 participants