
fix: [iceberg] Switch to OSS Spark and run Iceberg Spark tests in parallel #1987


Open · wants to merge 10 commits into main

Conversation

@hsiang-c (Contributor) commented Jul 3, 2025

Which issue does this PR close?

Closes #1685.

Rationale for this change

When we enabled Iceberg Spark tests with Comet-enabled Spark in #1715:

  1. We didn't enable CometShuffleManager; this PR fixes that.
  2. We implicitly loaded org.apache.comet.CometSparkSessionExtensions because Iceberg depended on the patched Spark. This PR explicitly configures every SparkSession.Builder with .config("spark.plugins", "org.apache.spark.CometPlugin") so that we can depend on OSS Spark.
  3. The patch we applied to SparkPlanInfo.scala only affects the plan reported to the ListenerBus event, so switching to OSS Spark is safe.
  4. Split the Iceberg Spark tests into 3 GitHub Actions jobs and run them in parallel. ENABLE_COMET is true for all 3 jobs.
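Points 1 and 2 above can be sketched as follows. This is a minimal illustration, not this PR's actual diff: the spark.plugins value comes from this PR, while the local master and the shuffle-manager keys are assumptions based on a typical Comet setup as described in the Comet user guide.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming a local master. Only spark.plugins is taken
// from this PR; the shuffle settings are assumptions from the Comet docs.
val spark = SparkSession.builder()
  .master("local[*]")
  // Load Comet explicitly so plain OSS Spark picks it up (point 2).
  .config("spark.plugins", "org.apache.spark.CometPlugin")
  // Use Comet's shuffle manager instead of Spark's default (point 1).
  .config("spark.shuffle.manager",
    "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager")
  .config("spark.comet.exec.shuffle.enabled", "true")
  .getOrCreate()
```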

Thanks to @andygrove for pointing this out.

What changes are included in this PR?

How are these changes tested?

@hsiang-c hsiang-c changed the title fix: [iceberg] Enable CometShuffleManager in Iceberg Spark tests fix: [iceberg] Switch to OSS Spark and run Iceberg Spark tests in parallel Jul 3, 2025
@kazuyukitanimura (Contributor) left a comment


pending CI

.config("spark.sql.legacy.respectNullabilityInTextDatasetConversion", "true")
.config(
SQLConf.ADAPTIVE_EXECUTION_ENABLED().key(), String.valueOf(RANDOM.nextBoolean()))
+ .config("spark.plugins", "org.apache.spark.CometPlugin")

This makes sense, but it could be error-prone: if a new test creates its own Spark session, we would miss enabling the plugin there.
Wondering if there is a good way to update all Spark sessions at once...

@hsiang-c (Contributor Author) replied:

@kazuyukitanimura

We're lucky in some cases because TestBase and ExtensionsTestBase consolidate the SparkSession.Builder in an abstract class.

Unfortunately, other test classes and the JMH benchmarks build their own SparkSession each time :(
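One hypothetical way to reduce that risk (not part of this PR): funnel all test session construction through a single shared helper, so new tests and benchmarks inherit the Comet settings automatically. The helper name below is invented for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper, not in this PR: centralizes the Comet config
// so individual test classes and JMH benchmarks cannot forget it.
object CometTestSessions {
  def baseBuilder(): SparkSession.Builder =
    SparkSession.builder()
      .config("spark.plugins", "org.apache.spark.CometPlugin")
}

// A test would then start from the shared builder:
// val spark = CometTestSessions.baseBuilder().master("local[*]").getOrCreate()
```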

@kazuyukitanimura (Contributor) left a comment

pending CI

@codecov-commenter commented Jul 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 58.20%. Comparing base (f09f8af) to head (739cfc3).
Report is 312 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1987      +/-   ##
============================================
+ Coverage     56.12%   58.20%   +2.07%     
- Complexity      976     1152     +176     
============================================
  Files           119      133      +14     
  Lines         11743    13039    +1296     
  Branches       2251     2419     +168     
============================================
+ Hits           6591     7589     +998     
- Misses         4012     4216     +204     
- Partials       1140     1234      +94     

☔ View full report in Codecov by Sentry.

@andygrove (Member) commented Jul 8, 2025

I see that some tests are failing. I didn't run into this specific issue during my testing.

 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1764.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1764.0 (TID 3312) (localhost executor driver): 
java.lang.ClassCastException: class org.apache.spark.sql.catalyst.expressions.GenericInternalRow cannot be cast to class org.apache.spark.sql.vectorized.ColumnarBatch (org.apache.spark.sql.catalyst.expressions.GenericInternalRow and org.apache.spark.sql.vectorized.ColumnarBatch are in unnamed module of loader 'app')

@hsiang-c (Contributor Author) commented:

Most of the exceptions in the Iceberg Spark SQL tests can be reproduced as follows:

  1. Follow the official guide to build Comet and Iceberg, configure the Spark shell, and populate an Iceberg table: https://datafusion.apache.org/comet/user-guide/iceberg.html
  2. Query an Iceberg metadata table with an operator. Here is an example:
-- default is the catalog name used in local HadoopCatalog setup
scala> spark.sql(s"SELECT COUNT(*) from default.t1.snapshots").show()

25/07/15 13:06:16 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
java.lang.ClassCastException: class org.apache.iceberg.spark.source.StructInternalRow cannot be cast to class org.apache.spark.sql.vectorized.ColumnarBatch (org.apache.iceberg.spark.source.StructInternalRow is in unnamed module of loader scala.reflect.internal.util.ScalaClassLoader$URLClassLoader @19ac93d2; org.apache.spark.sql.vectorized.ColumnarBatch is in unnamed module of loader 'app')
	at org.apache.spark.sql.comet.CometBatchScanExec$$anon$1.next(CometBatchScanExec.scala:68)
	at org.apache.spark.sql.comet.CometBatchScanExec$$anon$1.next(CometBatchScanExec.scala:57)
	at org.apache.comet.CometBatchIterator.hasNext(CometBatchIterator.java:51)
	at org.apache.comet.Native.executePlan(Native Method)
	at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$2(CometExecIterator.scala:155)
	at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$2$adapted(CometExecIterator.scala:154)
	at org.apache.comet.vector.NativeUtil.getNextBatch(NativeUtil.scala:157)
	at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$1(CometExecIterator.scala:154)
	at org.apache.comet.Tracing$.withTrace(Tracing.scala:31)
	at org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:152)
	at org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:203)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.comet.CometBatchIterator.hasNext(CometBatchIterator.java:50)
	at org.apache.comet.Native.executePlan(Native Method)
	at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$2(CometExecIterator.scala:155)
	at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$2$adapted(CometExecIterator.scala:154)
	at org.apache.comet.vector.NativeUtil.getNextBatch(NativeUtil.scala:157)
	at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$1(CometExecIterator.scala:154)
	at org.apache.comet.Tracing$.withTrace(Tracing.scala:31)
	at org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:152)
	at org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:203)
	at org.apache.spark.sql.comet.execution.shuffle.CometNativeShuffleWriter.write(CometNativeShuffleWriter.scala:106)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)

@parthchandra (Contributor) commented:

@hsiang-c created #2033 to track this issue.

5 participants