
Commit 3fb4adb

yaooqinn authored and dongjoon-hyun committed
[SPARK-52685][SQL][TESTS] Add a clue for flaky test: 'SPARK-47148: AQE should avoid to submit shuffle job on cancellation'
### What changes were proposed in this pull request?

This PR adds a clue for the flaky test 'SPARK-47148: AQE should avoid to submit shuffle job on cancellation'.

### Why are the changes needed?

The test fails frequently without a clue being provided:

```
SPARK-47148: AQE should avoid to submit shuffle job on cancellation *** FAILED *** (6 seconds, 90 milliseconds)
[info]   scala.`package`.Seq.apply[org.apache.spark.SparkException](error).++[Throwable](scala.Option.apply[Throwable](error.getCause())).++[Throwable](scala.Predef.wrapRefArray[Throwable](error.getSuppressed())).exists(((e: Throwable) => e.getMessage().!=(null).&&(e.getMessage().contains("coalesce test error")))) was false (AdaptiveQueryExecSuite.scala:938)
[info]   org.scalatest.exceptions.TestFailedException:
```

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

Passing CI, and an intentional local failure:

```
[info] - SPARK-47148: AQE should avoid to submit shuffle job on cancellation *** FAILED *** (7 seconds, 7 milliseconds)
[info]   errMsgList.exists(((x$25: String) => x$25.contains("AAAcoalesce test error"))) was false
[info]   The error message should contain 'coalesce test error', but got:
[info]   ======
[info]   Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (10.242.151.176 executor driver): java.lang.RuntimeException: coalesce test error
[info]   	at org.apache.spark.sql.execution.adaptive.TestProblematicCoalesceStrategy$TestProblematicCoalesceExec.$anonfun$doExecute$1(AdaptiveQueryExecSuite.scala:3227)
[info]   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:866)
[info]   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:866)
[info]   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[info]   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:374)
[info]   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:338)
[info]   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[info]   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:374)
[info]   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:338)
[info]   	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:107)
[info]   	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
[info]   	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
[info]   	at org.apache.spark.scheduler.Task.run(Task.scala:147)
[info]   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:647)
[info]   	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:80)
[info]   	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:77)
[info]   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:100)
[info]   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:650)
[info]   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info]   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info]   	at java.base/java.lang.Thread.run(Thread.java:840)
[info]
[info] Driver stacktrace:
[info] coalesce test error
[info] ====== (AdaptiveQueryExecSuite.scala:941)
```

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #51375 from yaooqinn/SPARK-52685.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
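For context, the assertion pattern this change introduces is easy to reproduce outside Spark: flatten the exception, its cause, and its suppressed exceptions into a list of messages, then pass a second argument to `assert` so a failure prints what was actually thrown. Below is a minimal, standalone sketch of that idea in plain Scala (using Predef's two-argument `assert` rather than ScalaTest's `intercept`); the exceptions it constructs are hypothetical and only serve to exercise the clue message.

```scala
// Minimal sketch of the "collect messages + clue" pattern (not the Spark test itself).
object ErrorClueSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical failure shape: a top-level error with a cause and a suppressed exception.
    val cause = new RuntimeException("coalesce test error")
    val error = new RuntimeException("Job aborted due to stage failure", cause)
    error.addSuppressed(new IllegalStateException("cleanup failed"))

    // Flatten error, cause, and suppressed exceptions into their non-null messages.
    val errMsgList = (error :: error.getCause :: error.getSuppressed.toList)
      .filter(e => e != null && e.getMessage != null)
      .map(_.getMessage)

    // The second argument to assert is the clue printed when the condition is false.
    assert(errMsgList.exists(_.contains("coalesce test error")),
      s"""
         |The error message should contain 'coalesce test error', but got:
         |${errMsgList.mkString("======\n", "\n", "\n======")}
         |""".stripMargin)

    println("assertion passed; messages checked:\n" + errMsgList.mkString("\n"))
  }
}
```

In the ScalaTest suite the same idea applies: `assert(condition, clue)` attaches the clue to the `TestFailedException`, which is what makes a CI failure like the one above actionable.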
1 parent a1d55d7 commit 3fb4adb

File tree: 1 file changed (+9, -4 lines)


sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala

Lines changed: 9 additions & 4 deletions
```diff
@@ -930,14 +930,19 @@ class AdaptiveQueryExecSuite
       SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
       SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {
       val joined = createJoinedDF()
-      joined.explain(true)
 
       val error = intercept[SparkException] {
         joined.collect()
       }
-      assert((Seq(error) ++ Option(error.getCause) ++ error.getSuppressed()).exists(
-        e => e.getMessage() != null && e.getMessage().contains("coalesce test error")))
-
+      val errMsgList = (error :: error.getCause :: error.getSuppressed.toList)
+        .filter(e => e != null && e.getMessage != null)
+        .map(_.getMessage)
+
+      assert(errMsgList.exists(_.contains("coalesce test error")),
+        s"""
+           |The error message should contain 'coalesce test error', but got:
+           |${errMsgList.mkString("======\n", "\n", "\n======")}
+           |""".stripMargin)
       val adaptivePlan = joined.queryExecution.executedPlan.asInstanceOf[AdaptiveSparkPlanExec]
 
       // All QueryStages should be based on ShuffleQueryStageExec
```
