
[SPARK-52265][SQL][TEST] Fix regex leading to empty PROCESS_TABLES.testingVersions in HiveExternalCatalogVersionsSuite #50989


Open · wants to merge 1 commit into master

Conversation

@efaracci018 commented May 22, 2025

What changes were proposed in this pull request?

Fix the version parsing logic in HiveExternalCatalogVersionsSuite to properly handle new artifact paths in https://dist.apache.org/repos/dist/release/spark/ so that the "backward compatibility" test can run.

This change creates a constant val SparkVersionPattern = """<a href="spark-(\d.\d.\d)/">""".r for more precise version matching, and removes the redundant .filterNot(_.contains("preview")), which is no longer needed.
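To illustrate why the anchored pattern helps, here is a minimal sketch (not the suite's actual code; the object and helper names are made up for this example). Because the regex requires the literal prefix `<a href="spark-` followed immediately by a digit-dot version, directory entries such as spark-connect-swift-0.1.0/ and spark-kubernetes-operator-0.1.0/ no longer match:

```scala
// Sketch only: demonstrates the matching behavior of the proposed pattern.
object VersionPatternSketch {
  // The pattern proposed in this PR: anchored to plain "spark-X.Y.Z/" links.
  val SparkVersionPattern = """<a href="spark-(\d.\d.\d)/">""".r

  // Extract all plain Spark version numbers from a mirror directory listing.
  def parseVersions(html: String): Seq[String] =
    SparkVersionPattern.findAllMatchIn(html).map(_.group(1)).toSeq

  def main(args: Array[String]): Unit = {
    val listing =
      """<a href="spark-3.5.5/">spark-3.5.5/</a>
        |<a href="spark-connect-swift-0.1.0/">spark-connect-swift-0.1.0/</a>
        |<a href="spark-kubernetes-operator-0.1.0/">spark-kubernetes-operator-0.1.0/</a>""".stripMargin
    // Only "3.5.5" survives; the swift/operator entries fail the prefix match.
    println(parseVersions(listing))
  }
}
```

Since the prefix must match literally, the earlier .filterNot(_.contains("preview")) step also becomes unnecessary: a link like spark-4.0.0-preview2/ contains a "-preview" segment before the trailing slash and is rejected by the pattern itself.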

Why are the changes needed?

The suite is failing to execute the "backward compatibility" test due to parsing errors with testing versions. The current implementation fails to parse versions when encountering new paths like spark-connect-swift-0.1.0/ and spark-kubernetes-operator-0.1.0/ in https://dist.apache.org/repos/dist/release/spark/.

This leads to PROCESS_TABLES.testingVersions being empty, and in turn a logError: "Exception encountered when invoking run on a nested suite - Fail to get the latest Spark versions to test". As a result, the condition is not met to run the test.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Executed local build and test for HiveExternalCatalogVersionsSuite:

build/mvn -pl sql/hive -Dtest=none -DwildcardSuites=org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite test-compile scalatest:test

Verified that the reported error no longer appears, the "backward compatibility" test runs successfully, and PROCESS_TABLES.testingVersions, previously empty, now correctly contains "3.5.5" when printed out.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label May 22, 2025
@efaracci018 (Author) commented:
@cloud-fan please review, thank you

@LuciferYang (Contributor) commented May 23, 2025

It seems that new issues may still arise after the 4.0 release with this PR:

build/sbt "hive/testOnly org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite" -Phive
[info] HiveExternalCatalogVersionsSuite:
12:55:31.252 WARN org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite: Failed to download Spark 4.0.0 from https://dlcdn.apache.org//spark/spark-4.0.0/spark-4.0.0-bin-hadoop3-scala2.13.tgz: https://dlcdn.apache.org//spark/spark-4.0.0/spark-4.0.0-bin-hadoop3-scala2.13.tgz
12:55:32.485 WARN org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite: Failed to download Spark 4.0.0 from https://archive.apache.org/dist/spark/spark-4.0.0/spark-4.0.0-bin-hadoop3-scala2.13.tgz: https://archive.apache.org/dist/spark/spark-4.0.0/spark-4.0.0-bin-hadoop3-scala2.13.tgz
12:55:33.414 WARN org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite: Failed to download Spark 4.0.0 from https://dist.apache.org/repos/dist/release/spark/spark-4.0.0/spark-4.0.0-bin-hadoop3-scala2.13.tgz: https://dist.apache.org/repos/dist/release/spark/spark-4.0.0/spark-4.0.0-bin-hadoop3-scala2.13.tgz
12:55:33.597 WARN org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite: 

===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.hive.HiveExternalCatalogVersionsSuite, threads: Keep-Alive-Timer (daemon=true) =====
[info] org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED *** (3 minutes, 41 seconds)
[info]   Unable to download Spark 4.0.0 (HiveExternalCatalogVersionsSuite.scala:125)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
[info]   at org.scalatest.Assertions.fail(Assertions.scala:933)
[info]   at org.scalatest.Assertions.fail$(Assertions.scala:929)
[info]   at org.scalatest.funsuite.AnyFunSuite.fail(AnyFunSuite.scala:1564)
[info]   at org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite.tryDownloadSpark(HiveExternalCatalogVersionsSuite.scala:125)
[info]   at org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite.$anonfun$beforeAll$4(HiveExternalCatalogVersionsSuite.scala:212)
[info]   at org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite.$anonfun$beforeAll$4$adapted(HiveExternalCatalogVersionsSuite.scala:209)
[info]   at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619)
[info]   at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617)
[info]   at scala.collection.AbstractIterable.foreach(Iterable.scala:935)
[info]   at org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite.beforeAll(HiveExternalCatalogVersionsSuite.scala:209)
[info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:69)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
[info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
[info]   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[info]   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info]   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info]   at java.base/java.lang.Thread.run(Thread.java:840)
[info] Run completed in 3 minutes, 42 seconds.
[info] Total number of tests run: 0
[info] Suites: completed 0, aborted 1
[info] Tests: succeeded 0, failed 0, canceled 0, ignored 0, pending 0
[info] *** 1 SUITE ABORTED ***
[error] Error during tests:
[error] 	org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite
[error] (hive / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful

The package name for version 4.0.0 doesn't include the 'scala2.13' identifier.
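The failing downloads above all look for spark-4.0.0-bin-hadoop3-scala2.13.tgz, but the 4.0.0 release publishes spark-4.0.0-bin-hadoop3.tgz (Scala 2.13 is the only supported Scala version, so the suffix was dropped). A hypothetical sketch of how the tarball name could be chosen per version; the object and function names here are invented for illustration, not the suite's code:

```scala
// Hypothetical sketch: version-dependent binary tarball naming.
object TarballNameSketch {
  // Spark 3.x publishes a separate "-scala2.13" build; 4.x drops the suffix.
  def tarballName(version: String): String = {
    val major = version.split('.').head.toInt
    if (major >= 4) s"spark-$version-bin-hadoop3.tgz"
    else s"spark-$version-bin-hadoop3-scala2.13.tgz"
  }

  def main(args: Array[String]): Unit = {
    println(tarballName("3.5.5")) // spark-3.5.5-bin-hadoop3-scala2.13.tgz
    println(tarballName("4.0.0")) // spark-4.0.0-bin-hadoop3.tgz
  }
}
```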


@efaracci018 Are you interested in taking another look?

@efaracci018 (Author) commented:
Thank you @LuciferYang for pointing out this recent issue, I am interested in taking a look.
