[SPARK-52265][SQL][TEST] Fix regex leading to empty PROCESS_TABLES.testingVersions in HiveExternalCatalogVersionsSuite #50989
Conversation
@cloud-fan please review, thank you
It seems that new issues may still arise after the 4.0 release with this PR: the package name for version 4.0 doesn't have the identifier 'scala-2.13'. @efaracci018 Are you interested in taking another look?
Thank you @LuciferYang for pointing out this recent issue, I am interested in taking a look.
Requiring the 'scala-2.13' identifier was introduced by the following PR, with a comment that noted it.
Given that the official 4.0 release is out, we can drop the 'scala-2.13' identifier for 4.0, but we need to keep it for any 3.x version. This fix allows the test to run, and I verified that Scala 2.13 is used:
Please note the following constraints:
+1, LGTM (pending tests)
@@ -96,7 +96,10 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
       mirrors.distinct :+ "https://archive.apache.org/dist" :+ PROCESS_TABLES.releaseMirror
     logInfo(s"Trying to download Spark $version from $sites")
     for (site <- sites) {
-      val filename = s"spark-$version-bin-hadoop3-scala2.13.tgz"
+      val filename = version match {
+        case v if v.startsWith("3") => s"spark-$version-bin-hadoop3-scala2.13.tgz"
nit: use "3." for the version match
-      val filename = s"spark-$version-bin-hadoop3-scala2.13.tgz"
+      val filename = version match {
+        case v if v.startsWith("3") => s"spark-$version-bin-hadoop3-scala2.13.tgz"
+        case _ => s"spark-$version-bin-hadoop3.tgz"
consider using "4." for the version match and failing the test for an unknown/unexpected version.
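A minimal sketch of this reviewer suggestion, matching explicitly on the "3." and "4." prefixes and failing fast on anything unexpected (the object and method names here are illustrative, not the committed code):

```scala
// Sketch only: explicit prefix matches with a failure branch for
// unexpected versions, as suggested in the review.
object FilenameSketch {
  def filenameFor(version: String): String = version match {
    // 3.x releases carry the "-scala2.13" classifier in the artifact name.
    case v if v.startsWith("3.") => s"spark-$version-bin-hadoop3-scala2.13.tgz"
    // 4.x releases drop the Scala classifier.
    case v if v.startsWith("4.") => s"spark-$version-bin-hadoop3.tgz"
    // Anything else is a setup error; fail the test rather than guess.
    case v => sys.error(s"Unexpected Spark version to test: $v")
  }

  def main(args: Array[String]): Unit = {
    println(filenameFor("3.5.5"))
    println(filenameFor("4.0.0"))
  }
}
```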
@@ -96,7 +96,10 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
       mirrors.distinct :+ "https://archive.apache.org/dist" :+ PROCESS_TABLES.releaseMirror
     logInfo(s"Trying to download Spark $version from $sites")
     for (site <- sites) {
-      val filename = s"spark-$version-bin-hadoop3-scala2.13.tgz"
+      val filename = version match {
consider `val filename = s"spark-$version-bin-hadoop3$scala_version.tgz"` and define `val scala_version = ...` using the version match logic.
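A hedged sketch of this suggestion: compute the Scala suffix once via a version match, then build the filename with a single interpolation (the wrapper object is hypothetical, added only to make the snippet self-contained):

```scala
// Sketch only: the suffix is chosen by a version match, so the filename
// template itself stays in one place.
object ScalaSuffixSketch {
  def filenameFor(version: String): String = {
    // 3.x releases ship a "-scala2.13" classifier; later releases do not.
    val scalaVersion = version match {
      case v if v.startsWith("3.") => "-scala2.13"
      case _ => ""
    }
    s"spark-$version-bin-hadoop3$scalaVersion.tgz"
  }

  def main(args: Array[String]): Unit = {
    println(filenameFor("3.5.5"))
    println(filenameFor("4.0.0"))
  }
}
```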
Thank you, I've put up a revision with this change.
LGTM
@@ -96,7 +96,12 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
       mirrors.distinct :+ "https://archive.apache.org/dist" :+ PROCESS_TABLES.releaseMirror
     logInfo(s"Trying to download Spark $version from $sites")
     for (site <- sites) {
-      val filename = s"spark-$version-bin-hadoop3-scala2.13.tgz"
+      val scala_version = version match {
-      val scala_version = version match {
+      val scalaVersion = version match {
Variables should be named using camelCase in Spark.
Sounds good, I've updated `scalaVersion` and `sparkVersionPattern` as well.
Once the tests pass, I will merge this PR.
@efaracci018 Could you please provide your Jira account? I need to add your Jira account to the Spark Contributor group.
Thanks @LuciferYang, my Jira account is efaracci018.
…stingVersions in HiveExternalCatalogVersionsSuite

### What changes were proposed in this pull request?

Fix the version parsing logic in `HiveExternalCatalogVersionsSuite` to properly handle new artifact paths in https://dist.apache.org/repos/dist/release/spark/ so that the "backward compatibility" test can be run. This change creates a constant `val SparkVersionPattern = """<a href="spark-(\d.\d.\d)/">""".r` for more precise version matching, and removes the redundant `.filterNot(_.contains("preview"))`, which is no longer needed.

### Why are the changes needed?

The suite is failing to execute the "backward compatibility" test due to parsing errors with testing versions. The current implementation fails to parse versions when encountering new paths like `spark-connect-swift-0.1.0/` and `spark-kubernetes-operator-0.1.0/` in https://dist.apache.org/repos/dist/release/spark/. This leaves `PROCESS_TABLES.testingVersions` empty, which in turn produces a logError: "Exception encountered when invoking run on a nested suite - Fail to get the latest Spark versions to test". As a result, the condition to run the test is not met.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Executed a local build and test for `HiveExternalCatalogVersionsSuite`:

`build/mvn -pl sql/hive -Dtest=none -DwildcardSuites=org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite test-compile scalatest:test`

Verified that the reported error no longer appears, the "backward compatibility" test runs successfully, and `PROCESS_TABLES.testingVersions` now correctly contains "3.5.5" when printed out, where it was previously empty.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50989 from efaracci018/fix-testingVersions.

Lead-authored-by: Emilie Faracci <efaracci@amazon.com>
Co-authored-by: efaracci018 <efaracci@amazon.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit a380380)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
Merged into master and branch-4.0. Thanks @efaracci018 and @vrozov. @efaracci018 Welcome to the Apache Spark community.
What changes were proposed in this pull request?

Fix the version parsing logic in `HiveExternalCatalogVersionsSuite` to properly handle new artifact paths in https://dist.apache.org/repos/dist/release/spark/ so that the "backward compatibility" test can be run.

This change creates a constant `val SparkVersionPattern = """<a href="spark-(\d.\d.\d)/">""".r` for more precise version matching, and removes the redundant `.filterNot(_.contains("preview"))`, which is no longer needed.

Why are the changes needed?

The suite is failing to execute the "backward compatibility" test due to parsing errors with testing versions. The current implementation fails to parse versions when encountering new paths like `spark-connect-swift-0.1.0/` and `spark-kubernetes-operator-0.1.0/` in https://dist.apache.org/repos/dist/release/spark/.

This leaves `PROCESS_TABLES.testingVersions` empty, which in turn produces a logError: "Exception encountered when invoking run on a nested suite - Fail to get the latest Spark versions to test". As a result, the condition to run the test is not met.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Executed a local build and test for `HiveExternalCatalogVersionsSuite`:

`build/mvn -pl sql/hive -Dtest=none -DwildcardSuites=org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite test-compile scalatest:test`

Verified that the reported error no longer appears, the "backward compatibility" test runs successfully, and `PROCESS_TABLES.testingVersions` now correctly contains "3.5.5" when printed out, where it was previously empty.

Was this patch authored or co-authored using generative AI tooling?

No.
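To illustrate how the `SparkVersionPattern` regex described above filters the mirror's HTML listing, here is a hedged sketch: only anchors of the form `spark-<x.y.z>/` yield a version, so entries like `spark-connect-swift-0.1.0/` no longer match. The listing lines and the helper object are illustrative, not the mirror's real output or the suite's actual code:

```scala
// Sketch only: apply the PR's regex to hypothetical listing lines and
// collect the captured version numbers.
object VersionParseSketch {
  // The constant introduced by this PR.
  val SparkVersionPattern = """<a href="spark-(\d.\d.\d)/">""".r

  def versionsFrom(listing: Seq[String]): Seq[String] =
    listing.flatMap { line =>
      // Keep only lines where "spark-" is followed by an x.y.z version.
      SparkVersionPattern.findFirstMatchIn(line).map(_.group(1))
    }

  def main(args: Array[String]): Unit = {
    val listing = Seq(
      """<a href="spark-3.5.5/">""",
      """<a href="spark-connect-swift-0.1.0/">""",
      """<a href="spark-kubernetes-operator-0.1.0/">"""
    )
    println(versionsFrom(listing))
  }
}
```

The non-Spark project directories fail to match because the character right after `spark-` is not a digit, which is exactly why `PROCESS_TABLES.testingVersions` is no longer polluted or emptied by them.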