chore: Remove obsolete supportedSortType function after Arrow updates #1946

Open
wants to merge 1 commit into
base: main

Conversation

Ruchir28

Which issue does this PR close?

Closes #1854

Rationale for this change

The supportedSortType function was added as a fallback mechanism to avoid Comet errors when sorting by a single column of a complex type. DataFusion's SortExec calls Arrow's lexsort_to_indices, which falls back to sort_to_indices in the single-column case. However, sort_to_indices did not support all data types (e.g., struct), which led to errors, as reported in this issue.

With recent Arrow updates, these limitations have been resolved by this PR.

As a result, the supportedSortType fallback is no longer needed and, in fact, prevents us from taking advantage of the improved native performance. This PR removes the function and its usages, allowing Comet to handle these sorting operations directly.
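
To make the single-column failure mode concrete, here is a toy Rust model of the dispatch described above. It is not arrow-rs code: the `Column` enum, both function bodies, and the error strings are hypothetical, and only the names `lexsort_to_indices` and `sort_to_indices` come from the description. The assumption being illustrated is that the multi-column path compares rows generically, while the single-column shortcut handles only a subset of types.

```rust
use std::cmp::Ordering;

// Toy column type for illustration only; NOT arrow-rs's Array.
#[derive(Debug, Clone)]
enum Column {
    Int32(Vec<i32>),
    // A struct column modelled as rows with two i32 fields.
    Struct(Vec<(i32, i32)>),
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Int32(v) => v.len(),
            Column::Struct(v) => v.len(),
        }
    }

    // Generic row-wise comparison, available for every toy type.
    fn compare(&self, a: usize, b: usize) -> Ordering {
        match self {
            Column::Int32(v) => v[a].cmp(&v[b]),
            Column::Struct(v) => v[a].cmp(&v[b]),
        }
    }
}

// Single-column shortcut: in this model it only handles primitive columns.
fn sort_to_indices(col: &Column) -> Result<Vec<usize>, String> {
    match col {
        Column::Int32(v) => {
            let mut idx: Vec<usize> = (0..v.len()).collect();
            idx.sort_by_key(|&i| v[i]);
            Ok(idx)
        }
        Column::Struct(_) => Err("struct is not supported by the single-column path".to_string()),
    }
}

// Multi-column entry point that delegates to the shortcut for one column.
fn lexsort_to_indices(cols: &[Column]) -> Result<Vec<usize>, String> {
    if cols.is_empty() {
        return Err("at least one sort column is required".to_string());
    }
    if cols.len() == 1 {
        // The delegation that made single complex columns fail.
        return sort_to_indices(&cols[0]);
    }
    let mut idx: Vec<usize> = (0..cols[0].len()).collect();
    idx.sort_by(|&a, &b| {
        cols.iter()
            .map(|c| c.compare(a, b))
            .find(|o| *o != Ordering::Equal)
            .unwrap_or(Ordering::Equal)
    });
    Ok(idx)
}
```

In this model, sorting a lone `Column::Struct` returns `Err`, while the same struct column sorts fine alongside a second key. That asymmetry is what the planner-side fallback was guarding against.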

What changes are included in this PR?

  • Removed the supportedSortType function and its usages
  • Updated the tests to confirm that the operations are handled by Comet instead of falling back to Spark

How are these changes tested?

The existing test case has been updated. By changing the test assertion from checkSparkAnswer to checkSparkAnswerAndOperator, we now verify that the operation is executed by the Comet native operator, confirming that the fallback to Spark is no longer triggered.

@codecov-commenter

codecov-commenter commented Jun 27, 2025

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 58.40%. Comparing base (f09f8af) to head (b640cbe).
Report is 310 commits behind head on main.

Files with missing lines Patch % Lines
...ark/sql/comet/CometTakeOrderedAndProjectExec.scala 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1946      +/-   ##
============================================
+ Coverage     56.12%   58.40%   +2.27%     
- Complexity      976     1140     +164     
============================================
  Files           119      131      +12     
  Lines         11743    12878    +1135     
  Branches       2251     2383     +132     
============================================
+ Hits           6591     7521     +930     
- Misses         4012     4135     +123     
- Partials       1140     1222      +82     

@andygrove
Member

There are 4 failing Spark SQL tests. Here is the first one:

2025-06-27T15:04:42.5762508Z [info] - SPARK-47430 Support GROUP BY MapType *** FAILED *** (694 milliseconds)
2025-06-27T15:04:42.5778426Z [info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1735.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1735.0 (TID 1482) (43ff0ed8e63a executor driver): org.apache.comet.CometNativeException: Invalid argument error: The data type type Map(Field { name: "entries", data_type: Struct([Field { name: "key", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false) has no natural order

@andygrove
Member

@Ruchir28 I suspect that we can't completely remove supportedSortType yet, but it could be updated to remove many of the current restrictions.
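
One possible shape for such a relaxed check is sketched below in Rust over a toy type enum, rather than in Comet's Scala code. Everything here is hypothetical: the `DataType` enum, the `supported_sort_type` name and signature, and above all the rule itself. It assumes that Map, at any nesting depth, is the only sort key type that still needs to fall back, which the failing test suggests but the thread does not confirm.

```rust
// Toy mirror of a few type shapes; hypothetical, not Spark's or Arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    Struct(Vec<DataType>),
    Map(Box<DataType>, Box<DataType>),
}

// Returns true when a sort key of this type could be sorted natively.
// Assumption: only Map forces a fallback, including a Map nested in a Struct.
fn supported_sort_type(dt: &DataType) -> bool {
    match dt {
        DataType::Int32 | DataType::Utf8 => true,
        // A struct is sortable only if every field is sortable.
        DataType::Struct(fields) => fields.iter().all(supported_sort_type),
        // Map has no natural order, so it is always rejected.
        DataType::Map(_, _) => false,
    }
}
```

Under this rule a plain struct key would run natively, while a map key, or a struct containing a map, would still be routed back to Spark.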

Development

Successfully merging this pull request may close these issues.

Relax sort fallback constraints