Conversation

flaming-archer (Contributor) commented Sep 4, 2025

Previously, the FileStatus was never actually cached: the source HiveTable was recreated on every access, which recreated the FileStatus as well, so the cached object's key carried a different client ID each time, effectively invalidating the cache.

This causes two problems (see the sketch after this list):

  1. Cache misses on every lookup, which severely hurts query performance, because fetching FileStatus from HDFS is slow.
  2. A memory leak, because new entries keep being added to the cache and, with the default cache time of -1 (never expire), memory usage grows without bound.
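
To make both failure modes concrete, here is a minimal sketch using Guava's CacheBuilder; `TableKey`, `getOrLoad`, and the size/TTL values are hypothetical, chosen only to illustrate the idea of a stable key plus a bounded, expiring cache, not the actual patch:

```scala
import java.util.concurrent.TimeUnit
import com.google.common.cache.CacheBuilder

object FileStatusCacheSketch {
  // A stable key derived from the table identity. If the key instead embedded
  // a per-instance client ID, every HiveTable reload would produce a fresh
  // key, so every lookup would miss while stale entries stayed in memory.
  case class TableKey(catalog: String, db: String, table: String)

  private val cache = CacheBuilder.newBuilder()
    .maximumSize(1000)                      // bound the entry count so the cache cannot grow forever
    .expireAfterWrite(10, TimeUnit.MINUTES) // finite TTL instead of "-1 = never expire"
    .build[TableKey, Array[String]]()       // Array[String] stands in for the cached FileStatus list

  // Return the cached value for `key`, computing it with `load` on a miss.
  def getOrLoad(key: TableKey)(load: => Array[String]): Array[String] =
    cache.get(key, () => load)              // Guava's Cache#get(K, Callable)
}
```

With a stable `TableKey`, repeated scans of the same table hit the cache, and the size/TTL bounds keep the heap from growing monotonically.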

Why are the changes needed?

Improve performance.

How was this patch tested?

Unit tests and Spark SQL queries.

Was this patch authored or co-authored using generative AI tooling?

No.

Commits: "cache version 2", "cache ut", "change ut"
@flaming-archer flaming-archer changed the title fix filestatus not cached [KYUUBI #7192]fix filestatus not cached Sep 4, 2025
@flaming-archer flaming-archer changed the title [KYUUBI #7192]fix filestatus not cached [KYUUBI #7192] Fix filestatus not cached Sep 4, 2025
codecov-commenter commented Sep 9, 2025

Codecov Report

❌ Patch coverage is 0% with 87 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.00%. Comparing base (6fb4c87) to head (b5aaec0).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...park/connector/hive/read/HiveFileStatusCache.scala | 0.00% | 44 Missing ⚠️ |
| ...kyuubi/spark/connector/hive/HiveTableCatalog.scala | 0.00% | 30 Missing ⚠️ |
| ...spark/connector/hive/KyuubiHiveConnectorConf.scala | 0.00% | 8 Missing ⚠️ |
| ...uubi/spark/connector/hive/read/HiveFileIndex.scala | 0.00% | 3 Missing ⚠️ |
| ...bi/spark/connector/hive/write/HiveBatchWrite.scala | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@          Coverage Diff           @@
##           master   #7191   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         696     697    +1     
  Lines       43530   43607   +77     
  Branches     5883    5898   +15     
======================================
- Misses      43530   43607   +77     

☔ View full report in Codecov by Sentry.

flaming-archer (Contributor, Author) commented

@pan3793 could you please take a look at this?

flaming-archer (Contributor, Author) commented

I think the failing test case [CI / Spark Connector Cross Version Test (17, 2.13, 3.5, 4.0, normal) (pull_request)](https://github.com/apache/kyuubi/actions/runs/18424029155/job/52502719753?pr=7191) (failing after 15m) is caused by an incompatibility between Spark 3.5 and Spark 4.0, not by anything this patch needs to resolve. Spark 4.0's org.apache.spark.sql.catalyst.catalog.CatalogTable added a 16th field, 'collation', so code compiled against Spark 3.5 cannot run on Spark 4.0.
Spark 4.0 code: (screenshot of the CatalogTable definition)
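
For illustration, here is a minimal self-contained sketch of that breakage, using a hypothetical `Table` class in place of Spark's actual CatalogTable: adding a constructor field keeps source compatibility (thanks to the default value) but breaks binary compatibility.

```scala
object BinaryCompatSketch {
  // v1 shape:  case class Table(name: String, owner: String)
  // v2 adds a trailing field with a default value:
  case class Table(name: String, owner: String, collation: Option[String] = None)

  def main(args: Array[String]): Unit = {
    // Source-compatible: recompiling an old call site against v2 fills in the
    // default, so this compiles and runs fine.
    val t = Table("t1", "alice")
    println(t)

    // Binary-incompatible: a jar compiled against v1 calls the two-argument
    // constructor Table.<init>(String, String), which does not exist in the
    // v2 bytecode, so it fails at runtime (NoSuchMethodError). That is why a
    // connector compiled against Spark 3.5 cannot run on Spark 4.0.
  }
}
```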

