Conversation

flaming-archer (Contributor) commented Sep 4, 2025

Previously, the FileStatus was never actually cached: the source HiveTable was recreated on every access, which recreated the FileStatus as well, so the cached object's key carried a different client ID each time, effectively invalidating the cache.

This causes two problems (see the sketch after this list):

  1. Cache misses on every lookup, which severely hurts query performance, because fetching FileStatus from HDFS is slow.
  2. A memory leak, because new entries keep being added to the cache and, with the default cache time of -1 (never expire), memory usage grows without bound.
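
To make both failure modes concrete, here is a minimal sketch using Guava's CacheBuilder; `TableKey`, `getOrLoad`, and the size/TTL values are hypothetical, chosen only to illustrate the idea of a stable key plus a bounded, expiring cache, not the actual patch:

```scala
import java.util.concurrent.TimeUnit
import com.google.common.cache.CacheBuilder

object FileStatusCacheSketch {
  // A stable key derived from the table identity. If the key instead embedded
  // a per-instance client ID, every HiveTable reload would produce a fresh
  // key, so every lookup would miss while stale entries stayed in memory.
  case class TableKey(catalog: String, db: String, table: String)

  private val cache = CacheBuilder.newBuilder()
    .maximumSize(1000)                      // bound the entry count so the cache cannot grow forever
    .expireAfterWrite(10, TimeUnit.MINUTES) // finite TTL instead of "-1 = never expire"
    .build[TableKey, Array[String]]()       // Array[String] stands in for the cached FileStatus list

  // Return the cached value for `key`, computing it with `load` on a miss.
  def getOrLoad(key: TableKey)(load: => Array[String]): Array[String] =
    cache.get(key, () => load)              // Guava's Cache#get(K, Callable)
}
```

With a stable `TableKey`, repeated scans of the same table hit the cache, and the size/TTL bounds keep the heap from growing monotonically.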

Why are the changes needed?

Improve performance.

How was this patch tested?

Unit tests and Spark SQL queries.

Was this patch authored or co-authored using generative AI tooling?

No.

Commits: "cache version 2", "cache ut", "change ut"
@flaming-archer flaming-archer changed the title fix filestatus not cached [KYUUBI #7192]fix filestatus not cached Sep 4, 2025
@flaming-archer flaming-archer changed the title [KYUUBI #7192]fix filestatus not cached [KYUUBI #7192] Fix filestatus not cached Sep 4, 2025
codecov-commenter commented Sep 9, 2025

Codecov Report

❌ Patch coverage is 0% with 87 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.00%. Comparing base (6fb4c87) to head (b5aaec0).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...park/connector/hive/read/HiveFileStatusCache.scala | 0.00% | 44 Missing ⚠️ |
| ...kyuubi/spark/connector/hive/HiveTableCatalog.scala | 0.00% | 30 Missing ⚠️ |
| ...spark/connector/hive/KyuubiHiveConnectorConf.scala | 0.00% | 8 Missing ⚠️ |
| ...uubi/spark/connector/hive/read/HiveFileIndex.scala | 0.00% | 3 Missing ⚠️ |
| ...bi/spark/connector/hive/write/HiveBatchWrite.scala | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@          Coverage Diff           @@
##           master   #7191   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         696     697    +1     
  Lines       43530   43607   +77     
  Branches     5883    5898   +15     
======================================
- Misses      43530   43607   +77     

☔ View full report in Codecov by Sentry.

flaming-archer (Contributor, Author) commented

@pan3793 could you please take a look at this?

flaming-archer (Contributor, Author) commented

I think the failing test case [CI / Spark Connector Cross Version Test (17, 2.13, 3.5, 4.0, normal) (pull_request)](https://github.com/apache/kyuubi/actions/runs/18424029155/job/52502719753?pr=7191) (failing after 15m) is caused by an incompatibility between Spark 3.5 and Spark 4.0, not by anything this patch needs to resolve. Spark 4.0's org.apache.spark.sql.catalyst.catalog.CatalogTable added a 16th field, 'collation', so code compiled against Spark 3.5 cannot run on Spark 4.0.
Spark 4.0 code: (screenshot of the CatalogTable definition)
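
For illustration, here is a minimal self-contained sketch of that breakage, using a hypothetical `Table` class in place of Spark's actual CatalogTable: adding a constructor field keeps source compatibility (thanks to the default value) but breaks binary compatibility.

```scala
object BinaryCompatSketch {
  // v1 shape:  case class Table(name: String, owner: String)
  // v2 adds a trailing field with a default value:
  case class Table(name: String, owner: String, collation: Option[String] = None)

  def main(args: Array[String]): Unit = {
    // Source-compatible: recompiling an old call site against v2 fills in the
    // default, so this compiles and runs fine.
    val t = Table("t1", "alice")
    println(t)

    // Binary-incompatible: a jar compiled against v1 calls the two-argument
    // constructor Table.<init>(String, String), which does not exist in the
    // v2 bytecode, so it fails at runtime (NoSuchMethodError). That is why a
    // connector compiled against Spark 3.5 cannot run on Spark 4.0.
  }
}
```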

