[SPARK-52312][SQL] Ignore V2WriteCommand when caching DataFrame #51032

Open · wants to merge 3 commits into master
Conversation

qiyuandong-db

What changes were proposed in this pull request?

We found an issue where V2WriteCommand plans were not properly excluded from DataFrame caching, which can cause unintended side effects.

For example, when cache() is called on a DataFrame created from an INSERT SQL statement, the INSERT command gets re-executed during the caching process because the underlying plan is not being ignored.

This PR fixes this by:

  • making V2WriteCommand extend the IgnoreCachedData trait
  • updating the caching logic to skip plans that extend IgnoreCachedData, preventing inapplicable plans from being cached
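The two changes above can be modeled as a marker-trait pattern in plain Scala. This is a simplified sketch, not the PR's actual code: `LogicalPlan`, `AppendData`, `Project`, and `shouldCache` are illustrative stand-ins for Spark's real classes and caching logic.

```scala
// Simplified model of the marker-trait pattern behind this fix.
// All names here are illustrative stand-ins, not Spark's real API.
trait LogicalPlan

// Marker trait: plans mixing this in must never be cached.
trait IgnoreCachedData extends LogicalPlan

// Stand-in for a V2 write command; after the fix it carries the marker.
case class AppendData(table: String) extends LogicalPlan with IgnoreCachedData

// Stand-in for an ordinary, cacheable query plan.
case class Project(columns: Seq[String]) extends LogicalPlan

// Stand-in for the caching logic: skip any plan carrying the marker,
// so calling cache() can never re-execute a write command.
def shouldCache(plan: LogicalPlan): Boolean =
  !plan.isInstanceOf[IgnoreCachedData]
```

The key property is that the caching side only needs a single type check, while each command class opts out simply by mixing in the trait.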

Why are the changes needed?

This is a bug, since calling cache() on a DataFrame shouldn't re-execute the command that created it.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New tests were added.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label May 27, 2025
tableName,
storageLevel
)
if (query.queryExecution.analyzed.isInstanceOf[IgnoreCachedData]) {
Contributor:
Shall we do it in cacheQueryInternal, which is lower level?

Author:
Yeah, that would be better. We have two cacheQuery() variants, and they both call cacheQueryInternal().
Moved the check there.

@@ -875,4 +875,53 @@ class DataFrameWriterV2Suite extends QueryTest with SharedSparkSession with Befo
}
}
}

test("SPARK-52312: caching dataframe created from INSERT shouldn't re-execute the command") {
Contributor:
I'm fine with this test only. It's a unit test, and we don't need to iterate all v2 commands to prove the fix. Testing a common v2 command (table INSERT) is good.

Author:
Makes sense. I've removed the redundant ones.

@qiyuandong-db qiyuandong-db requested a review from cloud-fan May 28, 2025 11:35
@@ -127,6 +127,11 @@ class CacheManager extends Logging with AdaptiveSparkPlanHelper {
// Do nothing for StorageLevel.NONE since it will not actually cache any data.
} else if (lookupCachedDataInternal(normalizedPlan).nonEmpty) {
logWarning("Asked to cache already cached data.")
} else if (unnormalizedPlan.isInstanceOf[IgnoreCachedData]) {
Contributor:
Shall we move it before the preceding else if? For IgnoreCachedData plans, we don't even need to do a lookup.
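The suggested reordering can be sketched as follows. This models only the control flow of the branch chain; `decide`, `alreadyCached`, and the string results are hypothetical names, not the actual CacheManager code:

```scala
// Simplified model of the reordered branch chain in the caching logic.
// Names are hypothetical; only the branch ordering mirrors the suggestion.
sealed trait Plan
trait IgnoreCachedData extends Plan
case class Write(table: String) extends Plan with IgnoreCachedData
case class Query(sql: String) extends Plan

def decide(plan: Plan, alreadyCached: Plan => Boolean, storageLevelNone: Boolean): String =
  if (storageLevelNone) "noop"                         // StorageLevel.NONE caches nothing
  else if (plan.isInstanceOf[IgnoreCachedData]) "skip" // cheap type check first: no lookup needed
  else if (alreadyCached(plan)) "warn"                 // only now pay for the cache lookup
  else "cache"
```

Putting the instanceOf branch first means plans that can never be cached short-circuit before the comparatively expensive lookup against existing cached data.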
