[SPARK-52689][SQL] Send DML Metrics to V2Write #51377
base: master
Conversation
...core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
public interface MergeMetrics {

  class Builder {
    private long numTargetRowsCopied = -1;
I personally don't mind -1, but I think we have a few places in DSv2 that use OptionalLong. Would it make sense to be consistent?
OK, changed to OptionalLong in the API.
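A minimal, self-contained sketch of what the OptionalLong-based builder could look like after this change. The method names beyond `numTargetRowsCopied` and the `builder()` factory are assumptions for illustration, not the exact API in the PR:

```java
import java.util.OptionalLong;

// Sketch: unset metrics are OptionalLong.empty() instead of a -1 sentinel,
// matching the OptionalLong convention used elsewhere in DSv2.
interface MergeMetricsSketch {
  OptionalLong numTargetRowsCopied();

  static Builder builder() {
    return new Builder();
  }

  class Builder {
    // Empty by default, per the review suggestion; no magic -1 value.
    private OptionalLong numTargetRowsCopied = OptionalLong.empty();

    public Builder numTargetRowsCopied(long value) {
      this.numTargetRowsCopied = OptionalLong.of(value);
      return this;
    }

    public MergeMetricsSketch build() {
      OptionalLong copied = this.numTargetRowsCopied;
      return () -> copied; // single abstract method, so a lambda suffices
    }
  }
}
```

The advantage over -1 is that "not collected" and "collected as zero or any real count" can never collide.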
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/MergeMetrics.java
sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/BatchWrite.java
/**
 * Whether this batch write requests merge execution metrics.
 */
default boolean requestMergeMetrics() {
Is there a performance hit for requesting metrics? If not, I'd drop this method and always call commitMerge. The fewer public methods we have, the better.
The perf hit is an execution-graph walk. Anyway, I removed the check and now walk the graph in all cases.
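With the `requestMergeMetrics()` gate gone, the shape suggested above can be sketched as a default method that all existing connectors inherit for free. This is an illustrative sketch only; the interface and method names below (`BatchWriteSketch`, `commitMerge`) are assumptions modeled on the discussion, not the exact Spark API:

```java
import java.util.OptionalLong;

// Minimal stand-ins so the sketch is self-contained.
interface WriterCommitMessage {}

interface MergeMetricsView {
  OptionalLong numTargetRowsCopied();
}

interface BatchWriteSketch {
  void commit(WriterCommitMessage[] messages);

  // Default overload: Spark can always call commitMerge; connectors that
  // do not care about merge metrics inherit this delegation to commit()
  // unchanged, so no separate "do you want metrics?" method is needed.
  default void commitMerge(WriterCommitMessage[] messages, MergeMetricsView metrics) {
    commit(messages);
  }
}
```

A connector that wants the metrics simply overrides `commitMerge`; everyone else compiles and behaves exactly as before.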
@@ -275,7 +277,7 @@ class DataSourceV2Strategy(session: SparkSession) extends Strategy with Predicat
     }

     case AppendData(r: DataSourceV2Relation, query, _, _, Some(write), _) =>
-      AppendDataExec(planLater(query), refreshCache(r), write) :: Nil
+      AppendDataExec(planLater(query), refreshCache(r), write, getCommand(r)) :: Nil
Is this for cases when MERGE is rewritten as INSERT? I thought we would skip populating metrics for appends, but let me think about it. What does Delta do when MERGE becomes INSERT?
Oh, I don't handle that yet.
...e/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala
...e/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala
What changes were proposed in this pull request?
Send some DML execution metrics (i.e., from MergeRowsExec) to the V2 write of the data source, so connectors can persist them for debugging purposes.
Why are the changes needed?
DML row-level operations (i.e., MERGE, UPDATE, DELETE) are critical functionality for V2 data sources (like Iceberg). It would be useful to send some DML metrics to the commit of these data sources, so they can persist them in commit metadata for debugging purposes.
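To make the motivation concrete, here is a hedged sketch of what a connector might do with the metrics it receives at commit time: fold the present values into the string-keyed summary it already persists with each commit. All names here (`summarize`, the property keys, the second metric) are illustrative assumptions, not Spark or Iceberg API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Sketch: turn optional merge metrics into commit-metadata properties,
// writing only the metrics that were actually collected.
class CommitSummarySketch {
  static Map<String, String> summarize(OptionalLong rowsCopied, OptionalLong rowsDeleted) {
    Map<String, String> summary = new HashMap<>();
    rowsCopied.ifPresent(v -> summary.put("merge.num-target-rows-copied", Long.toString(v)));
    rowsDeleted.ifPresent(v -> summary.put("merge.num-target-rows-deleted", Long.toString(v)));
    return summary;
  }
}
```

A user debugging a MERGE could then read these properties straight off the table's commit history instead of re-running the job with extra logging.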
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit test
Was this patch authored or co-authored using generative AI tooling?
No