Skip to content

Conversation

vicennial
Copy link
Contributor

@vicennial vicennial commented Oct 14, 2025

What changes were proposed in this pull request?

Close the streams and clear the stagedArtifacts buffer when it is no longer needed. Further, we explicitly attempt to close inner streams in case closing an outer stream (which generally cascades down) fails.

Why are the changes needed?

Upon flushing artifacts (or on an error), we delete the files on disk but the buffer is never cleared (and potentially open file handles), despite not being required anymore. An optimisation here is to clear them without relying on GC

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

Co-authored with assistance from Claude Code.

@vicennial
Copy link
Contributor Author

PTAL @hvanhovell

@vicennial vicennial changed the title [SPARK-53901][CONNECT] Clear stagedArtifacts buffer in SparkConnectAddArtifactsHandler upon flush/error [SPARK-53901][CONNECT] Handle potential memory leaks inSparkConnectAddArtifactsHandler upon flush/error Oct 14, 2025
// exception, we still attempt to close the inner streams to prevent resource leaks.
var closeException: Throwable = null
try {
checksumOut.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I file it commendable that you are trying to close the inner input streams, I think this is slightly over engineered. If you look at the implementation of both CountingOutputStream and CountingOutputStream, then you'll see that close is guaranteed to be called on their inner stream. So we don't really have to go through this dance. Alternatively you close the fileOut directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, updated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants