-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-52088][CORE] Refactor the ClosureCleaner#clean
and avoid using the -Djdk.reflect.useDirectMethodHandle=false
#51028
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…set field in hidden class.
…set field in hidden class, but update the REPL instead
ClosureCleaner#clean
method and avoid using the -Djdk.reflect.useDirectMethodHandle=false
ClosureCleaner#clean
method and avoid using the -Djdk.reflect.useDirectMethodHandle=false
ClosureCleaner#clean
and avoid using the -Djdk.reflect.useDirectMethodHandle=false
@@ -25,7 +25,7 @@ on: | |||
java: | |||
required: false | |||
type: string | |||
default: 17 | |||
default: 21 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The modifications made to this file need to be reverted before merging.
field.set(outerThis, null) | ||
} catch { | ||
case _: Exception => | ||
// Ignore failures to set fields - this is a best-effort cleanup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logs should be added
field.set(outerThis, null) | ||
} catch { | ||
case _: Exception => | ||
// Ignore failures to set fields - this is a best-effort cleanup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
field.set(cmdInstance, null) | ||
} catch { | ||
case _: Exception => | ||
// Ignore failures to set fields - this is a best-effort cleanup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
cmdAccessedFields.contains(field.getName) || field.getName == "$outer" | ||
} | ||
|
||
for (field <- fieldsToNull) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that each modification has similar code. Can we extract a common function?
// But we can remove the `final` modifier of `outerField` before set value | ||
// and reset the modifier after set value. | ||
setFieldAndIgnoreModifiers(func, outerField, clonedOuterThis) | ||
val fieldsToNull = capturingClass.getDeclaredFields.filterNot { field => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @rednaxelafx FYI
It seems we've found a way to make all tests pass for Java 21 without using the -Djdk.reflect.useDirectMethodHandle=false
option (which is no longer effective after Java 22; for details, see JDK-8309635).
However, the new approach no longer involves cloning but instead operates on the original object. My understanding of this area isn't thorough. Do you have time to help review this PR? Thanks ~
also cc @cloud-fan @mridulm @srowen @JoshRosen @dongjoon-hyun @HyukjinKwon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this going to break applications - you capture a reference to object A and in order to serialize it you try trimming out some fields, but, this has the effect of breaking the object A in the REPL just because it's being copied for purposes of serializing a closure. Surely no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen Thanks for review!
emmm...I was thinking I need to save the original value of the reset-field, and set them back after the serialization. However, all UT passed, then I didn't do the recover. I can add a recovery if really necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're doing that, it seems easier to (shallow) copy the object as before. And safer. Is that possible?
Here's an example of what I'm worried about: a closure refers to class A. Class A has a reference to some non-serializable client object (that isn't needed by the closure). Here, you would null that reference, and then class A fails to work after this point in the REPL.
You can save and restore it, but, even then for a moment the object breaks in a really non-obvious way.
Do I misunderstand or is that a real risk? if so then I think this different approach, modifying user program objects, is too risky. Do we need to change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we consider a multi-threaded scenario? Is it possible to operate on object A in an intermediate state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen @LuciferYang
Multi-threaded will be bad, I thought the outerThis won't be accessed in other threads(?).
As to unsafe broken object, if there's no multi-thread issue, it should be fine. Because in SparkClosureCleaner, serialization is called immediately after the clean, which means no other action in between.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is always a gap between two actions, we can't guarantee it without locking. Can we copy the object instead of updating the original object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original code should presumably have been executed on the copied object, right?
Previously, I tried updating fields in the copied object using approaches like VarHandle
and Unsafe.putObject
. However, due to the limitations imposed by hidden class and private final fields, all my attempts failed. We can continue to explore, but new ideas are needed.
What changes were proposed in this pull request?
During the cleanup process, instead of trying to update func.arg$1, we change the state in the arg$1.
This is due to Java safety constraint, it doesn't allow change private field in hidden class (the lambda).
Why are the changes needed?
since we're updating to Java22 finally. The workarounds (-Djdk.reflect.useDirectMethodHandle) won't exist any more
Does this PR introduce any user-facing change?
no
How was this patch tested?
UT
Was this patch authored or co-authored using generative AI tooling?
no