Skip to content

Trace ptr_or_offset in jl_genericmemoryref_t #252

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: mmtk-support-moving-upstream
Choose a base branch
from

Conversation

qinsoon
Copy link
Member

@qinsoon qinsoon commented Jun 24, 2025

@qinsoon qinsoon force-pushed the trace-ptr-or-offset branch from 4340bc9 to d6744a7 Compare June 24, 2025 07:23
@qinsoon qinsoon marked this pull request as ready for review June 24, 2025 08:15
@qinsoon qinsoon requested a review from udesou June 24, 2025 08:15
@udesou
Copy link
Contributor

udesou commented Jun 24, 2025

Did that actually fix the issue? I remember implementing something like that and that it didn't work. I believe what I did to reproduce the issue was running make test-atomics. If that test succeeds with your PR then that's a good indication that your fix works.
With max copying that is.

@udesou
Copy link
Contributor

udesou commented Jun 24, 2025

@qinsoon
Copy link
Member Author

qinsoon commented Jun 24, 2025

Did that actually fix the issue? I remember implementing something like that and that it didn't work. I believe what I did to reproduce the issue was running make test-atomics. If that test succeeds with your PR then that's a good indication that your fix works. With max copying that is.

I will try with atomics tests. I am not sure if this will fix the problem. But currently we are not tracing ptr_or_offset, which seems wrong.

@@ -363,6 +368,11 @@ pub unsafe fn scan_julia_object<SV: SlotVisitor<JuliaVMSlot>>(obj: Address, clos
return;
}

if (*vt).name == jl_genericmemoryref_typename {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@d-netto Is it possible that we're not tracing all jl_genericmemoryref_t objects with that check? I remember I had some code to simply print out if those objects were scanned when running the atomics test but for some reason nothing was being printed out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just more info for the discussion. In the runtime code, we can see that jl_genericmemoryref_t may be used as a value (rather than as a pointer to heap objects) -- the ref in jl_array_t is an example, there are a few functions that return jl_genericmemoryref_t as a value -- so objects of this type may be stored outside the heap, like stacks, global memory, native heap, etc. We should be fine with stacks. But if it appears in global memory, or native heap, or anywhere else, we would have an issue -- though I don't know if that happens or not.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay. Will take a look soon.

@qinsoon
Copy link
Member Author

qinsoon commented Jun 24, 2025

I think these tests would always fail: https://github.com/mmtk/julia/blob/a3d4e393dc871b8dbe1f7d105ca9d0a9d98864b6/test/atomics.jl#L877

The test still failed. Will look more into it tomorrow.

@qinsoon
Copy link
Member Author

qinsoon commented Jun 27, 2025

I added a hack that deals with Ref{GenericMemoryRef{T}}, and the atomics tests now can pass. However, I don't think it is a good solution. See: #249 (comment)

@qinsoon
Copy link
Member Author

qinsoon commented Jul 3, 2025

Just as a note: in our dev branch mmtk-support-moving-upstream, the tracing for ptr_or_offset in GenericMemoryRef is wrong. The tracing code assumes ptr_or_offset is an internal pointer of mem, however, ptr_or_offset and mem may actually point to different objects. unsafe_wrap exposes this bug. The following code is a minimal example.

n = 15
a = zeros(Bool, n)
# a gets moved, but it is correct.
GC.gc()
# ptr_or_offset and mem for a8 do not point to the same object
a8 = unsafe_wrap(Array, Ptr{UInt8}(pointer(a)), length(a); own=false) # unsafely observe the actual bit patterns in `a`
# a8 gets moved, and its ptr_or_offset field does not get updated correctly by GC
GC.gc() 

println(a) # a is an array of 0s.
println(a8) # a8's pointer is dangling, and points to corrupted memory.

See 3587825 and fb3f50d.

@qinsoon
Copy link
Member Author

qinsoon commented Jul 3, 2025

fb3f50d changed the location of the forwarding pointer from -64 (bits) to 0. So at least it won't overwrite the object type. It still overwrites the object payload, and causes errors for size calculation. Currently we only trace internal pointers for a few limited number of types, so we are fine at the moment. But anyway, it is still a hack.

mmtk/mmtk-core#1331 needs to be resolved, then we can properly use find_object_from_internal_pointer during tracing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants