
trace: implement supervisor components for tracing #4405


Merged
merged 7 commits into rust-lang:master
Jun 21, 2025

Conversation


@nia-e nia-e commented Jun 18, 2025

Per #4326, this is the second of the main tracer bringup PRs. Let me know what needs fixing / changing / docs / justification / etc :D

@nia-e nia-e force-pushed the standalone-ptrace branch 4 times, most recently from 3826f24 to 079ce3d Compare June 19, 2025 13:47

nia-e commented Jun 19, 2025

Had a couple dumb bugs to iron out but I think this is good to go now.

@rustbot ready

@rustbot rustbot added the S-waiting-on-review Status: Waiting for a review to complete label Jun 19, 2025
@nia-e nia-e force-pushed the standalone-ptrace branch from 13e351f to ab547b1 Compare June 20, 2025 20:32
@oli-obk oli-obk added this pull request to the merge queue Jun 21, 2025
Merged via the queue into rust-lang:master with commit 18da864 Jun 21, 2025
8 checks passed

@RalfJung RalfJung left a comment

I still have a bunch of questions and comments. Some of them are resolved in #4418, but a bunch are clarification questions, so I'd appreciate a follow-up PR extending the comments here and possibly doing some refactoring.

Comment on lines +321 to +334
pub unsafe fn prepare_ffi(&mut self) -> Result<(), nix::errno::Errno> {
    let prot = mman::ProtFlags::PROT_NONE;
    unsafe { self.mprotect(prot) }
}

/// Deprotects all owned memory by setting it to RW. Erroring here is very
/// likely unrecoverable, so it may panic if applying those permissions
/// fails.
pub fn unprep_ffi(&mut self) {
    let prot = mman::ProtFlags::PROT_READ | mman::ProtFlags::PROT_WRITE;
    unsafe {
        self.mprotect(prot).unwrap();
    }
}

Member

Elsewhere this is called start_ffi and end_ffi, right?

Why do we need to give a list of pages to the trace mechanism if we then still do this work here in the allocator...?

Contributor Author

I can change the naming if that's preferable, sure.

The point is that here in the allocator we do the initial and final bulk protection/deprotection, so that accesses to the machine's memory result in a segfault. The list of pages passed to the supervisor is then used to tell whether a segfault is "real" (e.g. the C code wrote through a null ptr); if it's not (that is, if its address matches one of these pages), we log it down instead. I guess we could delete this and make the supervisor process apply the protection to the child process through those mempr_on()/mempr_off() functions, but that would be quite messy.
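
For illustration, a minimal sketch of the classification described here; the helper name and the page_size parameter are assumptions, not the PR's actual API:

fn is_tracked_fault(fault_addr: usize, page_ptrs: &[usize], page_size: usize) -> bool {
    // A fault is "ours" iff its address falls inside one of the pages the
    // child reported to the supervisor; otherwise it is a genuine crash in the C code.
    page_ptrs
        .iter()
        .any(|&page| (page..page + page_size).contains(&fault_addr))
}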

Member

> I guess we could delete this and make the supervisor process do the protection to the child process through those mempr_on()/mempr_off() functions, but that would be quite messy

What I was thinking was more along the lines of having all mprotect calls reasonably close to each other -- right now, some are in native_lib and some in the allocator. That does not mean they have to be done by the supervisor. They can be done by the child in start_ffi / end_ffi, but with the logic in the tracing code, not here.

If you also change the pages getter to return an impl Iterator instead of a Vec, you can avoid some unnecessary allocation in end_ffi.
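
For illustration, a sketch of that getter change; the struct and field names here are assumptions:

struct IsolatedAlloc {
    page_ptrs: Vec<usize>,
}

impl IsolatedAlloc {
    // Returning a borrowing iterator instead of collecting into a Vec
    // avoids an allocation at every call site, e.g. in end_ffi.
    pub fn pages(&self) -> impl Iterator<Item = usize> + '_ {
        self.page_ptrs.iter().copied()
    }
}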

}

// It's fine / desirable behaviour for values to wrap here, we care about just
// preserving the bit pattern.

Member

But the bit pattern may change if we cast to a smaller type...

Contributor Author

I should clarify: these fields on libc::user_regs_struct are always usize/isize-sized, i.e. on x64 the type of self.rip is i64 while on x86 the equivalent self.eip is i32 (since these are defined as wobbly C types). What I meant by wrapping is just casting from signed to unsigned; this should never cause the size to decrease.

Member

Please use cast_signed/cast_unsigned for casting between signed and unsigned, do not use as.
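
For reference, a minimal example of those std methods; they keep the width and only reinterpret the sign:

fn main() {
    let rip: i64 = -8; // e.g. a register value read from user_regs_struct
    let addr: u64 = rip.cast_unsigned(); // same width, same bit pattern, no as-cast
    assert_eq!(addr, u64::MAX - 7);
}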

let push = addr..addr.strict_add(size);
// FIXME: This now has access type info in the latest
// git version of capstone because this pissed me off
// and I added it. Change this when it updates.

Member

:-)

Comment on lines +533 to +539
// Overall structure:
// - Get the address that caused the segfault
// - Unprotect the memory
// - Step 1 instruction
// - Parse executed code to estimate size & type of access
// - Reprotect the memory
// - Continue

Member

And somewhere along the line we apparently use a fake stack for something?

Contributor Author

I can document that here as well, sure
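
For context, a high-level sketch of the flow listed in that comment. Only ptrace::step and ptrace::cont are real nix calls; the other helpers are hypothetical stand-ins for the PR's code, and waiting on the child between steps is elided:

use nix::errno::Errno;
use nix::sys::ptrace;
use nix::unistd::Pid;

// Hypothetical stand-ins so the sketch type-checks.
fn fault_address(_pid: Pid) -> Result<usize, Errno> { Ok(0) }
fn unprotect_page(_pid: Pid, _addr: usize) -> Result<(), Errno> { Ok(()) }
fn log_access(_pid: Pid, _addr: usize) -> Result<(), Errno> { Ok(()) }
fn reprotect_page(_pid: Pid, _addr: usize) -> Result<(), Errno> { Ok(()) }

fn handle_segfault(pid: Pid) -> Result<(), Errno> {
    let addr = fault_address(pid)?; // get the address that caused the segfault
    unprotect_page(pid, addr)?;     // make the faulting access legal
    ptrace::step(pid, None)?;       // step exactly one instruction
    log_access(pid, addr)?;         // parse the executed code to estimate size & type of access
    reprotect_page(pid, addr)?;     // restore the protection
    ptrace::cont(pid, None)?;       // continue the child
    Ok(())
}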

ptrace::setregs(pid, new_regs).unwrap();

// Our mempr_* functions end with a raise(SIGSTOP).
wait_for_signal(Some(pid), signal::SIGSTOP, true)?;

Member

That true argument is not very self-explanatory; please use a dedicated enum instead.
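
For illustration, one way that could look; the enum name and variants are hypothetical:

// The call site then reads as documentation rather than a mystery flag:
//
//     wait_for_signal(Some(pid), signal::SIGSTOP, InitialCont::Yes)?;
//
enum InitialCont {
    Yes,
    No,
}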

// won't do memory accesses and so can't trigger this!
let regs_bak = ptrace::getregs(pid).unwrap();
new_regs = regs_bak;
let ip_poststep = regs_bak.ip();

Member

Should we assert that ip_poststep is (a) bigger than ip_prestep and (b) not too much bigger, just to be safe?

Contributor Author

Makes sense, I'll add that
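
For illustration, such a check could look like this; the 15-byte bound is just the maximum x86-64 instruction length, used here as one example of "not too much bigger":

fn check_single_step(ip_prestep: u64, ip_poststep: u64) {
    // (a) the instruction pointer must have advanced...
    assert!(ip_poststep > ip_prestep, "single-step did not advance the instruction pointer");
    // (b) ...but by no more than one instruction's worth of bytes.
    assert!(ip_poststep - ip_prestep <= 15, "single-step advanced the instruction pointer too far");
}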


// Now figure out the size + type of access and log it down
// This will mark down e.g. the same area being read multiple times,
// since it's more efficient to compress the accesses at the end.

Member

Where are we "compressing" anything...?

Contributor Author

Ah, this is outdated, oops. At one point I had a bold idea of compressing down the list of which reads and writes happened in some fancy way, until I realised it requires that we always have perfect info on the size of accesses, never conservatively overestimating it. I forgot to update this comment when I gutted that.

// This will mark down e.g. the same area being read multiple times,
// since it's more efficient to compress the accesses at the end.
if capstone_disassemble(&instr, addr, cs, acc_events).is_err() {
// Read goes first because we need to be pessimistic.

Member

Why is read-first "pessimistic"?

Contributor Author

Also an artifact of the idea I mentioned above; I'll remove this comment. In brief, I wanted to be able to assert that if, e.g., a write happens to some area of memory whose bytes had a provenance associated with them, then we can delete said provenance and a later read of that area won't expose it; thus the read going first is "pessimistic". I'd like to keep the ordering though, just in case capstone improves enough at some later point / I find some way to implement that (maybe by having some way to signal when we know the exact size of accesses vs. when we're guessing?).

Member

I don't see the point of the ordering; it's just confusing if it's not actually needed, and it's also unnecessary work.

Contributor Author

Should I split it up and just have it return separate vectors for the reads and writes then?

Member

I think ideally it's one vector that has all events in-order. That is the simplest thing for the post-processing logic, and it can even help avoid unnecessary pointer exposure if C overwrites memory before reading it (meaning the old provenance stored there is definitely lost).

Contributor Author

That was my idea too. The problem was that when we can't tell the size of the access (e.g. on aarch64), we default to the worst-case size, so it might be possible that, say, a 4-byte write happens, we log it down as 8 bytes because we don't know the real size, and in doing so we accidentally delete the next 4 bytes' provenance info. I am considering adding an extra field to specify when we know the exact size for sure vs. when we're conservatively overestimating, which would allow us to do this.

Member

Ah... good point.
The general interface here is to include both a lower bound and an upper bound for the size of the access. Then the post-processing code can do the best possible approximation.
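
For illustration, an event shape carrying both bounds might look like this; it is hypothetical, not the tracer's actual type:

use std::ops::Range;

// "certain" is the range we know was accessed; "possible" is the conservative
// overestimate. Post-processing can expose provenance for the whole possible
// range on reads, but remove provenance only where a write certainly happened.
enum AccessEvent {
    Read { certain: Range<usize>, possible: Range<usize> },
    Write { certain: Range<usize>, possible: Range<usize> },
}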

// And the fake stack it allocated for us to use later
_ch_stack = Some(ch_info.stack_ptr);
// All the pages that the child process is "allowed to" access.
ch_pages = ch_info.page_ptrs;

Member

Why does this even work? Wasn't this allocated in the child after the fork, so the allocation holding this Vec's buffer only exists in the child now, not in the parent?

Member

Oh I get it, this is properly (de)serialized over the IPC channel.
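
For illustration, a minimal sketch of that round-trip, assuming the ipc-channel crate with a serde-derived payload (the struct shape mirrors the fields used above but is otherwise an assumption):

use ipc_channel::ipc;
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct ChildInfo {
    stack_ptr: usize,
    page_ptrs: Vec<usize>,
}

fn example() {
    let (tx, rx) = ipc::channel::<ChildInfo>().expect("channel creation failed");
    // In the child (after the fork): the Vec is serialized into the channel...
    tx.send(ChildInfo { stack_ptr: 0, page_ptrs: vec![0x1000, 0x2000] }).expect("send failed");
    // ...and in the parent, recv deserializes into freshly allocated, parent-owned
    // memory, so no pointer into the child's address space is ever reused.
    let ch_info = rx.recv().expect("recv failed");
    assert_eq!(ch_info.page_ptrs.len(), 2);
}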

match risc_voperand {
    arch::riscv::RiscVOperand::Mem(_) => {
        // We get basically no info here.
        let push = addr..addr.strict_add(size);

@RalfJung RalfJung Jun 28, 2025

The variable size does not exist. Apparently this code did not even get build-tested. Please be more careful next time; this now causes a bunch of extra work to clean it all up.

Member

So, in other words, before we can re-land any riscv-specific code, we need CI set up for it. I have no idea how to set up CI that uses riscv as host architecture, probably that requires some sort of emulation.

We actually have the same problem with the arm code: it is not covered by CI at all, since we don't have a linux arm runner.

@RalfJung RalfJung Jun 29, 2025

Turns out the 32-bit ARM code also does not build.

So yeah we definitely need a CI setup for every single target we have specific code for, everything else is not sustainable.

@nia-e nia-e Jul 2, 2025

Ough. I had tested this on 64-bit arm, sorry for the inconvenience!

Member

> I'll look to see if I can do anything about setting up CI, not really my area of expertise but I'll give it a shot

I spent most of Sunday setting that up. :)

Labels
S-waiting-on-review Status: Waiting for a review to complete

4 participants