trace: implement supervisor components for tracing #4405
Conversation
Force-pushed from 3826f24 to 079ce3d
Had a couple dumb bugs to iron out but I think this is good to go now. @rustbot ready
Co-authored-by: Oli Scherer <github35764891676564198441@oli-obk.de>
I still have a bunch of questions and comments. Some of them are resolved in #4418, but a bunch are clarification questions, so I'd appreciate a follow-up PR extending the comments here and possibly doing some refactoring.
```rust
pub unsafe fn prepare_ffi(&mut self) -> Result<(), nix::errno::Errno> {
    let prot = mman::ProtFlags::PROT_NONE;
    unsafe { self.mprotect(prot) }
}

/// Deprotects all owned memory by setting it to RW. Erroring here is very
/// likely unrecoverable, so it may panic if applying those permissions
/// fails.
pub fn unprep_ffi(&mut self) {
    let prot = mman::ProtFlags::PROT_READ | mman::ProtFlags::PROT_WRITE;
    unsafe {
        self.mprotect(prot).unwrap();
    }
}
```
Elsewhere this is called start_ffi and end_ffi, right?
Why do we need to give a list of pages to the trace mechanism if we then still do this work here in the allocator...?
I can change the naming if that's preferable, sure.
The point is that here in the allocator we're doing the initial and final bulk protection/deprotection, so that accesses to the machine's memory result in a segfault; the list of pages passed to the supervisor is used to tell whether a segfault is "real" (e.g. the C code wrote through a null ptr) or not (that is, its address matches one of these pages), in which case we log it instead. I guess we could delete this and make the supervisor process do the protection to the child process through those mempr_on()/mempr_off() functions, but that would be quite messy.
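Roughly, the check on the supervisor side is just an address-in-page test; a minimal sketch, with illustrative names rather than the actual API:

```rust
/// Is this segfault one we provoked on purpose (address inside a page the
/// child protected), as opposed to a genuine crash in the C code?
fn is_tracked_fault(fault_addr: usize, ch_pages: &[usize], page_size: usize) -> bool {
    ch_pages.iter().any(|&page| (page..page + page_size).contains(&fault_addr))
}
```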
> I guess we could delete this and make the supervisor process do the protection to the child process through those mempr_on()/mempr_off() functions, but that would be quite messy

What I was thinking was more along the lines of having all mprotect calls reasonably close to each other -- right now, some are in native_lib and some in the allocator. That does not mean they have to be done by the supervisor. They can be done by the child in start_ffi / end_ffi, but with the logic in the tracing code, not here.
If you also change the pages getter to return an impl Iterator instead of a Vec, you can avoid some unnecessary allocation in end_ffi.
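Something like this, assuming the allocator keeps its page addresses in a pages field (a sketch, not the exact signature):

```rust
/// Iterate over the start addresses of all owned pages without handing
/// every caller a freshly allocated Vec.
pub fn pages(&self) -> impl Iterator<Item = usize> + '_ {
    self.pages.iter().copied()
}
```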
```rust
}

// It's fine / desirable behaviour for values to wrap here, we care about just
// preserving the bit pattern.
```
But the bit pattern may change if we cast to a smaller type...
I should clarify. These fields on libc::user_regs_struct are always usize/isize-sized, i.e. on x64 the type of self.rip is i64 while on x86 the equivalent self.eip is i32 (since these are defined as wobbly C types). What I meant by wrapping is just casting from signed to unsigned; this should never cause the size to decrease.
Please use cast_signed/cast_unsigned for casting between signed and unsigned, do not use as.
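For example, these methods are width-preserving by construction, while as will also silently truncate (plain Rust, independent of the PR's code):

```rust
fn main() {
    let ip: i64 = -1;
    let raw: u64 = ip.cast_unsigned(); // bit-for-bit, i64 -> u64 only
    assert_eq!(raw, u64::MAX);
    assert_eq!(raw.cast_signed(), ip); // round-trips exactly
    // `ip as u32` would also compile, silently dropping half the bits,
    // which is exactly the bug class being ruled out here.
}
```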
```rust
let push = addr..addr.strict_add(size);
// FIXME: This now has access type info in the latest
// git version of capstone because this pissed me off
// and I added it. Change this when it updates.
```
:-)
```rust
// Overall structure:
// - Get the address that caused the segfault
// - Unprotect the memory
// - Step 1 instruction
// - Parse executed code to estimate size & type of access
// - Reprotect the memory
// - Continue
```
And somewhere along the line we apparently use a fake stack for something?
I can document that here as well, sure.
```rust
ptrace::setregs(pid, new_regs).unwrap();

// Our mempr_* functions end with a raise(SIGSTOP).
wait_for_signal(Some(pid), signal::SIGSTOP, true)?;
```
That true argument is not very self-explanatory; please use a dedicated enum instead.
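A minimal sketch of that refactor; the enum and variant names here are placeholders, not necessarily what the PR ends up using:

```rust
/// Replaces the opaque `bool` so the call site documents itself.
#[derive(Clone, Copy, PartialEq, Eq)]
enum InitialCont {
    Yes,
    No,
}

// The call then reads unambiguously:
// wait_for_signal(Some(pid), signal::SIGSTOP, InitialCont::Yes)?;
```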
```rust
// won't do memory accesses and so can't trigger this!
let regs_bak = ptrace::getregs(pid).unwrap();
new_regs = regs_bak;
let ip_poststep = regs_bak.ip();
```
Should we assert that ip_poststep is (a) bigger than ip_prestep and (b) not too much bigger, just to be safe?
Makes sense, I'll add that.
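Something like the following, where the 15-byte bound is an assumption (the longest x86 instruction encoding; other targets would pick their own limit):

```rust
// Sanity-check the single step: the instruction pointer must move forward,
// but by no more than one maximally-sized instruction.
const MAX_INSTR_LEN: u64 = 15;
assert!(ip_poststep > ip_prestep, "IP did not advance after single-stepping");
assert!(
    ip_poststep - ip_prestep <= MAX_INSTR_LEN,
    "IP advanced too far for a single instruction"
);
```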
```rust

// Now figure out the size + type of access and log it down
// This will mark down e.g. the same area being read multiple times,
// since it's more efficient to compress the accesses at the end.
```
Where are we "compressing" anything...?
Ah, this is outdated, oops. At one point I had a bold idea of compressing down the list of which reads and writes happened in some fancy way, until I realised it requires that we always have perfect info on the size of accesses, never conservatively overestimating it; I forgot to update this comment when I gutted that.
```rust
// This will mark down e.g. the same area being read multiple times,
// since it's more efficient to compress the accesses at the end.
if capstone_disassemble(&instr, addr, cs, acc_events).is_err() {
    // Read goes first because we need to be pessimistic.
```
Why is read-first "pessimistic"?
Also an artifact of the idea I mentioned above; I'll remove this comment. But in brief: I wanted to be able to assert that, e.g., if a write happens to some area of memory whose bytes had a provenance associated with them, then we can delete said provenance, and a later read to that area won't expose it; that's why the read going first is "pessimistic". I'd like to keep the ordering though, just in case capstone improves enough at some later point / I find some way to implement that (maybe having some way to signal when we know the exact size of an access vs. when we're guessing?).
I don't see the point of ordering; it's just confusing if it's not actually needed, and it's also unnecessary work.
Should I split it up and just have it return separate vectors for the reads and writes then?
I think ideally it's one vector that has all events in-order. That is the simplest thing for the post-processing logic, and it can even help avoid unnecessary pointer exposure if C overwrites memory before reading it (meaning the old provenance stored there is definitely lost).
That was my idea too. The problem was that when we can't tell the size of the access (e.g. on aarch64), we default to the worst-case size, so it might be possible that, say, a 4-byte write happens, we log it down as 8 bytes because we don't know the real size, and in doing so accidentally delete the next 4 bytes' provenance info. I am considering adding an extra field to specify when we know the exact size for sure vs. when we're conservatively overestimating, which would allow us to do this.
Ah... good point.
The general interface here is to include both a lower bound and an upper bound for the size of the access. Then the post-processing code can do the best possible approximation.
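For instance, the events could carry both bounds explicitly (a hypothetical shape, not the PR's actual type):

```rust
use std::ops::Range;

/// One observed memory access. `certain` is definitely touched (lower
/// bound); `possible` may also be touched when the disassembler can only
/// overestimate the size, e.g. on aarch64 (upper bound).
enum AccessEvent {
    Read { certain: Range<usize>, possible: Range<usize> },
    Write { certain: Range<usize>, possible: Range<usize> },
}
```

Post-processing can then expose provenance for everything in possible on a read, while only wiping provenance for bytes in certain on a write.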
```rust
// And the fake stack it allocated for us to use later
_ch_stack = Some(ch_info.stack_ptr);
// All the pages that the child process is "allowed to" access.
ch_pages = ch_info.page_ptrs;
```
Why does this even work? Wasn't this allocated in the child after the fork, so the allocation holding this Vec's buffer only exists in the child now, not in the parent?
Oh I get it, this is properly (de)serialized over the IPC channel.
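I.e., something along these lines, inferred from the field accesses above (illustrative only; the real struct lives in the tracing code):

```rust
// Serialized by the child and re-created in the parent by serde, so the
// parent never dereferences a pointer into the child's heap.
#[derive(serde::Serialize, serde::Deserialize)]
struct StartFfiInfo {
    /// Raw addresses of the pages the child protected.
    page_ptrs: Vec<usize>,
    /// Address of the fake stack the child allocated for the supervisor's use.
    stack_ptr: usize,
}
```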
```rust
match risc_voperand {
    arch::riscv::RiscVOperand::Mem(_) => {
        // We get basically no info here.
        let push = addr..addr.strict_add(size);
```
The variable size does not exist. Apparently this code did not even get build-tested. Please be more careful next time; this now causes a bunch of extra work to clean it all up.
So, in other words, before we can re-land any riscv-specific code, we need CI set up for it. I have no idea how to set up CI that uses riscv as the host architecture; that probably requires some sort of emulation.
We actually have the same problem with the arm code: it is not covered by CI at all, since we don't have a linux arm runner.
Turns out the 32-bit ARM code also does not build.
So yeah, we definitely need a CI setup for every single target we have specific code for; anything else is not sustainable.
Ough. I had tested this on 64-bit arm, sorry for the inconvenience!
I'll look to see if I can do anything about setting up CI; not really my area of expertise, but I'll give it a shot.
I spent most of Sunday setting that up. :)
Per #4326, this is the second of the main tracer bringup PRs. Let me know what needs fixing / changing / docs / justification / etc :D