trace: implement supervisor components for tracing #4405
Conversation
Force-pushed from 3826f24 to 079ce3d
Had a couple dumb bugs to iron out but I think this is good to go now. @rustbot ready
Co-authored-by: Oli Scherer <github35764891676564198441@oli-obk.de>
I still have a bunch of questions and comments. Some of them are resolved in #4418, but a bunch are clarification questions, so I'd appreciate a follow-up PR extending the comments here and possibly doing some refactoring.
```rust
pub unsafe fn prepare_ffi(&mut self) -> Result<(), nix::errno::Errno> {
    let prot = mman::ProtFlags::PROT_NONE;
    unsafe { self.mprotect(prot) }
}

/// Deprotects all owned memory by setting it to RW. Erroring here is very
/// likely unrecoverable, so it may panic if applying those permissions
/// fails.
pub fn unprep_ffi(&mut self) {
    let prot = mman::ProtFlags::PROT_READ | mman::ProtFlags::PROT_WRITE;
    unsafe {
        self.mprotect(prot).unwrap();
    }
}
```
Elsewhere this is called start_ffi and end_ffi, right?
Why do we need to give a list of pages to the trace mechanism if we then still do this work here in the allocator...?
I can change the naming if that's preferable, sure.
The point is that here in the allocator we're doing the initial and final bulk protection/deprotection, so that accesses to the machine's memory result in a segfault; the list of pages passed to the supervisor is used to tell whether a segfault is "real" (e.g. the C code wrote through a null ptr) or not (that is, its address matches one of these pages), in which case we log it instead. I guess we could delete this and make the supervisor process do the protection to the child process through those mempr_on()/mempr_off() functions, but that would be quite messy.
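Roughly, the check on the supervisor side is just an address-in-page test; a minimal sketch, with illustrative names rather than the actual API:

```rust
/// Is this segfault one we provoked on purpose (address inside a page the
/// child protected), as opposed to a genuine crash in the C code?
fn is_tracked_fault(fault_addr: usize, ch_pages: &[usize], page_size: usize) -> bool {
    ch_pages.iter().any(|&page| (page..page + page_size).contains(&fault_addr))
}
```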
> I guess we could delete this and make the supervisor process do the protection to the child process through those mempr_on()/mempr_off() functions, but that would be quite messy

What I was thinking was more along the lines of having all mprotect calls reasonably close to each other -- right now, some are in native_lib and some in the allocator. That does not mean they have to be done by the supervisor. They can be done by the child in start_ffi / end_ffi, but with the logic in the tracing code, not here.
If you also change the pages getter to return an impl Iterator instead of a Vec, you can avoid some unnecessary allocation in end_ffi.
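Something like this, assuming the allocator keeps its page addresses in a pages field (a sketch, not the exact signature):

```rust
/// Iterate over the start addresses of all owned pages without handing
/// every caller a freshly allocated Vec.
pub fn pages(&self) -> impl Iterator<Item = usize> + '_ {
    self.pages.iter().copied()
}
```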
```rust
}

// It's fine / desirable behaviour for values to wrap here, we care about just
// preserving the bit pattern.
```
But the bit pattern may change if we cast to a smaller type...
I should clarify. These fields on libc::user_regs_struct are always usize/isize-sized, i.e. on x64 the type of self.rip is i64 while on x86 the equivalent self.eip is i32 (since these are defined as wobbly C types). What I meant by wrapping is just casting from signed to unsigned; this should never cause the size to decrease.
Please use cast_signed/cast_unsigned for casting between signed and unsigned, do not use as.
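For example, these methods are width-preserving by construction, while as will also silently truncate (plain Rust, independent of the PR's code):

```rust
fn main() {
    let ip: i64 = -1;
    let raw: u64 = ip.cast_unsigned(); // bit-for-bit, i64 -> u64 only
    assert_eq!(raw, u64::MAX);
    assert_eq!(raw.cast_signed(), ip); // round-trips exactly
    // `ip as u32` would also compile, silently dropping half the bits,
    // which is exactly the bug class being ruled out here.
}
```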
```rust
let push = addr..addr.strict_add(size);
// FIXME: This now has access type info in the latest
// git version of capstone because this pissed me off
// and I added it. Change this when it updates.
```
:-)
```rust
// Overall structure:
// - Get the address that caused the segfault
// - Unprotect the memory
// - Step 1 instruction
// - Parse executed code to estimate size & type of access
// - Reprotect the memory
// - Continue
```
And somewhere along the line we apparently use a fake stack for something?
I can document that here as well, sure.
```rust
ptrace::setregs(pid, new_regs).unwrap();

// Our mempr_* functions end with a raise(SIGSTOP).
wait_for_signal(Some(pid), signal::SIGSTOP, true)?;
```
That true argument is not very self-explanatory; please use a dedicated enum instead.
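A minimal sketch of that refactor; the enum and variant names here are placeholders, not necessarily what the PR ends up using:

```rust
/// Replaces the opaque `bool` so the call site documents itself.
#[derive(Clone, Copy, PartialEq, Eq)]
enum InitialCont {
    Yes,
    No,
}

// The call then reads unambiguously:
// wait_for_signal(Some(pid), signal::SIGSTOP, InitialCont::Yes)?;
```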
```rust
// won't do memory accesses and so can't trigger this!
let regs_bak = ptrace::getregs(pid).unwrap();
new_regs = regs_bak;
let ip_poststep = regs_bak.ip();
```
Should we assert that ip_poststep is (a) bigger than ip_prestep and (b) not too much bigger, just to be safe?
Makes sense, I'll add that.
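Something like the following, where the 15-byte bound is an assumption (the longest x86 instruction encoding; other targets would pick their own limit):

```rust
// Sanity-check the single step: the instruction pointer must move forward,
// but by no more than one maximally-sized instruction.
const MAX_INSTR_LEN: u64 = 15;
assert!(ip_poststep > ip_prestep, "IP did not advance after single-stepping");
assert!(
    ip_poststep - ip_prestep <= MAX_INSTR_LEN,
    "IP advanced too far for a single instruction"
);
```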
```rust

// Now figure out the size + type of access and log it down
// This will mark down e.g. the same area being read multiple times,
// since it's more efficient to compress the accesses at the end.
```
Where are we "compressing" anything...?
Ah, this is outdated, oops. At one point I had a bold idea of compressing down the list of which reads and writes happened in some fancy way, until I realised it requires that we always have perfect info on the size of accesses, never conservatively overestimating it; I forgot to update this comment when I gutted that.
```rust
// This will mark down e.g. the same area being read multiple times,
// since it's more efficient to compress the accesses at the end.
if capstone_disassemble(&instr, addr, cs, acc_events).is_err() {
    // Read goes first because we need to be pessimistic.
```
Why is read-first "pessimistic"?
Also an artifact of the idea I mentioned above; I'll remove this comment. But in brief: I wanted to be able to assert that, e.g., if a write happens to some area of memory whose bytes had a provenance associated with them, then we can delete said provenance, and a later read to that area won't expose it; that's why the read going first is "pessimistic". I'd like to keep the ordering though, just in case capstone improves enough at some later point / I find some way to implement that (maybe having some way to signal when we know the exact size of an access vs. when we're guessing?).
I don't see the point of ordering; it's just confusing if it's not actually needed, and it's also unnecessary work.
Should I split it up and just have it return separate vectors for the reads and writes then?
I think ideally it's one vector that has all events in-order. That is the simplest thing for the post-processing logic, and it can even help avoid unnecessary pointer exposure if C overwrites memory before reading it (meaning the old provenance stored there is definitely lost).
That was my idea too. The problem was that when we can't tell the size of the access (e.g. on aarch64), we default to the worst-case size, so it might be possible that, say, a 4-byte write happens, we log it down as 8 bytes because we don't know the real size, and in doing so accidentally delete the next 4 bytes' provenance info. I am considering adding an extra field to specify when we know the exact size for sure vs. when we're conservatively overestimating, which would allow us to do this.
Ah... good point.
The general interface here is to include both a lower bound and an upper bound for the size of the access. Then the post-processing code can do the best possible approximation.
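For instance, the events could carry both bounds explicitly (a hypothetical shape, not the PR's actual type):

```rust
use std::ops::Range;

/// One observed memory access. `certain` is definitely touched (lower
/// bound); `possible` may also be touched when the disassembler can only
/// overestimate the size, e.g. on aarch64 (upper bound).
enum AccessEvent {
    Read { certain: Range<usize>, possible: Range<usize> },
    Write { certain: Range<usize>, possible: Range<usize> },
}
```

Post-processing can then expose provenance for everything in possible on a read, while only wiping provenance for bytes in certain on a write.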
```rust
// And the fake stack it allocated for us to use later
_ch_stack = Some(ch_info.stack_ptr);
// All the pages that the child process is "allowed to" access.
ch_pages = ch_info.page_ptrs;
```
Why does this even work? Wasn't this allocated in the child after the fork, so the allocation holding this Vec's buffer only exists in the child now, not in the parent?
Oh I get it, this is properly (de)serialized over the IPC channel.
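I.e., something along these lines, inferred from the field accesses above (illustrative only; the real struct lives in the tracing code):

```rust
// Serialized by the child and re-created in the parent by serde, so the
// parent never dereferences a pointer into the child's heap.
#[derive(serde::Serialize, serde::Deserialize)]
struct StartFfiInfo {
    /// Raw addresses of the pages the child protected.
    page_ptrs: Vec<usize>,
    /// Address of the fake stack the child allocated for the supervisor's use.
    stack_ptr: usize,
}
```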
```rust
match risc_voperand {
    arch::riscv::RiscVOperand::Mem(_) => {
        // We get basically no info here.
        let push = addr..addr.strict_add(size);
```
The variable size does not exist. Apparently this code did not even get build-tested. Please be more careful next time; this now causes a bunch of extra work to clean it all up.
So, in other words, before we can re-land any riscv-specific code, we need CI set up for it. I have no idea how to set up CI that uses riscv as the host architecture; that probably requires some sort of emulation.
We actually have the same problem with the arm code: it is not covered by CI at all, since we don't have a linux arm runner.
Turns out the 32-bit ARM code also does not build.
So yeah, we definitely need a CI setup for every single target we have specific code for; anything else is not sustainable.
Ough. I had tested this on 64-bit arm, sorry for the inconvenience!
I'll look to see if I can do anything about setting up CI; not really my area of expertise, but I'll give it a shot.
I spent most of Sunday setting that up. :)
Per #4326, this is the second of the main tracer bringup PRs. Let me know what needs fixing / changing / docs / justification / etc :D