Skip to content

Conversation

tzakian
Copy link
Contributor

@tzakian tzakian commented Jul 18, 2024

Description

This PR adds the initial trace format, and the implementation of this trace format in the VM. I've gone through the tracing format with most of you synchronously so won't write out all the details here (but I will take it on to write a spec for the trace format once this is done so other folks can use it when consuming this format). Happy to go through it at any point in real time.

Other TODO: right now the MoveValues only support serialize and not deserialize back into Rust so we can only push a trace out from Rust but not consume it. The next thing I'm working on right now is adding support for that, and it may be either an update to this PR, or an additional PR depending on how involved that is...

Tests and integration of this into the Move CLI (not the Sui CLI yet) is in the PR above this one.

This keeps the tracing largely feature-gated in the VM, and the only additional overhead/change at runtime with gas-profiling turned off is the additional argument, but this argument is unused and should be optimized away (and at worst only add essentially no overhead).

I kept the gas-profiling feature flag and gated the new tracing under it. The plan being to eventually rename that flag at the same time that we update test coverage and gas profiling to use this new tracing format as well (but that's for a couple future PRs!).

PR Stack:

  1. [move] Enable serialization of source maps into json #18727
  2. [move] Initial trace format and implementation #18729 <<<<< You are here
  3. [move] Tracing tests and plumbing into the Move CLI (_not_ Sui CLI yet!) #18730

Test plan

PR above this one


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:
  • REST API:

@tzakian tzakian requested review from a team, amnn, awelc, cgswords and dariorussi July 18, 2024 21:30
Copy link

vercel bot commented Jul 18, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Sep 11, 2024 6:11pm
3 Skipped Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Sep 11, 2024 6:11pm
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Sep 11, 2024 6:11pm
sui-typescript-docs ⬜️ Ignored (Inspect) Visit Preview Sep 11, 2024 6:11pm

}
}

#[cfg(feature = "gas-profiler")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation for feature gating these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple reasons:

  1. They are only used for move tracing, and in general we should be careful about when and how we create annotated values since they can inflate the value size.
  2. This is snapshotting the value to an annotated value -- in particular it goes through references. Whereas in a "true" VM value -> annotated value function you may want this to be an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to get this down as a doc comment, either on the feature gate or on as_annotated_move_value.

Comment on lines +49 to +50
pub(crate) struct VMTracer<'a> {
trace: &'a mut MoveTraceBuilder,
Copy link
Contributor

@cgswords cgswords Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a compelling reason to borrow this value, instead of having the tracer hold and then return it at the end of the execution?

Copy link
Contributor Author

@tzakian tzakian Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main reason is passing it by value would make the interface for this and the VM more cumbersome since this would now affect the return types of the VM as well as input types (and call sites etc), and this feels more natural to me/descriptive of "who owns the tracer" (the caller)

// Copyright (c) The Move Contributors
// SPDX-License-Identifier: Apache-2.0

#[cfg(not(feature = "gas-profiler"))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to rename this feature?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes absolutely! But I propose this be done in a future PR, since this will be taking over the gas profiler functionality (and this is a publicly visible/usable feature flag that people use when installing the Sui CLI) so we'll need to be careful on how we rename it.

Comment on lines 17 to 18
#[cfg(feature = "gas-profiler")]
$tracer.as_mut().map(|tracer| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of CFGing all of these, something you might consider is the following code:

#[cfg(feature = "gas-profiler")]
const TRACING_MODE: bool = true;

#[cfg(not(feature = "gas-profiler"))]
const TRACING_MODE: bool = false;

Then you can eliminate these macro calls and replace them with:

if tracing2::TRACING_MODE {
  tracer.as_mut.map(|tracer| ...)
}

Rust will eliminate these blocks even when compiled in debug mode.

In addition, you can even pass the tracer as Option<&VMTracer> to avoid the as_mut/map chains in these macros.

Copy link
Contributor

@cgswords cgswords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. Most things look good, aside from some name change requests. I'm not a big fan of the macro + cfg gate approach, but maybe there isn't something particularly better to do.

Copy link
Contributor

@amnn amnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realisd I had some comments saved up! Nothing major, mostly asking for clarification in various places (i.e. doc comments).

}
}

#[cfg(feature = "gas-profiler")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to get this down as a doc comment, either on the feature gate or on as_annotated_move_value.

pub fn into_tag_with_refs(&self) -> TypeTagWithRefs {
TypeTagWithRefs {
type_: self.tag.clone(),
ref_type: self.layout.as_ref().and_then(|(_, r)| r.clone()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression that self.layout could be None because the layout could not be calculated -- this code merges that case with the case where the type is taken by value -- is that always safe to do? Can you add some docs here as to why that's safe to do if it is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good callout! Updated this so that the cases are not merged anymore.

Copy link
Contributor

@dariorussi dariorussi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am finishing tracer.rs and I should be done later today, meanwhile some feedback...

I have to say that I find it a bit difficult to digest how much we are sprinkling #[cfg(feature = "gas-profiler")] all over the code base.
I have not tried or thought enough but wonder if we could re-arrange things in a way that makes that more isolated/encapsulated?
Maybe removing the current profile macros and put everything inside functions?

errors::*,
file_format::{Constant, SignatureToken, VariantTag},
};
#[cfg(feature = "gas-profiler")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to move all this implementation in tracer.rs?
I guess ValueImpl being "private" requires us to do so?
It seems very intrusive to have all this code here if it's going to be used by the tracer only

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be less bad if the feature gate was not gas-profiler, but something else that was specific to this crate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, this is needed here since ValueImpl is private. In the future (i.e., after the most recent VM meeting) I think this will eventually all be able to live in a single crate :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are making me wonder now if I am missing something...
this is not adding any error handling that was not there before, right?

pub(crate) struct Interpreter {
/// Operand stack, where Move `Value`s are stored for stack operations.
operand_stack: Stack,
pub(crate) operand_stack: Stack,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not that all those pub(crate) are problematic but man....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea...But needed!

)
});
#[cfg(feature = "gas-profiler")]
let func_clone = func.clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the reason for this and other clone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an existing clone for profile_open_frame. I think that can take it by ref, but I'm kinda of the opinion of keeping it as-is as it should be going away soon either way...

/// Internal state for the tracer. This is where the actual tracing logic is implemented.
pub(crate) struct VMTracer<'a> {
trace: &'a mut MoveTraceBuilder,
link_context: Option<AccountAddress>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why we are exposing this, can we get rid of it? what is that supposed to do? is that what we want to keep moving forward?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this in the tracer so that can get the correct module to e.g., resolve type layouts.

In the future, this can be replaced by the mapping that we get from the link context and/or the resolver over the package vtables. But at least for now/in the current incarnation of the VM here we need this. (note that this isn't exposed in the trace or anything though!)

trace: &'a mut MoveTraceBuilder,
link_context: Option<AccountAddress>,
pc: Option<u16>,
active_frames: BTreeMap<TraceIndex, FrameInfo>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really wonder if a Vec<(TraceIndex, FrameInfo)> wouldn't be more efficient and more obvious.
In fact we could even move the TraceIndex in the FrameInfo? then it would just be Vec<FrameInfo>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is sparse (this is keyed by TraceIndex and not e.g., call stack depth or anything like that), so moving to a Vec<...> is probably not a good idea here...

Copy link
Contributor

@dariorussi dariorussi Sep 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe I did not understand this too well :) and here is what I thought:
BTreeMap<TraceIndex, FrameInfo> is acting as a stack whenever we get the latest value (top of the stack, last_key_value) and it works because TraceIndex is guaranteed to be monotonically growing and as such a "later" frame has always a higher value of a frame up the stack (previous frame).
However we also use TraceIndex as a key into the "stack" to get roots and such.
Is that the case?
If that was the case (if what I wrote is true), a Vec<(TraceIndex, FrameInfo)> or Vec<FrameInfo> - if FrameInfo had the TraceIndex - would work as well with a binary search in place of all the get in the code. All the usage of a stack (last_key_value) would simply work off a vector which is more natural.
In other words (again assuming I understood anything) the perf characteristics of a vector with a binary search would be the same of the BTreeMap for the get(log(n)) and better otherwise in terms of space and time.

All in all at the end is not a big deal in terms of perf, as call stacks are not really that deep, and maybe better when reading the code?

Did I got the usage right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You got the usage right!

I think it's better to keep it as a BTreeMap as that feels more "natural" to me at least when reading/reasoning about the code since you generally need random access into the stack more frequently then most recent call frame (mainly due to mutating references). Agreed that it's not that big a deal either way though (and we can convert at a later date if we want/need to).

If we wanted to we could also move to an IndexMap for this as well which would give the ideal access patterns for this -- O(1) random access and O(1) "last frame"/call-stack order operations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine keeping things as they are and I still think a vector would be the ideal struct - lol

Copy link
Contributor

@dariorussi dariorussi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more nits...


/// The reference information for a local. This is used to track the state of a local in a frame.
/// * It can be a value, in which case the reference type is `Value`.
/// * It can be a local that does not currently hold a value (is "empty"), in which case the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"in which case the" remove "the"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of interest how do you end up with a reference to an empty local? Isn't that what the borrow checker is supposed to protect against?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question! This is tracking the state of the local during the course of execution, so e.g., when we open a frame, there may be a local (of reference type) that starts out uninitialized, so this would show up as a local with a LocalType of LocalType { layout: ..., ref_type: ReferenceType::Empty(RefType::Imm). This then gets filled later on throughout execution.

Comment on lines +217 to +222
/// Return the current frame identifier. This is trace index of the frame and is used to
/// identify reference locations rooted higher up the call stack.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should any of those be a doc comment instead of just a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think doc comments are fine? Can move to normal comments if you want, but figure this is a bit nicer.

errors::*,
file_format::{Constant, SignatureToken, VariantTag},
};
#[cfg(feature = "gas-profiler")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be less bad if the feature gate was not gas-profiler, but something else that was specific to this crate?

// NB: parameters are passed directly, so we just pop to make sure they aren't also
// left on the operand stack. For the initial call, these pops may (should) fail, but that
// is fine as we already have the values in the parameter list.
self.operand_stack.pop();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (optional): This is a bit more verbose but for me it more directly conveys the idea that you care about maintaining the invariant on the operand stack, and less about the values coming out of it:

self.operand_stack.truncate(self.operand_stack.len().saturating_sub(frame.parameters.len())

Some(())
}

fn close_instruction_(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't fully figure out why it's helpful to split instruction handling into pre-effects and post-effects, vs having one big match statement that contains the abstract implementation of every instruction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Called at different times -- one is called before the execution of the instruction, and the other is called immediately after (to observe the effect of the instruction). Basically

call_pre_effect();
<vm executes instruction>
call_post_effect();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of interest, is there a good example of an instruction where its post-effect relies on a change in state in the VM to get computed correctly?

@tzakian tzakian force-pushed the tzakian/tracing-1-source-maps-json branch from dd46d00 to 44904a8 Compare September 6, 2024 23:53
@tzakian tzakian force-pushed the tzakian/tracing-2-core-impl branch from 0c0a0fa to c6bc865 Compare September 6, 2024 23:53
@tzakian
Copy link
Contributor Author

tzakian commented Sep 6, 2024

Everything should be either responded to or addressed! Thanks for all the feedback and suggestions.

@dariorussi
Copy link
Contributor

left a reply and one more idiotic question :)
format.rs is what we serialize out in a trace file (if we save to trace files), is that correct?

@tzakian
Copy link
Contributor Author

tzakian commented Sep 9, 2024

Correct! That is what is serialized out :)

Copy link
Contributor

@dariorussi dariorussi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accepting since it has reviewed quite a bit and we concluded there are things we want to do in subsequent PRs.
I guess there is no reason to keep it open.
And if there are reason you can keep it open for a bit longer :)

…t!) (#18730)

## Description 

Plumbs tracing into the Move unit tests, and Move CLI (_not_ the Sui CLI
just yet!).

You can now run with `move test --trace-execution` or `move test
--trace-execution <dir>`. If the Move CLI binary was compiled with the
`gas-profiled` feature flag enabled this will result in a trace file for
each unit test either under `<move-pkg>/traces/` or under
`<move-pkg>/<dir>/`. Once we're happy with this, and we've kicked the
tires a bit more + moved over test coverage and gas profiling plumbing
this up into the Sui CLI is something we should do.


This additionally adds a new test to the `./tests.sh` script for
external crates to compile and run tracing tests with the `gas-profiler`
flag set.

### PR Stack:
1. #18727
2. #18729
3. #18730 **<<<<< You are here**


## Test plan 

Added a number of tests.

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to
your changes, release notes aren't required.

For each box you select, include information after the relevant heading
that describes the impact of your changes that a user might notice and
any actions they must take to implement updates.

- [ ] Protocol: 
- [ ] Nodes (Validators and Full nodes): 
- [ ] Indexer: 
- [ ] JSON-RPC: 
- [ ] GraphQL: 
- [ ] CLI: 
- [ ] Rust SDK:
- [ ] REST API:
@tzakian tzakian merged commit a6f8f05 into tzakian/tracing-1-source-maps-json Sep 11, 2024
14 of 20 checks passed
@tzakian tzakian deleted the tzakian/tracing-2-core-impl branch September 11, 2024 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants