Skip to content

Conversation

@electrum
Copy link
Collaborator

@electrum electrum commented Oct 23, 2024

The IR uses unstructured control flow (i.e. jump/goto) and includes types for SELECT, DROP and LOCAL_TEE. It also directly encodes stack unwinding via the DROP_KEEP opcode. This simplifies JVM bytecode generation by extracting the control flow, stack unwinding, and stack/type tracking.

While this simplifies the compiler (at the expense of more code), the goal is to make method splitting simpler to implement, as the IR is easier to analyze than raw WebAssembly.

@electrum electrum requested a review from andreaTP October 23, 2024 13:26
@andreaTP
Copy link
Collaborator

I want to do an in-depth review as the PR is on the large side, give me a little time to come to it 🙏

I just skimmed through and I have a first question, the TypeStack and the Analyzer are doing a job very similar(in terms of structure) to the Validator now, wondering if we can re-use some code there, wdyt?
e.g. using a single implementation of the analyzeSimple methods seems doable, this will help keeping the changes isolated when implementing new opcodes.

A note that we are reviewing the public API, and it's possible that we will create git conflicts, sorry for the noise.

@electrum
Copy link
Collaborator Author

I agree they are similar. We could possibly move the type annotations to AnnotatedInstruction and have the validator attach those. But that might be mixing concerns and make the code more complicated.

Copy link
Collaborator

@andreaTP andreaTP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, I finally took the time to go through the code.
It's really great work @electrum 👍

I like the direction of those changes, everything becomes a bit easier to understand.
At the same time, there are a couple of details that might be cleaner with a little "design doc", such as:

  • what's GOTO and the expected semantics
  • etc.

I see that this code structure bring us closer to fix the issues we have in the AOT, but I would feel more relaxed about it looking at a "target" PR, where the dots are connected and we can actually test out the those fixes.
This approach will confirm that we are not missing necessary semantics.

}

// implicit block for the function
stack.enterScope(FUNCTION_SCOPE, FunctionType.of(List.of(), functionType.returns()));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

@electrum electrum Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good observation, but this is actually a "block type" rather than a "function type". The parameters for a block are on the stack when the block is entered. The function's implicit block is different, as function arguments are in locals rather than on the stack. Thus the stack is empty at the start of the function.

return Optional.of(new AotInstruction(AotOpCode.DROP_KEEP, operands.build().toArray()));
}

private FunctionType blockType(Instruction ins) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blockType is used also for LOOPs which is using the params instead of the returns, am I missing something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how block types are encoded per the specification:

A structured instruction can consume input and produce output on the operand stack according to its annotated block type. It is given either as a type index that refers to a suitable function type, or as an optional value type inline, which is a shorthand for the function type [] -> [valuetype?].

exitBlockDepth = ins.depth();
if (ins.labelTrue() < idx) {
case GOTO:
if (visitedTargets.contains(ins.operand(0))) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we skipping the interruption check here?
Can we document what's the rationale?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a comment above that I hoped would explain it

// track targets to detect backward jumps
Set<Long> visitedTargets = new HashSet<>();

If the target label has already been visited, then this is a backward jump. Otherwise, it is a forward jump. We document this in InterpreterMachine.checkinterruption():

Terminate WASM execution if requested. This is called at the start of each call and at any potentially backwards branches. Forward branches and other non-branch instructions are not checked, as the execution will run until it eventually reaches a termination point.

Should we reference this comment in AotMethods.checkInterruption()?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we reference this comment in AotMethods.checkInterruption()?

yes please 🙏


enum AotOpCode {
LABEL,
DROP_KEEP,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's DROP_KEEP? What's the semantics?

Copy link
Collaborator Author

@electrum electrum Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The WABT IR has the same instruction. I've been thinking that UNWIND might be a better name as that describes the purpose and usage. The semantics can be seen in AotEmitters.DROP_KEEP(). What it does is to drop x number of intermediate stack values, keeping the last y values on the stack. This is how unwinds work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, we definitely need to preserve this comment somewhere

}

// drop intervening values
for (int i = keepStart - 1; i >= 1; i--) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a stack unwind?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly, this is why I've been thinking to name the opcode UNWIND instead.


public static func_0(ILcom/dylibso/chicory/runtime/Memory;Lcom/dylibso/chicory/runtime/Instance;)I
ILOAD 0
INVOKESTATIC com/dylibso/chicory/$gen/CompiledMachine$AotMethods.checkInterruption ()V
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting change, do you think it's not needed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's not needed because none of the switch targets are a backward jump.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I'm a bit unsure here is that we are not running the InterruptionTest on the aot compiled code.
But I'd prefer to keep the ball rolling other than fussing over a detail that can be reverted at any time.

@electrum
Copy link
Collaborator Author

If you think this existing structure makes the compiler easier to understand and would like to merge it now, then I can create a design doc to help with understanding it. Otherwise, we can wait until the method splitting work is ready, which will make some changes to it.

I looked more at sharing the stack analyzer code with the validator, but I don't think it's worthwhile. The instruction handling is based on the specification and won't ever change, so it's better to have two simple implementations than a shared one that's more complicated.

@andreaTP
Copy link
Collaborator

If you think this existing structure makes the compiler easier to understand and would like to merge it now

Ok, agreed, otherwise it will generate a ton of rebasing work.

I can create a design doc to help with understanding it

Please, 🙏 if possible, I'll prioritize the design doc over the full method splitting implementation, I'll be happy to have more people able to review and reason about the aot compiler.

@andreaTP
Copy link
Collaborator

This is a corpus of work, I hope you don't mind if I squash everything together.
E.g.: Move utility methods to AotUtil make sense in isolation, but it makes much more sense as part of the bigger refactoring.

@andreaTP andreaTP merged commit 72a65cf into dylibso:main Oct 29, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants