-
Notifications
You must be signed in to change notification settings - Fork 59
Add unstructured, typed IR for AOT compiler #596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I want to do an in-depth review as the PR is on the large side, give me a little time to come to it 🙏 I just skimmed through and I have a first question, the A note that we are reviewing the public API, and it's possible that we will create git conflicts, sorry for the noise. |
|
I agree they are similar. We could possibly move the type annotations to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, I finally took the time to go through the code.
It's really great work @electrum 👍
I like the direction of those changes, everything becomes a bit easier to understand.
At the same time, there are a couple of details that might be cleaner with a little "design doc", such as:
- what's
GOTOand the expected semantics - etc.
I see that this code structure bring us closer to fix the issues we have in the AOT, but I would feel more relaxed about it looking at a "target" PR, where the dots are connected and we can actually test out the those fixes.
This approach will confirm that we are not missing necessary semantics.
| } | ||
|
|
||
| // implicit block for the function | ||
| stack.enterScope(FUNCTION_SCOPE, FunctionType.of(List.of(), functionType.returns())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we discarding functionType.params()?
This code seems to make use of them:
https://github.com/dylibso/chicory/pull/596/files#diff-89b97bafcb107230b3864b1fc861e71f0eef8784e78fed7987356111d89f632cR634
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good observation, but this is actually a "block type" rather than a "function type". The parameters for a block are on the stack when the block is entered. The function's implicit block is different, as function arguments are in locals rather than on the stack. Thus the stack is empty at the start of the function.
| return Optional.of(new AotInstruction(AotOpCode.DROP_KEEP, operands.build().toArray())); | ||
| } | ||
|
|
||
| private FunctionType blockType(Instruction ins) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blockType is used also for LOOPs which is using the params instead of the returns, am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how block types are encoded per the specification:
A structured instruction can consume input and produce output on the operand stack according to its annotated block type. It is given either as a type index that refers to a suitable function type, or as an optional value type inline, which is a shorthand for the function type
[] -> [valuetype?].
| exitBlockDepth = ins.depth(); | ||
| if (ins.labelTrue() < idx) { | ||
| case GOTO: | ||
| if (visitedTargets.contains(ins.operand(0))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we skipping the interruption check here?
Can we document what's the rationale?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a comment above that I hoped would explain it
// track targets to detect backward jumps
Set<Long> visitedTargets = new HashSet<>();If the target label has already been visited, then this is a backward jump. Otherwise, it is a forward jump. We document this in InterpreterMachine.checkinterruption():
Terminate WASM execution if requested. This is called at the start of each call and at any potentially backwards branches. Forward branches and other non-branch instructions are not checked, as the execution will run until it eventually reaches a termination point.
Should we reference this comment in AotMethods.checkInterruption()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we reference this comment in
AotMethods.checkInterruption()?
yes please 🙏
|
|
||
| enum AotOpCode { | ||
| LABEL, | ||
| DROP_KEEP, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's DROP_KEEP? What's the semantics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The WABT IR has the same instruction. I've been thinking that UNWIND might be a better name as that describes the purpose and usage. The semantics can be seen in AotEmitters.DROP_KEEP(). What it does is to drop x number of intermediate stack values, keeping the last y values on the stack. This is how unwinds work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, we definitely need to preserve this comment somewhere
| } | ||
|
|
||
| // drop intervening values | ||
| for (int i = keepStart - 1; i >= 1; i--) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a stack unwind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly, this is why I've been thinking to name the opcode UNWIND instead.
|
|
||
| public static func_0(ILcom/dylibso/chicory/runtime/Memory;Lcom/dylibso/chicory/runtime/Instance;)I | ||
| ILOAD 0 | ||
| INVOKESTATIC com/dylibso/chicory/$gen/CompiledMachine$AotMethods.checkInterruption ()V |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting change, do you think it's not needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's not needed because none of the switch targets are a backward jump.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why I'm a bit unsure here is that we are not running the InterruptionTest on the aot compiled code.
But I'd prefer to keep the ball rolling other than fussing over a detail that can be reverted at any time.
|
If you think this existing structure makes the compiler easier to understand and would like to merge it now, then I can create a design doc to help with understanding it. Otherwise, we can wait until the method splitting work is ready, which will make some changes to it. I looked more at sharing the stack analyzer code with the validator, but I don't think it's worthwhile. The instruction handling is based on the specification and won't ever change, so it's better to have two simple implementations than a shared one that's more complicated. |
Ok, agreed, otherwise it will generate a ton of rebasing work.
Please, 🙏 if possible, I'll prioritize the design doc over the full method splitting implementation, I'll be happy to have more people able to review and reason about the aot compiler. |
|
This is a corpus of work, I hope you don't mind if I squash everything together. |
The IR uses unstructured control flow (i.e. jump/goto) and includes types for
SELECT,DROPandLOCAL_TEE. It also directly encodes stack unwinding via theDROP_KEEPopcode. This simplifies JVM bytecode generation by extracting the control flow, stack unwinding, and stack/type tracking.While this simplifies the compiler (at the expense of more code), the goal is to make method splitting simpler to implement, as the IR is easier to analyze than raw WebAssembly.