Skip to content

Conversation

spotandjake
Copy link
Member

@spotandjake spotandjake commented Oct 9, 2024

This improves the code generation for our pattern matching engine, the goal is to make it so binaryen can build br_tables in more cases.

This changes how we generate code for our wasm output generation so in certain cases when we are matching on a static constant such as a short type, or constructor as long as your data is continous i.e 'a', 'b', 'c' and we are not outputting a default (not grain's wildcard default but a default in the constant path of our decision tree) then we will use a jump table, in cases where the data is not continous and you have a default we perform the untagging first and do direct i32 equality checks which reduces code size a little and allows binaryen to optimize into a jump table in more cases). This lowers codegen our output size by a couple bytes in a few cases but more importantly on large constant matches it allows us todo a direct jump instead of a bunch of equality checks and it allows us to save on memory operations by getting the values of int32, int64 the uint equivlants and adt variants before our comparisions.

We could furthur improve codegen in a few ways, we could handle the default case this would require changing up compile_switch to accept a Default(anf_expression) instead of just Partial or Total which would allow us to optimize more cases into jump tables, additionally we could handle non continous datasets (i.e ones with gaps), by changing how we generate our jump_table slightly (I did not have a good heurtistic for this though and could not find good data on how many dead branches make sense vs not, as we just point the dead_branches in the gaps to the default branch).

closes: #1185

@spotandjake spotandjake force-pushed the spotandjake-optimize-patmatch branch from 2b43706 to 2a605c4 Compare January 14, 2025 21:58
Copy link
Member

@alex-snezhko alex-snezhko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! Just had a question about the unoptimized case

| Const_uint64(i) => Some(Int64.to_int(i))
| Const_wasmi32(i) => Some(Int32.to_int(i))
| Const_wasmi64(i) => Some(Int64.to_int(i))
| _ => None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expand this out

(constructor_tag, nested_decision_tree) tuples,
with an optional default branch. */
Switch(
switch_type,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be worthwhile to make this a record variant to give names to what all of these mean

let sub_op =
switch (value_typ) {
| WasmI32 =>
WasmBinaryI32({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can make these constants and share in other places where these are used e.g. translprim

Comment on lines +1194 to 1198
let (cur_value, _) =
switch (values) {
| [] => failwith("Impossible (compile_tree_help): Empty value stack")
| [hd, ...tl] => (hd, tl)
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can just remove second value in tuple and return the head

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some more tests directly asserting the compiled optimized vs unoptimized output rather than just having the snapshots, and also some tests compiling and running grain programs with each scenario?

cur_value,
| ConstantSwitch(const) =>
switch (const) {
| Const_char(_) => (Prim1(UntagChar), WasmI32, comp_cond_i32)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid repetition/add clarity you could just have the result here be a 2-tuple where the second indicates if i32 or i64 should be used since the only valid pairs are WasmI32, comp_cond_i32 or WasmI64, comp_cond_i64

cases,
);
let branch_count = List.length(cases);
let branch_range = Int.abs(max - min + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there an abs here? Won't this always be positive?

Comp.prim2(
~loc=Location.dummy_loc,
~allocation_type=Unmanaged(WasmI32),
equality_op,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the unoptimized case why did you replace the Is that was here before with the WasmBinaryXXX ops?

@spotandjake spotandjake marked this pull request as draft April 11, 2025 22:35
@spotandjake
Copy link
Member Author

I'm converting this to a draft as I would like to wait until the gc changes are all made to redo this, I think it can be done in a neater way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize pattern matching on constants

2 participants