Add WebAssembly overview aimed at JIT developers #120850
Pull Request Overview
Adds a new documentation file providing a conceptual and practical overview of WebAssembly (WASM) targeted at JIT developers.
- Introduces WASM motivations and design priorities.
- Explains core concepts (modules, memory, control flow, function tables, traps, SIMD, threading, async).
- Highlights areas relevant to JIT integration (module structure, signatures, control flow transformation).
Exception handling in WASM is relatively bare-bones. You define a try block with one or more catch clauses, where a given clause either catches a specific 'exception tag' or catches all exceptions (referred to as `catch_all`). Exception tags can be thought of conceptually like `Exception` or `ArgumentException` but in practice they are typically not used this way, and instead an entire language or compiler may use a single tag for its purposes - i.e. a `c++exception` tag which has an attached pointer into the linear memory where the real exception data lives. A given catch clause might then contain a series of type checks based on the data in linear memory.
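The single-tag pattern described above can be modeled outside of any real WASM engine. The following is a minimal Python sketch, not how an engine or Emscripten implements it: `WasmException`, `CPP_TAG`, the type ids, and the "4-byte type id at the payload pointer" layout are all hypothetical names and assumptions chosen for illustration.

```python
import struct

# Assumption: linear memory modeled as a flat byte buffer.
memory = bytearray(64)

# Hypothetical type ids a compiler might store at the start of an exception object.
TYPE_ARGUMENT_ERROR = 1
TYPE_IO_ERROR = 2

# Hypothetical single tag shared by the whole language/toolchain.
CPP_TAG = "c++exception"


class WasmException(Exception):
    """Stands in for a thrown WASM exception: a tag plus one pointer payload."""

    def __init__(self, tag: str, payload_ptr: int):
        super().__init__(tag)
        self.tag = tag
        self.payload_ptr = payload_ptr


def throw_at(ptr: int, type_id: int) -> None:
    # Store the "real" exception data in linear memory, then throw the tag
    # with a pointer to that data attached.
    struct.pack_into("<I", memory, ptr, type_id)
    raise WasmException(CPP_TAG, ptr)


def run(type_id: int) -> str:
    try:
        throw_at(16, type_id)
    except WasmException as exc:
        if exc.tag != CPP_TAG:
            raise  # a different tag: not ours to handle
        # One catch clause for the tag, then type checks against linear memory.
        (stored_id,) = struct.unpack_from("<I", memory, exc.payload_ptr)
        if stored_id == TYPE_ARGUMENT_ERROR:
            return "handled ArgumentError"
        if stored_id == TYPE_IO_ERROR:
            return "handled IOError"
        raise  # no type check matched: rethrow to the caller
```

The point of the sketch is that the tag itself carries no type information; the dispatch happens in ordinary code that reads linear memory after the catch.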
Emscripten currently mostly aligns with the the libc++ ABI (functions like `__cxa_begin_catch`) for exception handling. The best documentation I've found is at https://github.com/WebAssembly/tool-conventions/blob/main/EHScheme.md, and it appears to be derived from the Itanium C++ ABI.
Copilot
AI
Oct 17, 2025
Duplicate word 'the the' should be reduced to a single 'the'.
Emscripten currently mostly aligns with the libc++ ABI (functions like `__cxa_begin_catch`) for exception handling. The best documentation I've found is at https://github.com/WebAssembly/tool-conventions/blob/main/EHScheme.md, and it appears to be derived from the Itanium C++ ABI.
Copilot uses AI. Check for mistakes.
## Origins of WebAssembly
The key motivations behind WebAssembly were to provide consistent runtime performance in the browser with acceptable startup time and code size. Preceding technologies like [asm.js](https://developer.mozilla.org/en-US/docs/Games/Tools/asm.js) and [NaCL](https://en.wikipedia.org/wiki/Google_Native_Client) were attempts to solve the same problems with different upsides and downsides. Lessons from both fed into the development of WebAssembly (hereafter referred to as WASM).
Copilot
AI
Oct 17, 2025
The term 'NaCL' should be cased 'NaCl' to match the conventional capitalization of Google Native Client.
The key motivations behind WebAssembly were to provide consistent runtime performance in the browser with acceptable startup time and code size. Preceding technologies like [asm.js](https://developer.mozilla.org/en-US/docs/Games/Tools/asm.js) and [NaCl](https://en.wikipedia.org/wiki/Google_Native_Client) were attempts to solve the same problems with different upsides and downsides. Lessons from both fed into the development of WebAssembly (hereafter referred to as WASM).
### Fixed-width SIMD
[The WASM SIMD extension](https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md) operates on 128-bit vectors with a lowest-common-denominator feature set that (generally) is efficient on x86, x86-64, arm32, arm64, and risc-v, and has well-defined behavior. There is [a 'relaxed SIMD' extension](https://github.com/WebAssembly/relaxed-simd/tree/main/proposals/relaxed-simd) that provides an expanded set of vector operations that have less consistent performance or may expose platform-specific undefined behavior.
Copilot
AI
Oct 17, 2025
'risc-v' should be capitalized as 'RISC-V'.
[The WASM SIMD extension](https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md) operates on 128-bit vectors with a lowest-common-denominator feature set that (generally) is efficient on x86, x86-64, arm32, arm64, and RISC-V, and has well-defined behavior. There is [a 'relaxed SIMD' extension](https://github.com/WebAssembly/relaxed-simd/tree/main/proposals/relaxed-simd) that provides an expanded set of vector operations that have less consistent performance or may expose platform-specific undefined behavior.
### Shadow Stack
The WASM VM implements a simple stack machine, where opcodes push and pop statically-typed values onto/off a shadow stack. The stack is strongly typed and has a known height at every location in a given function, and is contained by a given function. You cannot take the address of values on the shadow stack and the contents of a function's shadow stack cease to exist once it returns. A function's shadow stack is fixed size, comprised of its formal arguments and any 'locals' defined at compile time, numbered sequentially - i.e. a function might have a signature of `int (int, int)` and then define `8` `i32` locals along with 2 `f64` locals, in which case `0` and `1` would be the formal arguments and the rest would be the `i32` and `f64` locals.
Copilot
AI
Oct 17, 2025
[nitpick] This conflates the operand (evaluation) stack with locals/parameters. In WASM the operand stack is dynamic (validated to have consistent stack effects) while locals and parameters are accessed by index and stored separately; its maximum depth is not described as a fixed-size 'shadow stack'. Suggest clarifying the distinction: operand stack vs locals, and note you can't directly address either but only locals are index-addressable.
### Operand Stack and Locals
The WASM VM implements a simple stack machine model. Each function invocation has two separate storage areas:
- **Operand (evaluation) stack:** This is a dynamic, ephemeral stack used for evaluating expressions. Opcodes push and pop statically-typed values onto and off this stack. The operand stack is strongly typed and its depth is validated at compile time to ensure consistent stack effects, but its maximum depth is not fixed in the binary. The operand stack is not directly addressable and its contents exist only during the execution of a function.
- **Locals and parameters:** Each function has a fixed set of parameters and local variables, defined at compile time. These are stored separately from the operand stack and are accessed by index (e.g., `local.get 0`). You cannot take the address of a local or parameter, but you can read or write them by index. Locals and parameters persist for the duration of the function call.
For example, a function with signature `int (int, int)` and 8 `i32` locals plus 2 `f64` locals would have parameters at indices 0 and 1, and the remaining indices for the locals.
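The index numbering in that example can be worked out mechanically. The sketch below is illustrative only: `local_index_layout` is a hypothetical helper, and it assumes locals are declared as (count, type) groups that are numbered sequentially after the parameters.

```python
from typing import List, Tuple


def local_index_layout(
    params: List[str], local_groups: List[Tuple[int, str]]
) -> List[Tuple[int, str, str]]:
    """Return (index, kind, type) for every parameter and declared local."""
    layout = []
    index = 0
    # Parameters occupy the lowest indices, in signature order.
    for ty in params:
        layout.append((index, "param", ty))
        index += 1
    # Declared locals follow, one index per local, group by group.
    for count, ty in local_groups:
        for _ in range(count):
            layout.append((index, "local", ty))
            index += 1
    return layout


# The example from the text: int (int, int) with 8 i32 locals and 2 f64 locals.
layout = local_index_layout(["i32", "i32"], [(8, "i32"), (2, "f64")])
for index, kind, ty in layout:
    print(index, kind, ty)
```

Under these assumptions the parameters land at indices 0 and 1, the `i32` locals at 2 through 9, and the `f64` locals at 10 and 11.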
Each host "thread" in practice has its own separate instance of the application module(s), and each instance has its own function table, global variables, and imports/exports. These threads then coordinate by sharing a single linear memory and using a mix of host imports (like a socket API) and atomics/fences.
It is necessary for an application to ensure that any changes to the function table are synchronized between threads, and any global variable changes need to be manually synchronized between threads (either by storing them in shared linear memory, or via RPC). ⚠️ All WASM global variables are effectively TLS variables. As a result of each instance having its own function table, function pointers are effectively thread-local! ⚠️
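The one-instance-per-thread arrangement described above can be pictured with a small model. This is a hedged Python sketch rather than real threading or a real WASM API: the `Instance` class, its `counter` global, and the helper methods are hypothetical, and atomics/fences are not modeled.

```python
import struct
from typing import Callable, Dict, List


class Instance:
    """Hypothetical stand-in for one module instance owned by one host thread."""

    def __init__(self, shared_memory: bytearray):
        self.memory = shared_memory                    # shared by every instance
        self.globals: Dict[str, int] = {"counter": 0}  # private to this instance
        self.table: List[Callable[[], str]] = []       # private to this instance

    def add_function(self, fn: Callable[[], str]) -> int:
        # A "function pointer" is just an index into this instance's table.
        self.table.append(fn)
        return len(self.table) - 1

    def call_indirect(self, index: int) -> str:
        return self.table[index]()


shared = bytearray(16)
a = Instance(shared)
b = Instance(shared)

a.globals["counter"] = 5               # only instance a sees this write
struct.pack_into("<I", shared, 0, 42)  # both instances see this write

fp = a.add_function(lambda: "hello from a")
# fp is valid in a's table only; b must register the same function at the
# same index before the index means anything there.
```

In this model a write to a global stays inside one instance, a write to linear memory is visible to both, and a table index handed from `a` to `b` fails until the application synchronizes the tables itself.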
Copilot
AI
Oct 17, 2025
[nitpick] The statement 'All WASM global variables are effectively TLS variables' may mislead; globals are per-module-instance and multiple threads can share an instance (e.g. via structured cloning or instantiation patterns) or intentionally instantiate separately. Clarify that per-instance isolation makes globals behave like thread-local only when each thread creates its own instance; when sharing an instance, globals are shared and need atomic synchronization if concurrently accessed.
It is necessary for an application to ensure that any changes to the function table are synchronized between threads, and any global variable changes need to be manually synchronized between threads (either by storing them in shared linear memory, or via RPC). ⚠️ WASM global variables are per-module-instance. If each thread creates its own instance, globals behave like thread-local storage (TLS). However, if multiple threads share a module instance (e.g., via structured cloning or certain instantiation patterns), globals are shared and concurrent access requires atomic synchronization. As a result of each instance having its own function table, function pointers are effectively thread-local only when instances are not shared! ⚠️
Is this actually true? It seems like it would violate concurrency/safety guarantees in the JS spec if it were true.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
- [Control Flow](#control-flow)
- [Function Pointers](#function-pointers)
- [Traps](#traps)
- [Shadow Stack](#shadow-stack)
Nit: the term "shadow stack" is quite overloaded in WASM, I've seen it used to refer to both the linear memory stack and the WASM-VM-level stack (as used here). It may be best to use something less ambiguous (e.g. just "WASM stack"?).
How do you feel about terms like "linear memory stack" and "protected stack"?
## Introduction
This document attempts to call out key things to know about WebAssembly (aka WASM) and explain the reasons for its existence. For more detail on WebAssembly, please consult the official specification:
This document attempts to call out key things to know about WebAssembly (aka Wasm) and explain the reasons for its existence. For more detail on WebAssembly, please consult the official specification:
Nit: WebAssembly abbreviation casing is Wasm. (https://webassembly.org/)
No description provided.