Ordering of elements on the stack #2059

bobbinth · 2025-08-07T20:52:30Z

bobbinth
Aug 7, 2025
Maintainer

Currently, the ordering of elements in Miden VM works as follows:

Elements in memory are laid out in little endian order. That is, if we have a word with elements [1, 2, 3, 4] in memory at address 0, then mem[0] == 1, mem[1] == 2 etc.
Elements on the stack are laid out in "stack order". That is, if we have a word with elements [1, 2, 3, 4] on the stack, then stack[0] == 4, stack[1] == 3 etc.
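To make the two layouts concrete, here is a minimal Rust sketch. It is a toy model, not VM code: a plain `u64` stands in for a field element, the stack is listed top-first (index 0 is `stack[0]`), and the function name `load_word_stack_order` is made up for illustration.

```rust
/// Toy stand-in for a field element (assumption: a plain u64 is enough here).
type Felt = u64;

/// Models the current word-load behaviour described above: the word stored at
/// `addr..addr + 4` ends up on the stack with its last element on top.
/// The result is listed top-first, so index 0 corresponds to `stack[0]`.
fn load_word_stack_order(mem: &[Felt], addr: usize) -> Vec<Felt> {
    mem[addr..addr + 4].iter().rev().copied().collect()
}
```

With `mem = [1, 2, 3, 4]`, `mem[0]` is 1 while the top of the loaded stack is 4, which is exactly the mismatch this discussion is about.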

Why this approach was chosen

This approach was chosen to make the instruction set as internally consistent as possible (or at least that was the goal). Specifically, we wanted the following to be equivalent:

push.1.2.3.4

# and
push.1 push.2 push.3 push.4

------------------

const.A=[1, 2, 3, 4]
push.A

# and
push.1.2.3.4

------------------

padw mem_loadw.0

# and
mem_load.0 mem_load.1 mem_load.2 mem_load.3

# or the same but in a loop
# => [i = -1]
repeat.4
    add.1 dup
    # => [i+1, i+1]

    mem_load swap
    # => [i+1, mem[i]]
end
drop

------------------

adv_push.1 adv_push.1 adv_push.1 adv_push.1 

# and
adv_push.4

# and
padw adv_loadw

------------------

adv_push.8

# and
padw padw adv_pipe

There are probably some more of these that I'm forgetting, but the core of the issue is that if we put 4 elements onto the stack one by one and then put the same 4 elements onto the stack as a word, the two results are in the same order only if we reverse the elements of the word. The same applies to 8 elements and 2 words, etc.
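The equivalence can be sketched in Rust as follows. This is a toy model under stated assumptions: `u64` stands in for a field element, the front of the `VecDeque` is the top of the stack, and all three function names are hypothetical.

```rust
use std::collections::VecDeque;

/// Toy stand-in for a field element.
type Felt = u64;

/// Pushes elements one at a time; returns the stack listed top-first.
fn push_one_by_one(elems: &[Felt]) -> Vec<Felt> {
    let mut stack: VecDeque<Felt> = VecDeque::new();
    for &e in elems {
        stack.push_front(e); // each push lands on top of the previous one
    }
    stack.into_iter().collect()
}

/// Places a word on the stack as a unit in memory order (word[0] on top).
fn place_word_memory_order(word: [Felt; 4]) -> Vec<Felt> {
    word.to_vec()
}

/// Places a word on the stack as a unit in "stack order" (word[3] on top).
fn place_word_stack_order(word: [Felt; 4]) -> Vec<Felt> {
    word.iter().rev().copied().collect()
}
```

Only `place_word_stack_order` agrees with `push_one_by_one`, which is why the word-level instructions reverse the word today.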

Issues with this approach

While this approach is (mostly) internally consistent, it does have some issues:

People frequently get confused by the notion that word-sized data on the stack is in "stack order". In my experience people get used to it over time, but it does present an initial point of confusion and a steeper learning curve.
We do still have some inconsistencies. For example, mem_storew.0 is not the same as mem_store.0 mem_store.1 mem_store.2 mem_store.3
WebAssembly treats the stack in "normal" order. For example, when loading a u128 value from memory onto the stack, it would place two u64 values in the following order: [val_lo, val_hi] (@bitwalker can correct me if I'm wrong).

The last point is especially annoying because it leaves us with the following options:

1. Don't support u128 values in Rust.
2. Create a u128 module in our stdlib that follows the WASM convention.
   a. This would also imply that we should re-work our u64 module because it would be weird to have u128 module work one way and u64 work in a different way.
3. Re-work the VM so that data on the stack is also in little-endian order.

The first option is probably not an option - we do need to support u128 integers in Rust.

The second option is a bit annoying because how u64 and u128 modules work will be inconsistent with the rest of the VM (though, we already have such inconsistencies - e.g., #571) - but it would be pretty simple to implement.

The 3rd option is the "right way" to do it, but it is by far the hardest. Off the top of my head:

This will require changes to the semantics of a large number of instructions - e.g., mem_loadw, mem_storew, but also mem_stream, hperm, and most u32 instructions.
All procedures in stdlib and transaction kernel that use these instructions would need to be re-written.
This will also require changing the state order of our hash function (maybe we'll need to have RPO2).
It will result in some inconsistencies we tried to avoid initially - i.e., push.1.2.3.4 will be different from push.1 push.2 push.3 push.4.

This is a LOT of work - though, potentially still doable until we go to mainnet (after mainnet, it won't be doable).

Would love to hear others' thoughts on these options or on anything else I missed here.
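The mem_storew inconsistency listed under "Issues with this approach" can be sketched in Rust as well. This is a toy model, not the VM's implementation: it assumes that mem_storew is the inverse of mem_loadw as described at the top of the post (top of stack goes to `addr + 3`) and that each mem_store consumes the current top of the stack. The function names are hypothetical.

```rust
/// Toy stand-in for a field element.
type Felt = u64;

/// Models a word store under the current convention (assumed to mirror the
/// word load): the stack is given top-first and its top goes to `addr + 3`.
fn storew_current(mem: &mut [Felt], addr: usize, stack_top_first: &[Felt]) {
    for i in 0..4 {
        mem[addr + 3 - i] = stack_top_first[i];
    }
}

/// Models four single-element stores to addr, addr + 1, addr + 2, addr + 3,
/// assuming each store consumes the element currently on top of the stack.
fn store_each(mem: &mut [Felt], addr: usize, stack_top_first: &[Felt]) {
    for i in 0..4 {
        mem[addr + i] = stack_top_first[i];
    }
}
```

Starting from a stack with 4 on top (`[4, 3, 2, 1]`), the word store leaves `[1, 2, 3, 4]` in memory while the four single stores leave `[4, 3, 2, 1]`.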

bobbinth · 2025-08-08T01:14:14Z

bobbinth
Aug 8, 2025
Maintainer Author

Also, cc @otrho as I believe you've been dealing with some of this recently :)

1 reply

otrho Aug 8, 2025
Collaborator

Yep, I have a WIP branch porting the existing std::math::u64 module to assume little endian stack ordering, as an intrinsic module for the compiler. I'll then refactor the codegen to use it instead of the std library, and then use it for our i128 support.

So I guess that's option 2 but only in the compiler (for now). It wouldn't be terrible IMO to have a std::math::u128 which followed the big-endian convention and stayed consistent with std::math::u64 and the current VM. It just wouldn't be used by the compiler due to the Wasm issue. Not ideal to have duplicated modules, but not terrible.

greenhat · 2025-08-08T05:43:07Z

greenhat
Aug 8, 2025
Collaborator

If we have resources, the third option sounds like the right thing to do. This will also break a lot of the code of our Pioneers. We need to take this into account before making a decision.

0 replies

greenhat · 2025-08-08T05:57:32Z

greenhat
Aug 8, 2025
Collaborator

Currently, the ordering of elements in Miden VM works as follows:

Elements in memory are laid out in little endian order. That is, if we have a word with elements [1, 2, 3, 4] in memory at address 0, then mem[0] == 1, mem[1] == 2 etc.

I'm a bit confused. Does it mean the first element (with value 1) is the least significant element in the word? Otherwise I'd call it a big-endian order.

https://en.wikipedia.org/wiki/Endianness#/media/File:32bit-Endianess.svg

2 replies

bobbinth Aug 8, 2025
Maintainer Author

Yeah, this can be tricky. The way I think about it is: if more significant bits are at higher memory addresses, that's little-endian (kind of like in the picture). So, if we think of a word as a single 256-bit integer laid out in little-endian order, then word[0] would be the least significant element.

bobbinth Aug 8, 2025
Maintainer Author

I believe that's how WASM treats 128-bit integers too - i.e., memory representation of a 128 bit integer would be [lo_u64, hi_u64]. But @bitwalker and @otrho can correct me if I'm wrong.
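A small Rust sketch of the [lo_u64, hi_u64] layout described here, for readers who want to check the convention on a concrete value. The helper names `split_u128` and `join_u128` are hypothetical; the sketch only shows little-endian limb order and says nothing about how the compiler actually lowers 128-bit integers.

```rust
/// Splits a 128-bit integer into 64-bit limbs in little-endian limb order:
/// index 0 holds the low (least significant) half, index 1 the high half.
fn split_u128(v: u128) -> [u64; 2] {
    [v as u64, (v >> 64) as u64]
}

/// Inverse of `split_u128`: rebuilds the value from [lo, hi].
fn join_u128(limbs: [u64; 2]) -> u128 {
    (limbs[0] as u128) | ((limbs[1] as u128) << 64)
}
```

The first eight bytes of `v.to_le_bytes()` are the bytes of the low limb, so limb 0 sits at the lower address, matching the "more significant bits at higher addresses" reading of little-endian.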

PhilippGackstatter · 2025-08-08T06:49:14Z

PhilippGackstatter
Aug 8, 2025
Collaborator

Overall, I find this hard to think about whenever I work on doing the same thing in Rust and in MASM; everything basically needs to be reversed. E.g. these are the equivalent operations in Rust and MASM:

miden_objects::Hasher::hash_elements(&[0, 1, 2, 3, 4, 5, 6, 7].map(Felt::new))

and

# pad the capacity of the hasher
padw
push.0.1.2.3
push.4.5.6.7
# => [[7, 6, 5, 4], [3, 2, 1, 0], [0, 0, 0, 0]]
hperm
# extract digest

So both 1) the words are reversed and 2) the elements within the words are reversed in the stack comment versus the way we write the slice in Rust. The main problem with this is just the way it is "displayed". What I don't understand is why our stack grows to the left. I think every other visualization of memory or stack I know of grows to the right, e.g. vec![1, 2, 3] places 1 at a lower index than 3, so the visualization matches the way it is accessed, worked with and thought about. That's not the case with the way we display stack state. If instead we were to invert our stack states, then the MASM stack would match the slice passed to hash_elements:

# => [[0, 0, 0, 0], [0, 1, 2, 3], [4, 5, 6, 7]]

Granted, this is not exactly the issue this discussion is about, but it felt related enough to bring it up here as well. Also, I'm sure there was some rationale for making stack comments the way they are now but I don't know that rationale, so if there's a good reason please let me know 🙂.

If we were to change both this stack state ordering and go with the third option, then the word order might still match, but not the felt order within the words, if I understand correctly, but maybe that's still better than having both word and felt order be reversed.
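To illustrate the display mismatch, here is a toy Rust model of the MASM snippet above (padw, then the two pushes). It assumes `u64` stands in for a field element and that the front of the `VecDeque` is the top of the stack; the function names are hypothetical and nothing here calls the real hasher.

```rust
use std::collections::VecDeque;

/// Toy stand-in for a field element.
type Felt = u64;

/// Models `padw` followed by pushing the 8 input elements in slice order.
/// Returns the stack as the comments currently display it: top-first.
fn stack_after_pushes(input: &[Felt; 8]) -> Vec<Felt> {
    let mut stack: VecDeque<Felt> = VecDeque::new();
    for _ in 0..4 {
        stack.push_front(0); // padw: capacity word of zeros
    }
    for &e in input {
        stack.push_front(e); // push.0.1.2.3 push.4.5.6.7, element by element
    }
    stack.into_iter().collect()
}

/// The inverted display suggested above: bottom of the stack listed first.
fn bottom_first(stack_top_first: &[Felt]) -> Vec<Felt> {
    stack_top_first.iter().rev().copied().collect()
}
```

The top-first listing shows the Rust slice reversed, while the bottom-first listing shows the capacity word followed by the slice in its original order.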

1 reply

PhilippGackstatter Sep 4, 2025
Collaborator

One of the pain points of the current approach is the interaction between Rust and MASM as described above. Another example of this is how we insert advice provider data in Rust and then interact with it in MASM.

We push it to the advice stack like this:

self.extend_stack([
    account.id().suffix(),
    account.id().prefix().as_felt(),
    ZERO,
    account.nonce(),
]);

and use adv_pipe in MASM to write it to memory. Then, when we want to implement get_account_id, we have this:

export.get_account_id
    padw exec.get_current_account_data_ptr add.ACCT_ID_AND_NONCE_OFFSET mem_loadw
    # => [nonce, 0, curr_acct_id_prefix, curr_acct_id_suffix]
    drop drop
    # => [curr_acct_id_prefix, curr_acct_id_suffix]
end

What we load via mem_loadw is reversed to how we put it on the stack in the advice provider, and that's the main pain point.

plafer · 2025-08-08T15:59:22Z

plafer
Aug 8, 2025
Collaborator

If I ignore the reality of the pain that such a change would incur, IMO the best ordering would be the opposite of what we have, mainly for the reason @PhilippGackstatter touched on.

Say I have the data [1, 2, 3, 4, 5, 6, 7, 8] stored at locations 100..108 (where 1 is stored at address 100). I want to load that data on the stack. What I would want things to look like is


padw mem_loadw.104
# (top) [5, 6, 7, 8] (bottom)

padw mem_loadw.100
# (top) [1, 2, 3, 4, 5, 6, 7, 8] (bottom)

# Then, process the data in order

So basically I load the data from end to beginning (which is intuitive because of my mental model that I'm working with a stack), after which the data looks nice and "in proper order" on the stack (with the first item sitting at the top). If I run that code today, I get [4, 3, 2, 1, 8, 7, 6, 5], which doesn't let me access the first element of my data directly, and looks incorrect.
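The two outcomes can be reproduced with a toy Rust model. Assumptions: `u64` stands in for a field element, the stack is a slice listed top-first, `padw mem_loadw.X` is treated as "put the word at X on top of the stack", and both function names are made up for this sketch.

```rust
/// Toy stand-in for a field element.
type Felt = u64;

/// Current behaviour: the loaded word lands on the stack reversed.
fn loadw_current(stack: &[Felt], mem: &[Felt], addr: usize) -> Vec<Felt> {
    let mut out: Vec<Felt> = mem[addr..addr + 4].iter().rev().copied().collect();
    out.extend_from_slice(stack); // previous contents stay below the new word
    out
}

/// Desired behaviour: the loaded word keeps memory order, first element on top.
fn loadw_preserving(stack: &[Felt], mem: &[Felt], addr: usize) -> Vec<Felt> {
    let mut out: Vec<Felt> = mem[addr..addr + 4].to_vec();
    out.extend_from_slice(stack);
    out
}
```

Loading address 104 and then address 100 from memory holding 1..=8 at 100..108 gives `[4, 3, 2, 1, 8, 7, 6, 5]` with the current model and `[1, 2, 3, 4, 5, 6, 7, 8]` with the order-preserving one.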

Specifically, we wanted the following to be equivalent:

Aren't a bunch of these assembler-specific, and could be true irrespective of our choice of stack ordering? e.g. push.1.2.3.4 <-> push.1 push.2 push.3 push.4 is just an assembler decision of how to reduce push.1.2.3.4 to a set of instructions, could be made to mean whatever we document it to mean, and thus is orthogonal? Similarly for const.A=[1, 2, 3, 4]; push.A <-> push.1.2.3.4.

As for padw mem_loadw.0 vs mem_load.0 mem_load.1 mem_load.2 mem_load.3, I would expect those not to be equivalent, because my mental model is already that I'm working with a stack, and so mem_load.0 will find itself at the bottom of the stack (whereas as discussed above I'd expect mem_loadw.0 to place the first element of the word on top of the stack because that's typically what you want to access first).

0 replies

bitwalker · 2025-09-03T18:26:46Z

bitwalker
Sep 3, 2025
Collaborator

I've opened an issue to track possibly introducing a new reversew instruction here.

0 replies

Al-Kindi-0 · 2025-09-09T11:48:48Z

Al-Kindi-0
Sep 9, 2025
Collaborator

I will try to describe the problem we are considering through the formulation of a principle, call it the principle of Structural Preservation, which can be defined as the maintenance of data organization and semantic relationships when values move between different VM components (operand stack, advice stack, and linear memory). This principle ensures that compound data structures retain their internal organization and semantic meaning across all VM operations that transfer them across components.
Most of what follows has already been discussed and touched upon by everyone involved in the discussion, though I wanted to systematize things, first for my own understanding and second to make sure we are all on the same page.

1. Introduction and Motivation

1.1 The Problem

Miden VM operates with field elements as its base data type. However, many computational patterns require working with compound structures:

Words: 4-element groupings
Double-words: 8-element groupings
Multi-precision integers: Values like u64, u128, u256 that span multiple field elements

Currently, Miden VM exhibits structural inconsistencies when these compound values move between memory and stack. For example:

Memory layout: [a@0, b@1, c@2, d@3]
Stack layout after mem_loadw: [d, c, b, a] (d on top, elements reversed)

This inconsistency breaks what programmers would expect—the same logical data structure should maintain its internal organization regardless of where it resides in the VM.

1.2 Why Structural Preservation Matters

Conceptual Clarity: Developers should think about data structures, not memory layouts. When a word represents a 256-bit hash, its internal structure should remain consistent whether it's in memory, on the operand stack, or in the advice provider.

Cognitive Load: Related to the previous point, developers shouldn't need to maintain mental maps of different ordering conventions across VM components.

Correctness: Many cryptographic and arithmetic operations depend on specific conventions for element ordering. Structural inconsistencies can lead to subtle bugs, especially in multi-precision arithmetic where limb ordering is critical and especially when crossing boundaries demanding a change in ordering conventions.

2. Structural Preservation Principle

2.1 Core Definition

Structural Preservation: When a compound data structure moves between any two VM components (operand stack, advice stack, linear memory), its internal organization must remain invariant under a well-defined, consistent mapping.

2.2 Intuition and Implications

Intuitively, we can say that, similar to how we wouldn't expect a mem_load.0 to flip the order of the internal u32 limbs (where we think here of a field element as a compound data structure), we should also not expect that a mem_loadw.0 would flip the order of the field elements making up the compound data structure Word.

This has several implications:

2.2.1 Individual vs Structured Push Operations

We accept, positively, the mismatch between the behavior of:

push.1.2.3.4 (expanded to push.1 push.2 push.3 push.4) → leaves 4 on top of the stack
push_w.1.2.3.4 (structure-preserving word push) → leaves 1 on top of the stack

Similarly for double-words:

push.1.2.3.4.5.6.7.8 → leaves 8 on top of the stack
push_dw.1.2.3.4.5.6.7.8 (structure-preserving double-word push) → leaves 1 on top

Note: push_w and push_dw are structure-preserving operations that we can introduce (though similar functionality may already exist).

2.2.2 Advice Stack Operations

When moving between the advice stack and operand stack, we accept, positively, that:

adv_push.4 behaves differently than adv_loadw
adv_loadw is structure-preserving and composable with mem_storew.x which usually follows it

For double-word operations: adv_pipe should take the top 8 elements (i.e., double-word) of the advice stack [a1, ..., a8] (with a1 on top) and should push the double-word onto the operand stack so that the internal structure of the double-word is maintained and a1 remains on top.

2.2.3 Multi-Precision Arithmetic

Packing and unpacking of u32 operations (i.e., u32split and u32combine) should follow the same principles, namely:

The lower limb (LSB) should be on top
The higher limb (MSB) should be second from top

3. Structural Preservation in Practice

3.1 Operand Stack

Current Behavior:

push.1.2.3.4  → Stack: [4, 3, 2, 1] (4 on top)

Structure-Preserving Behavior:

push_w.1.2.3.4  → Stack: [1, 2, 3, 4] (1 on top, preserving word structure)

Rationale: When we push a word as a structured unit, the first element should be most accessible (on top), maintaining the logical ordering word[0], word[1], word[2], word[3].

3.2 Linear Memory

Layout:

Word at address 100: mem[100]=s₀, mem[101]=s₁, mem[102]=s₂, mem[103]=s₃

Structure-Preserving Operations:

mem_loadw.100  → Stack: [s₀, s₁, s₂, s₃] (s₀ on top, preserving order)
mem_storew.100 → Memory: [s₀, s₁, s₂, s₃] at [s₀@100, s₁@101, s₂@102, s₃@103]

3.3 Advice Stack

Structure-Preserving Operations:

Starting with advice stack layout:

Advice stack: [1, 2, 3, 4] (1 on top)

Operations:

adv_push.4     → Operand Stack: [4, 3, 2, 1] (4 on top, individual elements)
adv_loadw      → Operand Stack: [1, 2, 3, 4] (1 on top, preserving structure)
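A toy Rust model of the two operations in 3.3, under the assumptions of this proposal rather than the VM's current semantics: `u64` stands in for a field element, the front of the `VecDeque` is the top of the advice stack, results are listed top-first, and both function names are hypothetical.

```rust
use std::collections::VecDeque;

/// Toy stand-in for a field element.
type Felt = u64;

/// Element-wise transfer: pops `n` elements one at a time from the advice
/// stack and pushes each onto a fresh operand stack (returned top-first).
fn adv_push(advice: &mut VecDeque<Felt>, n: usize) -> Vec<Felt> {
    let mut operand: VecDeque<Felt> = VecDeque::new();
    for _ in 0..n {
        let e = advice.pop_front().expect("advice stack underflow");
        operand.push_front(e);
    }
    operand.into_iter().collect()
}

/// Proposed structure-preserving word load: moves the top word as a unit,
/// so the element on top of the advice stack stays on top.
fn adv_loadw_preserving(advice: &mut VecDeque<Felt>) -> Vec<Felt> {
    advice.drain(..4).collect()
}
```

Starting from an advice stack `[1, 2, 3, 4]` with 1 on top, the element-wise version yields `[4, 3, 2, 1]` and the word version yields `[1, 2, 3, 4]`.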

3.4 Cross-Component Consistency

Maintaining the above principle implies that logical positions should correspond to physical positions across all components:

Logical Word: [elem₀, elem₁, elem₂, elem₃]

Memory:       addr+0: elem₀
              addr+1: elem₁  
              addr+2: elem₂
              addr+3: elem₃

Stack:        top:    elem₀
              top-1:  elem₁
              top-2:  elem₂  
              top-3:  elem₃

Advice Stack: top:    elem₀
              top-1:  elem₁
              top-2:  elem₂
              top-3:  elem₃

4. Multi-Precision Arithmetic Considerations

4.1 LSB-Accessible Principle

For multi-precision integers, we adopt the LSB-Accessible Principle: the least significant limb should be most accessible (on top of stack).

Rationale:

Most arithmetic operations process from LSB to MSB
Carry propagation flows from lower to higher limbs
This matches conventional arithmetic library designs

4.2 Examples

u32split Operation:

Input: felt (representing 64-bit value 0x02345678_9ABCDEF0)
u32split → Stack: [0x9ABCDEF0, 0x02345678]
                  [   lo_u32,    hi_u32   ] (lo on top)
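The same split can be checked in Rust. This is only a sketch of the limb arithmetic: the helper names `split_u32_limbs` and `join_u32_limbs` are hypothetical, and index 0 of the returned array plays the role of the top of the stack.

```rust
/// Splits a 64-bit value into 32-bit limbs, low limb first
/// (index 0 plays the role of the top of the stack).
fn split_u32_limbs(value: u64) -> [u32; 2] {
    [value as u32, (value >> 32) as u32]
}

/// Inverse of `split_u32_limbs`: rebuilds the value from [lo, hi].
fn join_u32_limbs(limbs: [u32; 2]) -> u64 {
    (limbs[0] as u64) | ((limbs[1] as u64) << 32)
}
```

For the value in the example, the low limb is `0x9ABCDEF0` and the high limb is `0x02345678`.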

u64 Operations:

Input: [a_lo_u32, a_hi_u32, b_lo_u32, b_hi_u32] (a_lo_u32 on top)
u64.add → Stack: [res_lo_u32, res_hi_u32] (res_lo_u32 on top)
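A minimal Rust sketch of why low-limb-first is convenient for addition: the carry is produced by the limb that is handled first. It assumes wrapping (mod 2^64) semantics, which the example above does not specify, and the function name `u64_add_limbs` is made up for illustration.

```rust
/// Adds two 64-bit values given as [lo, hi] u32 limbs, wrapping at 2^64.
/// The low limbs are processed first and their carry feeds the high limbs.
fn u64_add_limbs(a: [u32; 2], b: [u32; 2]) -> [u32; 2] {
    let (lo, carry) = a[0].overflowing_add(b[0]);
    let hi = a[1].wrapping_add(b[1]).wrapping_add(carry as u32);
    [lo, hi]
}
```

For example, adding `[0xFFFF_FFFF, 0]` and `[1, 0]` overflows the low limb and yields `[0, 1]`.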

Ext2Felt Operations (quadratic extension field elements):

Input: [a_0, a_1, b_0, b_1] (a_0 on top)
ext2add → Stack: [c_0, c_1] where c = a + b (c_0 on top)

5. Cryptographic Hash State Management

5.1 Hasher State Organization

Our cryptographic operations based on hperm work with different structures:

2 words (e.g., hmerge) - rate portion of the sponge state
1 double-word (e.g., hperm itself) - complete rate portion
1 additional word - capacity portion (should not interact with absorbed data except within permutation)

5.2 Stack Layout for Structure Preserving Hash Operations

Before hmerge call:

Stack layout:
[word_0[0], word_0[1], word_0[2], word_0[3], word_1[0], word_1[1], word_1[2], word_1[3], cap[0], cap[1], cap[2], cap[3]]
[----------- First half rate --------------, ---------- Second half rate ----------------, --------- Capacity ---------]
↑ (top)

Before hperm call:

Stack layout:
[dword[0], dword[1], dword[2], dword[3], dword[4], dword[5], dword[6], dword[7], cap[0], cap[1], cap[2], cap[3]]
[------------------------- Rate -----------------------------------------------, ----------- Capacity ---------]  
↑ (top)

Note: The LSB-Accessible Principle ensures smooth transitions between these layouts.

5.3 Memory Streaming Operations

Both adv_pipe and mem_stream operations should be modified to respect double-word structure:

Example with adv_pipe:

Initial State:

Memory (double-word at address 100): 
  mem[100]=s₀, mem[101]=s₁, mem[102]=s₂, mem[103]=s₃, 
  mem[104]=s₄, mem[105]=s₅, mem[106]=s₆, mem[107]=s₇

Operand stack: 
  [0, 0, 0, 0, 0, 0, 0, 0, cap[0], cap[1], cap[2], cap[3], ptr] (ptr == 100)
  ↑ (top)

Structure-Preserving adv_pipe Operation:

Result stack:
  [s₀, s₁, s₂, s₃, s₄, s₅, s₆, s₇, cap[0], cap[1], cap[2], cap[3], ptr] (ptr == 100)
  ↑ (top)

This ensures that the double-word maintains its internal structure when transferred from advice/memory to the operand stack, making it compatible with subsequent hperm operations.

6. Outlook

If the above makes sense, and provided we haven't missed something fundamental, then I think there is a strong case for accepting the (significant) cost associated with the migration effort. The proposal of @bitwalker will help make the transition gradual and less painful, though my preference would be to get it out of the way as soon as possible.

0 replies
