State machines to stackalloc on the caller's stack #9512

TheLeftExit · 2025-07-04T09:00:12Z

Consider APIs like GetWindowTextW and GetWindowTextLengthW. To use them without allocations, you'd have to call both of them like so:

using System;
using System.Runtime.InteropServices;

public static unsafe partial class Win32 {
    [LibraryImport("user32.dll")]
    public static partial int GetWindowTextW(nint hWnd, char* lpString, int nMaxCount);

    [LibraryImport("user32.dll")]
    public static partial int GetWindowTextLengthW(nint hWnd);

    public static void Main() {
        nint hWnd = 0x1000; // placeholder window handle
        var length = GetWindowTextLengthW(hWnd) + 1; // +1 for the null terminator
        var buffer = stackalloc char[length];
        var written = GetWindowTextW(hWnd, buffer, length); // chars copied, excluding the terminator
        var span = new ReadOnlySpan<char>(buffer, written);
        // and if we were to return the span - nope
    }
}

This is verbose, and the resulting span isn't easily usable if we were to implement a managed GetWindowText method: we would either have to drag the two-call pattern into every caller, recursively, or give up and construct a heap-allocated string. You'd want a single method that both allocates the buffer and fills it, but this isn't possible because a callee cannot allocate memory on the caller's stack, at least not in plain C/C++/C#.

However, in C#, we already use state machines to break a single method body into multiple compiled methods that get called in a specific order from the caller's context, while the developer only sees one method and a few extra keywords. Consider yield-based IEnumerable methods, what they translate to, and how foreach drives the methods of the generated state-machine type (see .NET lab).
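To make that concrete, here is a minimal sketch of a yield-based method and roughly what foreach over it expands to. The method name Squares and the manual loop are illustrative only; the code the compiler actually generates differs in detail.

```csharp
using System;
using System.Collections.Generic;

// What the developer writes: a single method containing `yield return`.
static IEnumerable<int> Squares(int count)
{
    for (var i = 1; i <= count; i++)
        yield return i * i;
}

// What the developer writes at the call site.
var viaForeach = new List<int>();
foreach (var value in Squares(3))
    viaForeach.Add(value);

// Roughly what `foreach` expands to: the caller drives the
// compiler-generated state machine through MoveNext/Current.
var viaManualLoop = new List<int>();
using (IEnumerator<int> e = Squares(3).GetEnumerator())
{
    while (e.MoveNext())
        viaManualLoop.Add(e.Current);
}

Console.WriteLine(string.Join(", ", viaForeach));    // 1, 4, 9
Console.WriteLine(string.Join(", ", viaManualLoop)); // 1, 4, 9
```

The point for this proposal is that the method body runs in pieces, with control returning to the caller between pieces.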

What if we were to use a state machine to:

Allow the method to retrieve and yield the buffer length,
Allow the caller to allocate a buffer of the specified length,
Allow the method to populate the buffer.

Like so:

// Addition one: the `caller_stackalloc` modifier declares a method as a caller-stack-allocating state machine
public static caller_stackalloc ReadOnlySpan<char> GetLength() {
    nint hWnd = 0x1000;

    var length = GetWindowTextLengthW(hWnd) + 1;
    // Addition two: 'caller_stackalloc' keyword
    // We yield the `length` argument, receive the pointer from the caller, then proceed as if we use regular `stackalloc`
    var buffer = caller_stackalloc char[length];
    GetWindowTextW(hWnd, buffer, length);
}

public static void Main() {
    ReadOnlySpan<char> text = caller_stackalloc GetLength();
}

Which would be lowered to something like:

public unsafe struct Allocator {
    public int GetLength() {
        nint hWnd = 0x1000;
        var length = Win32.GetWindowTextLengthW(hWnd) + 1;
        return length; // caller_stackalloc: yield
    }

    public void FillBuffer(char* buffer) { // caller_stackalloc: continue
        // Recomputes the length because this sketch keeps no state between the two steps
        Win32.GetWindowTextW(0x1000, buffer, GetLength());
    }
}

public static unsafe void MainLowered() {
    var allocator = new Allocator();
    var length = allocator.GetLength();
    var buffer = stackalloc char[length];
    allocator.FillBuffer(buffer);
}

// The keywords/syntax/lowering are only for demonstration purposes, and are not a part of the proposal - they only demonstrate the idea

This code sample omits state persistence in the allocator struct, and the fact that we will most likely have to return a dedicated "awaiter"-like struct rather than a ReadOnlySpan directly. However, it still demonstrates that easy-to-use, allocation-free caller-stack-allocation and transition of spans is possible as a language feature, with little impact on the runtime.

I think this would be an amazing tool for native interop and performance-critical scenarios, and it would help with the adoption of spans in scenarios where they're relevant but cumbersome to use, or even considered overengineering.

Answered by tannergooding

Flutterish · 2025-07-04T10:03:20Z

I've seen methods like this implemented as int fillBuffer(void* data): when data is null, it returns the required length, and when it isn't, it fills the buffer. You use it like so:

var buffer = stackalloc char[fillBuffer(null)];
fillBuffer(buffer);

Another way, which I personally think is better than messing with a stack that can potentially overflow, is to use a "scratch space" arena allocator (bump a pointer to allocate, reset it to deallocate) that you pass as an argument. In its simplest form it looks like this:

void* scratch = ... // allocate some memory on program/thread init. maybe wrap it in some struct if you like

void doStuff(void* scratch) {
    var myText = getText(ref scratch); // ref so you observe the pointer bump
    ...
}

doStuff(scratch); // no ref, so it "auto-deallocates" after the call because you don't observe the pointer bumps
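For comparison, the same idea can be sketched in safe C# with spans instead of raw pointers. Everything here is hypothetical and for illustration only: the ScratchArena type, its members, the 256-char backing array, and the "hello" stand-in for a native call are all assumptions, not an existing API.

```csharp
using System;

// Scratch memory allocated once, e.g. at program or thread start.
var backing = new char[256];
var scratch = new ScratchArena(backing);

DoStuff(scratch); // passed by value: bumps made inside DoStuff are not observed here
Console.WriteLine(scratch.Remaining); // 256

static void DoStuff(ScratchArena scratch)
{
    // Passed by ref so this method observes the bump made by GetText.
    ReadOnlySpan<char> text = GetText(ref scratch);
    Console.WriteLine(text.ToString());   // hello
    Console.WriteLine(scratch.Remaining); // 251
}

static ReadOnlySpan<char> GetText(ref ScratchArena scratch)
{
    const string source = "hello"; // stand-in for text produced by a native call
    Span<char> buffer = scratch.Allocate(source.Length);
    source.AsSpan().CopyTo(buffer);
    return buffer;
}

// Hypothetical bump allocator over caller-provided memory.
ref struct ScratchArena
{
    private Span<char> _free;

    public ScratchArena(Span<char> memory) => _free = memory;

    public int Remaining => _free.Length;

    public Span<char> Allocate(int length)
    {
        if (length > _free.Length)
            throw new InvalidOperationException("Scratch space exhausted.");

        Span<char> block = _free[..length];
        _free = _free[length..]; // bump: the next allocation starts after this block
        return block;
    }
}
```

Because ScratchArena is a struct, passing it by value copies the bump position, which is what produces the "auto-deallocate on return" behavior described above.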

0 replies

tannergooding · 2025-07-11T21:25:26Z

Collaborator

Such code isn't a "correct" implementation and is prone to stack overflow for things like edit controls which can have text longer than what is "safe" to stack allocate.

Beyond that, there isn't really a way to define the state machines (even with ref structs) in a way that allows the scoping and other lifetime requirements of the stack allocation to be properly tracked. It might be possible for the language to add such support, but it would be very complex for a very niche scenario where the alternative results in code that is overall safer, more maintainable, more robust, and more performant.

Such an API should likely be exposed to the consumer as is, with a GetTextLength and a TryGetText(Span<char> destination, out int charsWritten) shape, which is similar to how many APIs in the core libraries deal with filling spans. You can optionally provide a convenience allocating method that returns a string if desired, but it will always allocate the length given by the API.
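A minimal sketch of that API shape follows. FakeWindow is a hypothetical stand-in that holds its text in a string so the example runs without P/Invoke; a real wrapper would call GetWindowTextLengthW and GetWindowTextW instead.

```csharp
using System;

var window = new FakeWindow("Untitled - Notepad");

// The caller owns the buffer and picks a small, fixed stack size.
Span<char> destination = stackalloc char[64];
if (window.TryGetText(destination, out int charsWritten))
    Console.WriteLine(destination[..charsWritten].ToString()); // Untitled - Notepad

// Optional convenience method that allocates a string.
Console.WriteLine(window.GetText()); // Untitled - Notepad

// Hypothetical wrapper type, for illustration only.
sealed class FakeWindow
{
    private readonly string _text;

    public FakeWindow(string text) => _text = text;

    public int GetTextLength() => _text.Length;

    // Fills the destination if it is large enough; never allocates.
    public bool TryGetText(Span<char> destination, out int charsWritten)
    {
        if (destination.Length < _text.Length)
        {
            charsWritten = 0;
            return false;
        }

        _text.AsSpan().CopyTo(destination);
        charsWritten = _text.Length;
        return true;
    }

    // Always heap-allocates a string of the reported length.
    public string GetText()
    {
        return string.Create(GetTextLength(), this, static (span, self) =>
        {
            self.TryGetText(span, out _);
        });
    }
}
```

With this shape the decision about where the buffer lives (stack, pool, or heap) stays with the caller.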

2 replies

TheLeftExit Jul 13, 2025
Author

Fair enough. From what I understand, there are two concerns: the burden of implementing a robust state machine for this, and the potential for stack-overflow footguns.

Do you think that if the proposal dropped the "state machine" aspect and moved toward an "interpolated string handler"-like concept, it would have a shot? Like, for instance, if a struct implements int GetLength() and void FillSpan(Span<T>), the developer would be able to call stackalloc MyClass.SomeAPIThatReturnsAboveStruct() and have it compiled to span = stackalloc T[GetLength()]; FillSpan(span). Or are all ideas that involve stack allocation without explicitly checking the length considered too dangerous?
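To illustrate the shape being asked about, here is a sketch of such a struct together with the expansion written out by hand. GreetingHandler and its contents are hypothetical; the `stackalloc SomeApi()` call syntax does not exist, so only the manual expansion is shown. Note that the length is not checked before the stackalloc.

```csharp
using System;

// Hypothetical handler returned by some API.
var handler = new GreetingHandler("world");

// Hand-written form of what the proposed syntax would expand to.
Span<char> span = stackalloc char[handler.GetLength()]; // unchecked length
handler.FillSpan(span);
Console.WriteLine(span.ToString()); // Hello, world

// Hypothetical struct following the GetLength/FillSpan pattern.
readonly struct GreetingHandler
{
    private const string Prefix = "Hello, ";
    private readonly string _name;

    public GreetingHandler(string name) => _name = name;

    public int GetLength() => Prefix.Length + _name.Length;

    public void FillSpan(Span<char> destination)
    {
        Prefix.AsSpan().CopyTo(destination);
        _name.AsSpan().CopyTo(destination[Prefix.Length..]);
    }
}
```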

tannergooding Jul 13, 2025
Collaborator

Considered too dangerous.

It is also generally not recommended to do dynamic stack allocation where the length is unknown, even if it is always going to be in a safe range (approx 1KB of space used is considered safe; so 1024 bytes or 256 ints, etc).

This is due to the pessimizations that have to be done to reserve that stack space due to how the stack works.

The stack is intentionally designed (in hardware, this is not .NET specific) to be small. There are hardware-level optimizations that assume it will be small and that the data on the stack for a method will be relatively “local”. It is also assumed not to grow rapidly, and for security reasons it doesn’t actually exist all at once: there are guard pages for the part of the stack that hasn’t been touched yet, and they fault if accessed out of order.

Using the stack arbitrarily can and will hurt perf in real-world scenarios, even in cases where a microbenchmark may show improved performance. It instead needs to be used intentionally, in scoped scenarios where explicit and considered usage can help avoid unnecessary extra work with small data.

Code that deals with arbitrary lengths should instead check whether the length is over some threshold and allocate an array (or rent one from the array pool) if it is. It should use a span so that it can work with either the array or the stackalloc without needing two code paths.

It is more code, but it is also the right thing to do for perf and for ensuring your app is robust, won’t crash, etc.
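A minimal sketch of that pattern, assuming a threshold of 256 chars (512 bytes, under the roughly 1 KB guideline mentioned above). The Describe method and the string standing in for a native text source are illustrative only.

```csharp
#nullable enable
using System;
using System.Buffers;

Console.WriteLine(Describe("short text"));         // 10 chars via stack
Console.WriteLine(Describe(new string('x', 1000))); // 1000 chars via pool

static string Describe(string source)
{
    const int StackThreshold = 256; // chars; an assumed limit, tune as needed

    int length = source.Length; // stand-in for a length reported by a native API

    // Fixed-size stackalloc for small inputs, pooled array otherwise.
    char[]? rented = null;
    Span<char> buffer = length <= StackThreshold
        ? stackalloc char[StackThreshold]
        : (rented = ArrayPool<char>.Shared.Rent(length));
    try
    {
        // A single code path from here on, regardless of where the memory lives.
        buffer = buffer[..length];
        source.AsSpan().CopyTo(buffer);

        string where = rented is null ? "stack" : "pool";
        return $"{buffer.Length} chars via {where}";
    }
    finally
    {
        if (rented is not null)
            ArrayPool<char>.Shared.Return(rented);
    }
}
```

The stack allocation uses a constant size rather than the dynamic length, which keeps the amount of stack reserved by the method fixed.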
