Projection ABI #1718

dabrahams · 2025-08-06T21:52:30Z

dabrahams
Aug 6, 2025
Maintainer

I wanted to write this all down in advance of our discussion Thursday so I wouldn't forget any of it, and so, hopefully, my concerns and idea will be clear before we start.

Here's what I'm after: I want the generalized projection model that supports ephemeral values to be a first-class citizen. For example, I want to be able to satisfy standard library protocols like Collection with types that project ephemeral values. I note that slices are ephemeral. I want to be able to do generalized projection with a guarantee of no dynamic allocation, so that it is usable without a dynamically allocating runtime. I want Hylo to retain an efficient before-optimization compilation model.

I'm happy to bake "addressor-ness" into the ABI/API of specific types like Array and I think we should. I have no objection to any plan that gets the language functioning in the short term—this post is about long-term direction.

The current plan, which depends for basic efficiency on being able to know how much stack memory will be needed during any given projection, does not achieve what I'm after. Resilience boundaries, recursion, and existentials all create situations where the current plan cannot avoid dynamic allocation.

I am also concerned about the cases where we can potentially use addressors because it is my impression that:

- Anything that depends on the compiler being able to reason about all the code down to the leaves, inevitably:
  - fails when the source code reaches some level of complexity. If the alternative is orders of magnitude less efficient, then you have a performance cliff.
  - slows down the compiler as the source code gets complex. It may not be acceptable to do enough reasoning to achieve basic efficiency in debug builds.
- Once you encode a complex solution to a problem (e.g. one involving memory allocation) into the program it is harder to remove it in later optimization passes.

Admittedly these bullets are all impressions, and certainly Swift does the latter thing all the time. But I do have the impression that it's hard, partly from my experience working with the people who had to implement those things for Swift.

I think I see how to avoid these problems using a variation of the higher-order function technique that we've often said in talks, not-quite-accurately, is equivalent to our subscripts. It goes like this:

For the purposes of this discussion, I'm going to define the concept of a "local lambda": it's just a non-escaping function pointer bundled with a reference that allows access to the locals of some surrounding function.

A projection accessor is then a function that accepts a local lambda from its client. This lambda represents the client code until the end of the scope where the projection is initiated (its extent can be narrowed in many cases, but it never extends beyond the end of the scope). This lambda itself accepts another lambda representing the slide (the code in the projection accessor after the yield), which the first lambda invokes when the lifetime of the projection ends. Later projections in the client scope cause a similar breaking up of the first local lambda into other local lambdas.
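To make the shape of this concrete, here is a minimal sketch in Python rather than Hylo. It is not the actual lowering: `first_of`, `rest_of_scope`, `end_projection`, and the `log` list are hypothetical names introduced only for illustration, and Python closures stand in for non-escaping local lambdas.

```python
from typing import Callable, List, Tuple

Slide = Callable[[], None]             # accessor code that follows the yield
Client = Callable[[int, Slide], None]  # client code up to the end of its scope


def first_of(pair: Tuple[int, int], client: Client, log: List[str]) -> None:
    """Hypothetical lowering of an accessor that projects pair[0]."""
    log.append("ramp")        # accessor code before the yield
    stuff = "resource"        # accessor local that must survive the yield

    def slide() -> None:
        # Accessor code after the yield; it can still see `stuff`.
        log.append(f"slide releases {stuff}")

    client(pair[0], slide)    # the "yield": hand control to the client


def run() -> List[str]:
    log: List[str] = []

    def rest_of_scope(x: int, end_projection: Slide) -> None:
        log.append(f"use {x}")
        end_projection()      # the projection's lifetime ends here
        log.append("after projection")

    first_of((1, 2), rest_of_scope, log)
    return log
```

In a native lowering, the accessor's frame would presumably still be live on the stack while the client lambda runs, so the accessor's locals would need no separately sized or heap-allocated storage. That is my reading of the intent, not something this sketch demonstrates.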

That's it. I hope this ends up being helpful. Thanks for reading.

-Dave

kyouko-taiga · 2025-08-07T03:56:52Z

kyouko-taiga
Aug 7, 2025
Maintainer

I'll also add a few notes here to keep as a memo. These are neither direct answers to nor a rebuttal of Dave's points.

Inversion of control using higher-order functions is not as expressive as subscripts

Consider this example:

subscript first(of x: {Int, Int}) -> Int {
  let stuff = something_huge()
  yield x.0
  print(stuff.part)
}

public fun main() {
  var p = (1, 2)
  let x = first[of: p]
  let y = first[of: p]
  if x > 0 { print(y) }
}

Here the last use of x comes before the last use of y, and yet x's lifetime started before that of y. That means x and y do not have strictly nested timelines. A naive translation using higher-order functions, in contrast, requires strictly nested timelines, which hinders potential optimizations that release resources acquired in a ramp as early as possible.
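The forced nesting can be seen in a small Python sketch of the naive translation of the example above. The names `first_of_naive`, `with_x`, `with_y`, and the `log` list are hypothetical and exist only for illustration; each "subscript" simply runs its body between its ramp and its slide.

```python
from typing import Callable, List, Tuple


def first_of_naive(pair: Tuple[int, int], body: Callable[[int], None],
                   log: List[str], name: str) -> None:
    """Naive higher-order translation: ramp, then body, then slide."""
    log.append(f"ramp {name}")
    body(pair[0])
    log.append(f"slide {name}")   # can only run after body returns


def main_naive() -> List[str]:
    log: List[str] = []
    p = (1, 2)

    def with_x(x: int) -> None:
        def with_y(y: int) -> None:
            if x > 0:                      # last use of x
                log.append(f"print {y}")   # last use of y
        first_of_naive(p, with_y, log, "y")

    first_of_naive(p, with_x, log, "x")
    return log
```

Calling `main_naive()` records the slide of y before the slide of x, even though x is dead first: whatever x's ramp acquired stays alive until y's slide has finished.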

It is probably not possible to predict the size of a stack frame in general

Consider this example:

trait P {
  subscript g() -> Int
}

subscript a<T is P>(t: T) -> Int { t.g[] }

There have been discussions of lowering subscripts so that there's a symbol next to the corresponding function that tells about the size of its frame. But it is not possible to compute such a size ahead of time here because it is a function of t's conformance to P (which is another argument of the function in its lowered form).

Also, perhaps even more simply, I think we can't compute frame sizes for recursive subscripts.

It is only the memory acquired in the ramp and released in a slide that is problematic

If subscripts are lowered as a pair of functions (one ramp and one slide), then it is only the memory that must persist until the second function is called that is problematic (insofar as it may require dynamic allocation).
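As a rough illustration, here is a Python sketch of such a pair for the `first` subscript shown earlier. `FirstFrame`, `first_ramp`, `first_slide`, and `client` are hypothetical names, not part of any actual lowering. Only the contents of `FirstFrame` need storage that outlives the call to the ramp; the ramp's other locals die when it returns.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FirstFrame:
    """State that must persist from the ramp until the slide."""
    stuff: str


def first_ramp(pair: Tuple[int, int]) -> Tuple[int, FirstFrame]:
    stuff = "something huge"    # needed again by the slide
    index = 0                   # ramp-only temporary: dead once we return
    return pair[index], FirstFrame(stuff)


def first_slide(frame: FirstFrame, log: List[str]) -> None:
    log.append(f"print {frame.stuff}")


def client() -> List[str]:
    log: List[str] = []
    x, frame = first_ramp((1, 2))   # projection starts
    log.append(f"use {x}")
    first_slide(frame, log)         # projection ends
    return log
```

Where the storage for `FirstFrame` lives is the open question: the caller's stack if its size is known ahead of time, and presumably the heap otherwise.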

It is probably impossible to eliminate dynamic allocation completely

If we're gonna support existential containers, or any similar form of data abstraction, then I think there is no way to completely eliminate dynamic allocation from the language. If that is the case, then I believe we should provide users with ways to predict exactly when dynamic allocation will occur and teach them refactoring techniques to avoid it.

I am concerned about any compilation scheme relying heavily on higher-order-functions

It is my understanding that higher-order functions tend to obscure the code for optimizers, which hinders very important optimizations, like scalar replacement of aggregates and memory to register promotion. To illustrate, consider this silly example:

subscript front_and_back(of xs: Array<Int>) -> {Int, Int} {
  var ys = (xs.first(), xs.last())
  yield ys
  swap(&ys.0, &ys.1)
}

public fun main() {
  let xs: Array = [1, 2, 3]
  var zs: Array<Int> = []
  let ys = front_and_back[of: xs]
  zs.append(ys.1.copy())
  print(zs.first())
}

The important bits are that:

- the subscript is forming an ephemeral value and it is modifying it in its slide;
- main is using the projection to modify a local data structure more complex than a scalar.

Using what I understood to be Dave's suggestion, one issue I can see is that the local lambda passed to the subscript will have to close over zs. As a result, it will prevent scalar replacement because zs will have to exist as an array (an aggregate) to be used by some other code that we presumably can't see. There is no way around this issue unless we allow ourselves arbitrary inlining, which, as Dave justifiably pointed out, is not possible in general.

Note that the same issue arises around the ephemeral value ys that is initialized in front_and_back. However, there's some hope that in some instances we may cleverly move instructions before a yield point as long as we can prove that doing so doesn't change the observable semantics (not the case here).

Anyways, I believe this issue suggests that there is no free lunch when it comes to optimizing subscripts without arbitrary inlining. Specifically, we'll have to sacrifice some optimizations to abstract over the two parts of a subscript (the ramp and the slide). Whether or not it is better to come up with a model that guarantees freedom from dynamic allocation in general at the cost of other optimizations is a very interesting research question, and I clearly do not have the answer.

2 replies

dabrahams Aug 7, 2025
Maintainer Author

We can see the only code that can access zs. Maybe the problem you are citing is that existing optimizers will assume otherwise and there's no way to keep them from doing so. It does seem to me that Hylo (like Rust) will want to make all sorts of non-observability statements to the optimizer, and those features can be implemented.

kyouko-taiga Aug 7, 2025
Maintainer

The loss of the optimization opportunity isn't due to a limitation of LLVM, and no amount of additional annotation will help. The only way to replace zs by a scalar is to inline the code that uses it, which is not possible in general. But we can talk about it later in more detail.

dabrahams · 2025-08-27T20:32:42Z

dabrahams
Aug 27, 2025
Maintainer Author

It's worth noting that through discussion in our weekly meetings, this scheme is now the plan of record.

0 replies
