Approaches for “Lazy” sequences #1668

dabrahams · 2025-03-21T17:58:26Z

dabrahams
Mar 21, 2025
Maintainer

(See also #1081)

Since @RishabhRD is going to be exploring lazy sequences in rs-stl soon, I thought it would be worth opening a discussion about the domain.

I'd like to begin by outlining the roughly complete space of possible sequence concepts. Then we can decide which ones may or may not be important and how they should be represented (naming is also up for grabs as far as I'm concerned).

Dimensions of the space, assuming there is some notional sequence object `s`

These are not necessarily supposed to be concepts or traits; they just represent properties of a sequence that might be important to some generic algorithm.

[l/r]value-elements: Are elements accessed as Lvalues or Rvalues?
reorderable: Can elements be reordered?
persistent-elements: Can multiple element lvalues be examined at once?
[im]mutable-elements: Are elements mutable?
multipass: Can we make multiple passes without mutating s?
peekable: Is there a way to get the current element without mutating s?
positional: Is there a representation of position?
bidirectional: Is there bidirectional traversal?
random-access: Is there random-access traversal?
range-replaceable: Can we make arbitrary changes to the structure of s?
traversal-throws: Can traversal fail?
access-throws: Can element access fail?
segmentation: see this paper.
low-level:
- contiguous-memory
- trivially-movable-elements
- trivially-copyable-elements
- etc.

Did I miss anything?

DmT021 · 2025-03-21T18:44:03Z

DmT021
Mar 21, 2025

Finiteness of a sequence, maybe? It could be useful for diagnosing whether code after a loop over a sequence is reachable.
An indication of whether an error (recoverable or non-recoverable) may occur during iteration.
Contiguous memory layout of elements, for SIMD-friendly algorithms?

2 replies

dabrahams Mar 26, 2025
Maintainer Author

I think finiteness is not a meaningful thing to capture in a concept. There is little practical difference between an infinite sequence and one that is merely huge (e.g. 0...Int128.max).
We don't represent non-recoverable errors in the type system, but we plan to support throws for recoverable errors. I'm not sure that's a difference that could make a meaningful difference to an algorithm, since an exception should just propagate through, but I'll add that to the list, just in case.
Contiguous memory: yes, good idea. This is in the same category as a whole bunch of relevant low-level properties like “trivially-movable elements.”

DmT021 Mar 27, 2025

> I think finiteness is not a meaningful thing to capture in a concept. There is little practical difference between an infinite sequence and one that is merely huge (e.g. 0...Int128.max).

From a practical point of view, I agree. But there are still some interesting examples that could be made here. Let's say we have a sequence and a function that returns the first N elements of the sequence: prefix(s, N). When N is a compile-time-known value and the sequence is infinite, the return type of the function could be T[N], but it has to be Array<T> otherwise. We could go a step further and say that the constraint is N <= "minimal guaranteed length of the sequence" and invent a notion to express that length. And then we could extend this to zip and other sequence aggregators. In general, the idea is that the return types can be more specific without losing compile-time safety.

Unsafe C++ illustration

```cpp
#include <array>
#include <cstddef>

// An endless single-pass iterator over the Fibonacci numbers.
class FibonacciIterator {
public:
  using value_type = unsigned long;

  constexpr FibonacciIterator() : a(0), b(1) {}

  constexpr value_type operator*() const {
    return a;
  }

  constexpr FibonacciIterator& operator++() {
    unsigned long temp = a;
    a = b;
    b = temp + b;
    return *this;
  }

private:
  unsigned long a, b;
};

// An infinite sequence: it has a begin() but no end().
class FibonacciSequence {
public:
  constexpr FibonacciIterator begin() const {
    return FibonacciIterator();
  }
};

// Copies the first Size elements of s into a fixed-size array.
// Unsafe: nothing checks that s really has at least Size elements.
template<typename Seq, std::size_t Size>
constexpr std::array<unsigned long, Size> constexprPrefix(const Seq& s) {
  std::array<unsigned long, Size> array = {};
  std::size_t i = 0;
  for (auto it = s.begin(); i < Size; ++it, ++i) {
    array[i] = *it;
  }
  return array;
}

auto fib = constexprPrefix<FibonacciSequence, 10>(FibonacciSequence());
```

> We don't represent non-recoverable errors in the type system, but we plan to support throws for recoverable errors. I'm not sure that's a difference that could make a meaningful difference to an algorithm, since an exception should just propagate through, but I'll add that to the list, just in case.

If you're going to add throws, you'll need a way to generalize over it. For example, consider the map function: if a sequence may throw an error of type E1 during iteration and the closure passed to map may throw E2, then map's error type is E1 ∪ E2. Swift doesn't allow throwing errors in IteratorProtocol but does allow it in AsyncIteratorProtocol. Since SE-0421, the error type is part of the protocol. The fact that typed throws were added after the initial introduction of the protocol made it kinda awkward.
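To make the E1 ∪ E2 point concrete, here is a rough C++17 sketch that uses explicit result values instead of exceptions. Everything in it is an assumption for illustration: `ReadError`, `ParseError`, `Lines`, `ParsedLines`, and `parse_int` are hypothetical names, and `std::variant` merely stands in for a typed-throws union. It is not how Swift or Hylo model this.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical error types. E1 comes from traversal, E2 from the mapped function.
struct ReadError  { explicit ReadError(int l) : line(l) {}  int line; };
struct ParseError { explicit ParseError(int l) : line(l) {} int line; };

using LineStep   = std::variant<std::string, ReadError>;      // T or E1
using ParsedStep = std::variant<int, ReadError, ParseError>;  // U or E1 or E2

// A single-pass source whose traversal can fail with ReadError.
struct Lines {
  std::vector<std::string> data;
  std::size_t index = 0;

  // Returns nullopt when depleted, otherwise a line or a ReadError.
  std::optional<LineStep> next() {
    if (index == data.size()) return std::nullopt;
    const std::string& line = data[index++];
    if (line.empty()) return LineStep{ReadError{static_cast<int>(index)}};
    return LineStep{line};
  }
};

// The mapped function: can fail with ParseError.
inline std::variant<int, ParseError> parse_int(const std::string& s, int line) {
  int value = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return ParseError{line};
    value = value * 10 + (c - '0');
  }
  return value;
}

// "map" over Lines: its step type has to mention both error types.
struct ParsedLines {
  Lines base;

  std::optional<ParsedStep> next() {
    auto step = base.next();
    if (!step) return std::nullopt;
    if (const ReadError* e = std::get_if<ReadError>(&*step)) return ParsedStep{*e};
    auto parsed = parse_int(std::get<std::string>(*step), static_cast<int>(base.index));
    if (const ParseError* e = std::get_if<ParseError>(&parsed)) return ParsedStep{*e};
    return ParsedStep{std::get<int>(parsed)};
  }
};
```

The point of the sketch is only that `ParsedStep` cannot be written without naming both error types, which is the generalization problem described above.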

RishabhRD · 2025-03-23T15:12:58Z

RishabhRD
Mar 23, 2025

Here are my initial thoughts:

The term "sequence" brings Swift's Sequence protocol to mind, which I see as a kind of iterator factory.

I think there are two kinds of iterable entities:

Those where the "notion of existence" (which might not be actual existence) of multiple elements at a time doesn't exist. In Hylo land, I assume we call this Iterator.
Those where the notion of existence of multiple elements at a time is present. We call this Collection.

I think these two entities are independent, and there is no need to couple them at the protocol level. The discussion at #1081 goes along similar lines. The term Iterator often leads me to think these two entities are connected, maybe because I worked in Java in the past, where collections can yield iterators. #1081 suggests that "generators" might be a good name.

Any collection that can generate rvalues without copying can be handled by the "PopFirstable" concept mentioned in #1081. We can provide a default implementation of "Sequence" for all PopFirstable collections. We might also consider a better name than Sequence, one that doesn't imply any connection with Collection.

If we are on the same page about this, then for laziness we only need to worry about the second entity, where the notion of existence of multiple elements at a time is present. Then we can more or less say that "multi-pass" and "positions" exist for lazy collections too.

Regarding the second entity, we already know the properties of an entity that yields lvalues on access. Yielding rvalues on access is where laziness appears (as in map).

I am also unsure about the category of "views", which might return rvalues whose lifetime depends on the lifetime of the collection, like zip (for (&E1, &E2)) or split (for Slice<Collection>). Should these views also be considered lazy collections? Are these YAGNI? I am also curious about your reasoning behind not considering these views for mutation.

2 replies

dabrahams Mar 26, 2025
Maintainer Author

> The term "sequence" brings swift's sequence protocol to my mind which I see as a kind of iterator factory.

I intentionally avoided capitalization or code voice to indicate that I intended the normal English meaning of the word.

> (might not be actual existence)

I don't understand what you have in mind here.

> Where "notion of existence" of multiple elements at a time doesn't exist. In Hylo land, I assume we call it Iterator.
>
> Where notion of existence of multiple elements at a time is present. We call it Collection.

Well…

I don't think the “notion of existence of multiple elements” is nearly as solid a distinction as “single-pass vs multi-pass.” Is there something valuable provided by the former that isn't captured by the latter?
Every Collection can be an Iterator factory.
Practically speaking, because Hylo has no first-class references, there are probably two different kinds of Iterator: one that produces rvalues and another that projects lvalues. I'm not sure what form these take.
The idea of a peekable sequence lies somewhere between our current understanding of Iterator and Collection.

> I think these 2 entities are very independent and there is no need to couple it on a protocol level.

The role of Sequence in Swift is to present a common interface used by for x in y regardless of whether y is a single-pass or multi-pass thing. I'm not yet sure whether it has a place in Hylo. #1081 suggests that they are not really related because of the way that producing a non-destructive Iterator for a Collection implies copying elements. But I think that conclusion depends on the idea that an Iterator always accesses rvalues. I don't think we understand the space well enough to make that determination yet.

I think we ought to enumerate our goals and then see how to address them (which may include dropping some of them). It is my intuition that all of the knotty problems arise around single-pass algorithms, because they can either consume elements or not.

Goals for single passes over sequences

[Words like sequence and iterator here are not meant to imply any particular trait or API. A sequence is a series of values, and an iterator captures the state of a traversal over that series and provides access to the next value].

I'm adding some consequences as follow-on sentences to each bullet.

Any escapable sequence can be iterated for consumption, producing element rvalues. For a multi-pass sequence producing this iterator means consuming the sequence. Because not all multipass sequences can pop their first element in O(1), the iterator type would have to be an associated type.
Any sequence can be iterated for reading, producing element lvalues that are projected out of the iterator. It is likely that these two kinds of iteration imply two distinct iterator types for a sequence.
A higher-order single-pass algorithm that takes a function operating on elements can work with a function that consumes elements or one that just reads them. For example, the following should both be possible:
```
["abc", "23", "77"].map(fun (_ x: sink String) -> String { x.remove_last() })
["abc", "23", "77"].map(fun (_ x: String) -> Optional<Int> { Int(parsing: x) })
```
Every notional algorithm can be represented as a single function or method bundle, without overloading. (I doubt this goal is attainable, especially for higher-order algorithms.)
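As a rough C++17 analogy for the third goal (not a Hylo design), the sketch below writes the consuming and the reading pass as two separate functions. `map_consuming`, `map_reading`, `drop_last`, and `parse_int` are hypothetical names that only loosely mirror the two Hylo calls above. That two functions are needed here also illustrates why the last goal is hard to reach.

```cpp
#include <optional>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Consuming pass: owns the sequence and hands each element to f as an rvalue.
template <class T, class F>
auto map_consuming(std::vector<T> xs, F f) {
  using U = std::decay_t<decltype(f(std::move(xs.front())))>;
  std::vector<U> out;
  out.reserve(xs.size());
  for (T& x : xs) out.push_back(f(std::move(x)));
  return out;
}

// Reading pass: borrows the sequence and only lets f read each element.
template <class T, class F>
auto map_reading(const std::vector<T>& xs, F f) {
  using U = std::decay_t<decltype(f(xs.front()))>;
  std::vector<U> out;
  out.reserve(xs.size());
  for (const T& x : xs) out.push_back(f(x));
  return out;
}

// Takes ownership of its argument and returns it without its last character.
inline std::string drop_last(std::string s) {
  if (!s.empty()) s.pop_back();
  return s;
}

// Reads its argument; yields a value only if s is a non-empty run of digits.
inline std::optional<int> parse_int(const std::string& s) {
  if (s.empty()) return std::nullopt;
  int value = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return std::nullopt;
    value = value * 10 + (c - '0');
  }
  return value;
}
```

A caller would write `map_reading(xs, parse_int)` to keep `xs` intact, and `map_consuming(std::move(xs), drop_last)` to give the elements away.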

Did I miss anything important?

dabrahams Mar 26, 2025
Maintainer Author

> I am also unsure about the category of "views" which might return rvalues whose lifetime is dependent on lifetime of collection like zip (for (&E1, &E2)) or split (for Slice<Collection>).

There are two ways you might want to zip, consuming or non-consuming. I assume since you are talking about a lifetime dependency, you mean the non-consuming kind. Until we get remote parts, I'm not sure we can represent the elements of the result cleanly, but maybe we can build RemotePair<T, U> using unsafe tools. Regardless, I think it would look something like this:

```
extension Collection {
  subscript zip<Other: Collection>(_ other: Other): ZipCollection<Self, Other>
}
```
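For readers more at home in C++, a non-consuming zip might look roughly like the C++17 sketch below. `ZipView` and `zip` are hypothetical names, and the `std::pair` of references is only a stand-in for something like RemotePair<T, U>. The raw pointers are the "unsafe tools" here: nothing stops the view from outliving the collections it refers to.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// A non-consuming zip over two vectors. Elements are pairs of references
// into the bases, so nothing is copied or moved.
template <class A, class B>
class ZipView {
 public:
  ZipView(const std::vector<A>& a, const std::vector<B>& b) : a_(&a), b_(&b) {}

  // The zipped length is that of the shorter base.
  std::size_t count() const { return std::min(a_->size(), b_->size()); }

  // Requires i < count().
  std::pair<const A&, const B&> operator[](std::size_t i) const {
    return {(*a_)[i], (*b_)[i]};
  }

 private:
  const std::vector<A>* a_;  // borrowed: the caller must keep the bases alive
  const std::vector<B>* b_;
};

template <class A, class B>
ZipView<A, B> zip(const std::vector<A>& a, const std::vector<B>& b) {
  return ZipView<A, B>(a, b);
}
```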

> Should these views also be considered as lazy collections?

I'm not sure we know what “lazy collection” means exactly, so I'm not sure what difference the answer makes.

> Are these YAGNI?

I don't think so. I think they are essential tools in algorithm composition.

> I am also curious of your reasoning behind not considering these views for mutation.

For views like zip it could make sense. For something like split, unless I'm missing something, mutating one of the elements by assignment (which is a slice) would imply restructuring the collection out of which the result was projected (because you could replace any slice with a slice of a different length). I don't know how to make that work, considering that it would require adjusting the positions of all of the elements of the projected result, and restructuring the underlying collection could invalidate all positions.

kyouko-taiga · 2025-03-26T21:38:21Z

kyouko-taiga
Mar 26, 2025
Maintainer

Here are some thoughts I have on laziness.

Like @RishabhRD, I think there are two kinds of iterable entities, which I will call collections and streams. The former support multi-pass iteration and the latter support single-pass iteration. I am not attached to those terms (or any of the other straw men I'll use next), but I think they at least work for the sake of the discussion. I guess the basics are quite uncontroversial. For collections, we get start and end positions and a subscript to project (i.e., access an lvalue) contents at a specific position. For streams, we get a mutating method that returns rvalues until the stream is depleted (if ever).

For lazy collections, my working theory is that we can get away building on top of streams. For instance:

```
var xs = Array(0 ..< 10)
var ys = xs.stream().filter(fun (e) { e % 2 == 0 })
print(&ys.take(3))     // prints '[0, 2, 4]'
&xs = ys.collect()     // consumes the remainder of `ys`
```

To make that work, Array would conform to some Streamable trait requiring a stream method to transform the receiver into a stream. I would probably not provide a default conformance for this trait. Then a type like Array could come with its own ArrayStream that'd simply move elements from the front of its internal buffer, keeping a pointer to the next element until the stream is depleted. Note that this particular type would support a peek operation. While ArrayStream may sound like a slice, I think there should be a distinction. The way I see it, slices should be collections that do not support sinking access to their contents. An ArrayStream could be a collection, though.

It is easy to define a default implementation of collect, which transforms an arbitrary stream into a collection: just append each element taken from the stream into a buffer. Eventually that will just result in an array. Of course this behavior could be "specialized" for specific data types, like ArrayStream.
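A minimal C++17 sketch of this idea follows, under stated assumptions: the method name `next`, the use of `std::optional` to signal depletion, and the pointer-returning `peek` are all choices made for this illustration, not part of any proposed Hylo API.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// A stream that owns an array's buffer and moves elements out from the front.
template <class T>
class ArrayStream {
 public:
  explicit ArrayStream(std::vector<T> buffer) : buffer_(std::move(buffer)) {}

  // Moves the next element out, or returns nullopt once the stream is depleted.
  std::optional<T> next() {
    if (next_ == buffer_.size()) return std::nullopt;
    return std::move(buffer_[next_++]);
  }

  // Reads the upcoming element without advancing; null once depleted.
  const T* peek() const {
    return next_ == buffer_.size() ? nullptr : &buffer_[next_];
  }

 private:
  std::vector<T> buffer_;
  std::size_t next_ = 0;  // index of the next element to hand out
};

// Default collect for any stream with a next() that returns std::optional:
// append each remaining element to a growing buffer.
template <class Stream>
auto collect(Stream& s) {
  using T = typename decltype(s.next())::value_type;
  std::vector<T> out;
  while (auto x = s.next()) out.push_back(std::move(*x));
  return out;
}
```

A specialized `collect` for `ArrayStream` could instead hand back the tail of the existing buffer without touching each element, which is the kind of specialization mentioned above.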

I think we can get quite far with this approach. If I look at my own code, most applications of laziness are on collections that I'll throw away right after the lazy collection is used. However, as mentioned elsewhere, this approach applies poorly to cases where the original collection should not be consumed. One easy fix is to just copy the original collection before creating a stream (i.e., var ys = xs.copy().stream), but this copy may be quite expensive. One perhaps better solution may be to define a non-destructive stream in a fashion similar to LazyMapCollection. The stream would own the collection but keep it unchanged until it is "closed". Elements of the stream would be the result of applying some lambda:

```
var xs = Array(0 ..< 10)
var ys = xs.lazy_map(Int.copy).filter(fun (e) { e % 2 == 0 })
print(&ys.take(3))     // prints '[0, 2, 4]'
&xs = ys.release()     // assigns `xs` to its original value
```
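In C++17 terms, the owning, non-destructive stream might be sketched as below. `LazyMapStream` and the `next` method are hypothetical; `lazy_map` and `release` borrow their names from the snippet above. The sketch omits `filter` and `take` to stay short.

```cpp
#include <cstddef>
#include <optional>
#include <type_traits>
#include <utility>
#include <vector>

// Owns the base collection, leaves it unchanged, and produces f(element)
// on demand. release() gives the untouched collection back.
template <class T, class F>
class LazyMapStream {
 public:
  using Element = std::decay_t<std::invoke_result_t<F&, const T&>>;

  LazyMapStream(std::vector<T> base, F f)
      : base_(std::move(base)), f_(std::move(f)) {}

  // Computes the next mapped value; the base element is only read.
  std::optional<Element> next() {
    if (next_ == base_.size()) return std::nullopt;
    const T& x = base_[next_++];
    return f_(x);
  }

  // "Closes" the stream: callable only on an rvalue, returns the base.
  std::vector<T> release() && { return std::move(base_); }

 private:
  std::vector<T> base_;
  F f_;
  std::size_t next_ = 0;
};

template <class T, class F>
LazyMapStream<T, F> lazy_map(std::vector<T> base, F f) {
  return LazyMapStream<T, F>(std::move(base), std::move(f));
}
```

As in the Hylo version, this only works when the caller can give up the collection for the lifetime of the stream.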

This trick only works if we can sink the original collection. That is where remote parts become tempting but I'd advocate for a higher-order function instead:

```
let xs = Array(0 ..< 10)
xs.with_lazy_map(Int.copy, fun (_ zs : sink) {
  var ys = zs.stream.filter(fun (e) { e % 2 == 0 })
  print(&ys.take(3))   // prints '[0, 2, 4]'
})
```

6 replies

kyouko-taiga Mar 27, 2025
Maintainer

> I think that is simplest, but I'm not sure it's necessary. We could allow a slice to be sink-projected from the Array, in which case the slice would own the storage and the elements. There are tradeoffs here.

True. But still the interface of the slice would be different I think. That would be notionally an ArrayStream+Collection.

> should it be?

Maybe not. I would start with a clear separation of streams and collections. It's only that when I think about "peekability" of ArrayStream I catch myself wondering about peek(n) and then the picture looks like a collection.

But perhaps this sort of detail can be put aside while we're still figuring out the broad strokes.

> I don't know if we want that, because it's only useful if xs is an array. &xs.init(ys) is potentially a more flexible way to do it.

Sure. The advantage with .collect() is that we can method-chain. Alternatively we could write .reduce(into: Array(), Array.append(_:)), .reduce(into: Deque(), Deque.prepend(_:)), etc. We can obviously think of some trait to support more efficient reductions (e.g., looking at an underestimated count and so on).

> IMO the assumption that a lazy single pass needs to consume anything (the collection or its elements) is problematic.

The thing is that it seems non-consuming iteration is tightly coupled with the collection trait. If we do not consume the container, then it means we have a way to "rewind" iteration, which implies that the state of iteration can be decoupled from the container. If we do not consume the elements of the container, then we must project them given some information computed by the aforementioned state. Putting these things together, you get very close to the collection concept.

For example:

```
type Fibonacci {
  type State {
    public var a, b: Int
    public init() { &a = 1; &b = 1 }
  }
  var s: State

  public subscript peek: Int { yield s.b }
  public fun advance() inout {
    var n = s.a + s.b; &s.a = s.b; &s.b = n
  }
}
```

I guess this is a representative of an interface supporting non-consuming iteration. The advance method cannot return elements because if it did we would need to consume them. So we need some kind of combination of peek and advance. Clearly I designed this type with a bias but if you ask me it looks almost exactly like a collection. State is a position, peek is the collection subscript, and advance is position(after:). The only "issue" is that there is no end position but as we have established, finiteness is a dubious requirement. So if I had to provide an end position here I would probably offer State(a: Int.max, b: Int.max).
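The correspondence can be spelled out in a small C++17 sketch in which the iteration state lives outside the sequence as a position. The member names `start`, `after`, and `operator[]` are assumptions chosen to echo a collection-style API, not actual Hylo declarations.

```cpp
#include <cstdint>

// The Fibonacci type above, recast as a collection: the state of the
// traversal is a separate Position value, so any number of passes can coexist.
struct Fibonacci {
  // Plays the role of State.
  struct Position {
    std::uint64_t a = 1;
    std::uint64_t b = 1;
  };

  Position start() const { return {}; }

  // Plays the role of peek: reads the element at p without changing anything.
  std::uint64_t operator[](const Position& p) const { return p.b; }

  // Plays the role of advance, written as position(after:).
  Position after(const Position& p) const { return {p.b, p.a + p.b}; }
};
```

Because `after` returns a new position instead of mutating the sequence, an earlier position stays usable, which is what makes the traversal multi-pass.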

More generally, I am almost convinced that non-consuming iteration over some sequence implies that the sequence is in fact a collection or that there is a way to construct a collection on top of the sequence.

> I don't know exactly how we want to write it, something like this should be possible without consuming anything:

The interface I proposed satisfies this requirement:

```
let xs: Array<Array<Int>> = stuff()
xs.with_lazy_map(fun (e) { e.count() }, fun (ns) { ns.reduce(0, Int.infix+) })
```

Collection.with_lazy_map(_:_:) is not a sink method. Presumably its implementation will use the unsafe API to form some LazyView that keeps a pointer to the base collection and passes it inout to the second lambda. These shenanigans are necessary because we cannot project mutable things out of immutable things.
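To illustrate the shape of that implementation, here is a C++17 sketch under assumptions: `LazyView` and `with_lazy_map` follow the names used above, while the `reduce` member and the use of a plain pointer as the "unsafe" reference to the base are choices made for this example only.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// A view that keeps a pointer to the base collection and applies f on access.
template <class T, class F>
class LazyView {
 public:
  LazyView(const std::vector<T>* base, F f) : base_(base), f_(std::move(f)) {}

  std::size_t count() const { return base_->size(); }

  // Folds the mapped elements, computing each one on the fly.
  template <class Acc, class Op>
  Acc reduce(Acc acc, Op op) const {
    for (const T& x : *base_) acc = op(std::move(acc), f_(x));
    return acc;
  }

 private:
  const std::vector<T>* base_;  // borrowed; valid only inside with_lazy_map
  F f_;
};

// Builds the view, lends it to body as a mutable lvalue, and returns
// body's result. The base collection is only read, never consumed.
template <class T, class F, class Body>
auto with_lazy_map(const std::vector<T>& xs, F f, Body body) {
  LazyView<T, F> view(&xs, std::move(f));
  return body(view);
}
```

Since the view never escapes the call, the pointer cannot dangle, which is the safety argument for preferring the higher-order form.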

Of course this particular example is better expressed as:

```
xs.reduce(0, fun (sum, ys) { sum + ys.count() })
```

> It's realistic that you might want to read from a collection xs and something lazily projected from xs at the same time, so their lifetimes would have to overlap.

The interface I proposed satisfies this requirement too because Collection.with_lazy_map(_:_:) is not a sink method.

dabrahams Mar 27, 2025
Maintainer Author

> The interface I proposed satisfies this requirement

You're right, but I think I failed to make the right example.

```
let xs: Array<Array<BigNonCopyableInt>> = stuff()
xs.with_lazy_map(fun (e) { e[0] }, fun (ns) { ns.reduce(0, BigInt.infix+) })
```

This doesn't work because there's no way to project the elements out of the first function.

The above is already hard to read and the fix gets even more complicated:

```
let xs: Array<Array<BigInt>> = stuff()
xs.with_lazy_map(fun (e, f) { f(e[0]) }, fun (ns) { ns.reduce(0, BigInt.infix+) })
```

Here with_lazy_map passes its 2nd argument to its first argument, and you have to call it. That's not great.

I have more thoughts (some of which we discussed in today's meeting) but I wanted to get this part posted.

dabrahams Mar 27, 2025
Maintainer Author

The argument that you should just write

```
xs.reduce(0, fun (sum, ys) { sum + ys[0] })
```

doesn't really work. In the vision for generic programming, every time you can have an algorithm with a simple identifiable name, you vend it. That means you don't use reduce directly to sum things; there's a sum method. You really want to use that! So you should write

```
xs SOMETHING.sum()
```

If sum is going to process those projected values, SOMETHING needs to be a subscript, which is why you see a subscript at the end of this post. Unfortunately, I think a higher-order projecting map is a bit messy because we don't have first-class subscript values:

```
trait LetProjection {
  type In
  type Out
  subscript(x: In): Out
}

extension Collection {
  subscript lazy_map<P: LetProjection where P.In == Element>(_ p: P): LazyProjectingMap<Self, P> {
     yield .init(self, p)
  }
}

// User writes this
struct ProjectFirst<C: Collection where C.Element: Collection>: LetProjection {
  type In = C.Element
  type Out = C.Element.Element
  subscript(_ x: In): Out { yield x[x.start_position()] }
  public memberwise init
}

xs.lazy_map[ProjectFirst<[BigNonCopyableInt]>].sum()
```

I think LazyProjectingMap works for mutable projection also but you'd need InoutProjection and a subscript in MutableCollection to access it.
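A loose C++17 analogy may help show why the projection ends up as a named type. In the sketch below, a function object returning a reference plays the role of LetProjection, and `Big` is a hypothetical non-copyable element type. `LazyProjectingMap`, `ProjectFirst`, and `lazy_map` reuse the names above, but the code is only an illustration, with `sum` simplified to add up a plain `long` field.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical non-copyable element type.
struct Big {
  long value;
  explicit Big(long v) : value(v) {}
  Big(const Big&) = delete;
  Big& operator=(const Big&) = delete;
  Big(Big&&) = default;
  Big& operator=(Big&&) = default;
};

// The projection as a named type: yields a reference to the first element.
// Requires x to be non-empty.
struct ProjectFirst {
  const Big& operator()(const std::vector<Big>& x) const { return x.front(); }
};

// A view whose elements are whatever p projects out of the base elements.
template <class C, class P>
class LazyProjectingMap {
 public:
  LazyProjectingMap(const C& base, P p) : base_(&base), p_(std::move(p)) {}

  std::size_t count() const { return base_->size(); }

  // decltype(auto) keeps the reference returned by the projection.
  decltype(auto) operator[](std::size_t i) const { return p_((*base_)[i]); }

 private:
  const C* base_;  // borrowed; the base must outlive the view
  P p_;
};

template <class C, class P>
LazyProjectingMap<C, P> lazy_map(const C& base, P p) {
  return {base, std::move(p)};
}

// Sums the projected elements by reading them in place, without copies.
template <class View>
long sum(const View& v) {
  long total = 0;
  for (std::size_t i = 0; i < v.count(); ++i) total += v[i].value;
  return total;
}
```

With these pieces, `sum(lazy_map(xs, ProjectFirst{}))` reads the first `Big` of each inner vector through a reference, so the deleted copy constructor is never needed.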

[Aside: Interesting; sum could be a method bundle:

```
extension Collection where Element: Arithmetic {
  fun sum() -> Element {
    let { reduce(0, Element.+) }
    sink { self.stream.sum() }
  }
}
```

]

dabrahams Mar 27, 2025
Maintainer Author

> if you ask me it looks almost exactly like a collection.

It absolutely is capable of satisfying Collection, since it is a multi-pass sequence.

> State is a position, peek is the collection subscript, and advance is position(after:). The only "issue" is that there is no end position but as we have established, finiteness is a dubious requirement.

Finiteness and having an end position are orthogonal issues. You can always create a position value that is only equal to itself.
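One way to picture an end position that is "only equal to itself" is the following C++17 sketch. `Naturals` and `sum_prefix` are hypothetical, and modelling the end position as an empty `std::optional` is just one possible encoding.

```cpp
#include <cstdint>
#include <optional>

// An unbounded collection of the natural numbers 0, 1, 2, ...
// Overflow is ignored for the purposes of this illustration.
struct Naturals {
  // An engaged optional is a real position; nullopt is the end position,
  // which compares equal only to itself and is never reached by after().
  using Position = std::optional<std::uint64_t>;

  Position start() const { return std::uint64_t{0}; }
  Position end() const { return std::nullopt; }
  Position after(Position p) const { return *p + 1; }
  std::uint64_t operator[](Position p) const { return *p; }
};

// An ordinary start/end loop still type-checks against the unbounded
// collection; here the count n is what actually stops it.
inline std::uint64_t sum_prefix(const Naturals& s, int n) {
  std::uint64_t total = 0;
  for (auto p = s.start(); p != s.end() && n > 0; p = s.after(p), --n) {
    total += s[p];
  }
  return total;
}
```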

kyouko-taiga Mar 28, 2025
Maintainer

Great remarks!

> because we don't have first-class subscript values

One insight from my work on formalizing subscripts is that I think I know how to make subscripts first-class. I haven't figured out how to make bundles first-class, but presumably we should be able to write xs.map[subscript (e) { e.first }].

That's half good news, half concerning news because it is yet another dimension for polymorphic algorithms.

RishabhRD · 2025-05-12T20:29:05Z

RishabhRD
May 12, 2025

Survey of Laziness based on C++ Views

C++ Views

single_view, empty_view

iota

repeat, cycle

filter

transform/map

take, drop, take_while, drop_while

split, chunk, slide, stride

zip, cartesian_product, adjacent

join (Monadic Join)

Axes of Laziness

Discussion on "element computation" LazyCollection

Conclusion
