Are there additional dimensions of partitioning beyond these?
Yes, so there are (at least) three layers now
Then once we are on a single node there are two layers, within
The continuum is between these two layers. At the dynamic extreme, you could make each "subgraph" just a single operator (like naive Timely dataflow). At the compiled extreme, you could compile the graph down into a single "subgraph" consisting of a bunch of sequential or nested for-loops which do everything for the entire graph (though this is non-trivial to actually do). We landed somewhere in the middle, where the graph is divided into the largest possible "in-out trees," and those become the subgraphs.
Strata are a bit of a cross-cutting concern. Because each stratum needs to run before the next, we need to be able to schedule them as such, and therefore different strata must be in different subgraphs. So you could think of strata as another layer in between. That being said, with flo semantics we're moving away from using strata, and it may be removed as a feature/concept soon. The replacement are
Within a single stratum, it seems possible to have multiple independent sub-flows in the same Process. Given that a Process is single-threaded, can these independent sub-flows execute in parallel?
No, all "processes" are single-threaded, which is an intentional design choice to support a shared-nothing architecture. Multiple hydro "processes" can run on the same machine (or even in the same OS process; at least we want to be able to support that in the future), but must always communicate via channels. That being said, we may loosen that requirement if we have good reason to.
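To make the shared-nothing point concrete, here is a minimal sketch in plain Rust. It does not use any Hydro API: two OS threads stand in for two single-threaded "processes", and `std::sync::mpsc` stands in for whatever channel they would actually use. The function name `run_two_processes` is made up for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Two stand-in "processes" that share no state and talk only over a channel.
/// Threads are used here purely for illustration (an assumption, not Hydro's model).
pub fn run_two_processes(input: Vec<i32>) -> i32 {
    let (tx, rx) = mpsc::channel::<i32>();

    // "Process" 1: owns the input, doubles each item, and sends it on.
    let producer = thread::spawn(move || {
        for x in input {
            tx.send(x * 2).expect("receiver should still be alive");
        }
        // `tx` is dropped here, which closes the channel.
    });

    // "Process" 2: owns the receiver and sums whatever arrives until the channel closes.
    let consumer = thread::spawn(move || rx.iter().sum::<i32>());

    producer.join().expect("producer panicked");
    consumer.join().expect("consumer panicked")
}
```

Each side runs its own logic sequentially; the only parallelism comes from having more than one "process", and the only coupling is the channel.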
Partitioning is the first compilation step performed on a data flow graph by the Hydroflow compiler. Its primary purpose appears to be identifying how much of the data flow can be executed efficiently through compilation techniques (e.g., function calls, iterators, inlining, monomorphization) rather than explicit scheduling. One of the architecture documents in the repository describes this concept as a continuum between “more scheduled” and “more compiled” data flow execution. At one extreme, each partition contains a single operator, resulting in higher overhead but greater flexibility. At the other extreme, the entire flow is compiled into a single sequence of Rust iterators, offering lower overhead but reduced flexibility.
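As a rough illustration of the two extremes (plain Rust, not Hydroflow-generated code), the sketch below runs the same map-then-filter pipeline twice. `run_scheduled` treats each operator as its own unit with a handoff buffer in between, driven by a tiny scheduler loop; `run_compiled` fuses everything into one iterator chain. Both function names and the pipeline itself are made up for the example.

```rust
use std::collections::VecDeque;

/// "More scheduled": one operator per unit of work, with explicit handoff
/// buffers between operators and a loop that keeps scheduling them.
pub fn run_scheduled(input: Vec<i32>) -> Vec<i32> {
    let mut to_map: VecDeque<i32> = input.into();
    let mut to_filter: VecDeque<i32> = VecDeque::new();
    let mut output = Vec::new();

    // The "scheduler": keep running operators while any buffer has pending items.
    while !to_map.is_empty() || !to_filter.is_empty() {
        // Operator 1: map (double each item).
        if let Some(x) = to_map.pop_front() {
            to_filter.push_back(x * 2);
        }
        // Operator 2: filter (keep multiples of 3).
        if let Some(x) = to_filter.pop_front() {
            if x % 3 == 0 {
                output.push(x);
            }
        }
    }
    output
}

/// "More compiled": the same pipeline fused into a single iterator chain,
/// which the Rust compiler can inline and monomorphize.
pub fn run_compiled(input: Vec<i32>) -> Vec<i32> {
    input
        .into_iter()
        .map(|x| x * 2)
        .filter(|x| x % 3 == 0)
        .collect()
}
```

Both versions return the same items in the same order. The scheduled one pays for buffering and dispatch on every item but could reorder or pause operators; the compiled one has no such hooks.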
The first dimension of partitioning is likely the Location, as it defines structural boundaries within the flow.
The second dimension appears to convert the data flow into a set of trees, characterized by unique paths between operators. Interestingly, these trees can take several forms: in-trees, out-trees, in-out trees (trees with a common root), or even poly-trees.
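The tree shapes can be told apart with a simple degree check. The sketch below is not taken from the Hydroflow compiler; `TreeShape` and `classify` are hypothetical names, nodes are numbered `0..num_nodes`, and edges are `(from, to)` pairs. It treats a graph as a poly-tree when its undirected form is connected with exactly `n - 1` edges, then as an in-tree if no node has more than one outgoing edge, or an out-tree if no node has more than one incoming edge. It does not attempt to detect in-out trees.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
pub enum TreeShape {
    InTree,   // every path converges on a single root
    OutTree,  // every path fans out from a single root
    PolyTree, // a tree only when edge directions are ignored
    NotATree,
}

/// Classify a directed graph given as `(from, to)` edges over nodes `0..num_nodes`.
/// A simple chain satisfies both the in-tree and out-tree checks; this sketch
/// reports `InTree` in that case.
pub fn classify(num_nodes: usize, edges: &[(usize, usize)]) -> TreeShape {
    // An undirected tree on n nodes has exactly n - 1 edges.
    if num_nodes == 0 || edges.len() != num_nodes - 1 {
        return TreeShape::NotATree;
    }
    if edges.iter().any(|&(a, b)| a >= num_nodes || b >= num_nodes) {
        return TreeShape::NotATree;
    }

    let mut adj: Vec<Vec<usize>> = vec![Vec::new(); num_nodes];
    let mut in_deg = vec![0usize; num_nodes];
    let mut out_deg = vec![0usize; num_nodes];
    for &(from, to) in edges {
        adj[from].push(to);
        adj[to].push(from);
        out_deg[from] += 1;
        in_deg[to] += 1;
    }

    // Connectivity check on the undirected graph via depth-first search.
    let mut seen = HashSet::new();
    let mut stack = vec![0usize];
    while let Some(n) = stack.pop() {
        if seen.insert(n) {
            stack.extend(adj[n].iter().copied());
        }
    }
    if seen.len() != num_nodes {
        return TreeShape::NotATree;
    }

    if out_deg.iter().all(|&d| d <= 1) {
        TreeShape::InTree
    } else if in_deg.iter().all(|&d| d <= 1) {
        TreeShape::OutTree
    } else {
        TreeShape::PolyTree
    }
}
```

For example, two sources feeding one sink (`[(0, 2), (1, 2)]`) is an in-tree, while one source feeding two sinks (`[(0, 1), (0, 2)]`) is an out-tree.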
Another dimension is the concept of strata, which further divides the flow into strictly ordered phases: all sub-flows within stratum 0 must complete before those in stratum 1 begin execution.
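The ordering constraint can be sketched as a scheduler that sorts sub-flows by stratum before running them. This is an illustrative model only, not Hydroflow's scheduler: `SubFlow`, `run_tick`, and the `Vec<String>` log used as shared state are all hypothetical.

```rust
/// A stand-in for one sub-flow: the stratum it belongs to and the work it does.
pub struct SubFlow {
    pub stratum: usize,
    pub run: Box<dyn Fn(&mut Vec<String>)>,
}

/// Run every sub-flow once, lowest stratum first. `sort_by_key` is stable, so
/// sub-flows within the same stratum keep their original relative order.
pub fn run_tick(mut subflows: Vec<SubFlow>, log: &mut Vec<String>) {
    subflows.sort_by_key(|s| s.stratum);
    for subflow in &subflows {
        (subflow.run)(log);
    }
}
```

Registering a stratum 1 sub-flow before a stratum 0 one still runs the stratum 0 work first, which is what lets a later stratum assume the earlier one has produced all of its output.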
Questions: