Temporal data manipulation operators? #8007
frankmcsherry started this conversation in Technical musings
Replies: 0 comments
Materialize presents collections as rows that change with time. Under the covers, the representation is an append-only list of these rows, each with a timestamp and a difference (when the change happens, and what happens to the multiplicity of the row).
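To make the two views concrete, here is a minimal sketch in plain Rust (standard library only) of how a list of `(row, time, diff)` updates determines the contents of a collection at a given time. The type aliases and the `contents_at` function are hypothetical names chosen for illustration; they are not Materialize or differential dataflow APIs.

```rust
use std::collections::BTreeMap;

// Illustrative stand-ins for the real row, timestamp, and difference types.
type Row = String;
type Time = u64;
type Diff = i64;
type Update = (Row, Time, Diff);

/// Accumulates all updates at times `<= as_of` into the contents of the
/// collection at that time, as a map from row to multiplicity.
fn contents_at(updates: &[Update], as_of: Time) -> BTreeMap<Row, Diff> {
    let mut counts = BTreeMap::new();
    for (row, time, diff) in updates {
        if *time <= as_of {
            *counts.entry(row.clone()).or_insert(0) += *diff;
        }
    }
    // A row whose multiplicity has returned to zero is not in the collection.
    counts.retain(|_, count| *count != 0);
    counts
}
```

For example, with updates `("a", 1, +1)`, `("b", 2, +2)`, `("a", 3, -1)`, the collection at time 2 holds one `a` and two `b`s, and at time 3 only the two `b`s.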
It seems not unreasonable to provide operators that go between these two representations. That is, some operator or SQL idiom that turns streams of `([row], time, diff)` updates into `([row, time, diff], time, +1)` updates, and an operator that goes back again.
In fact, the "back again" operator seems to already exist, in the form of temporal filters plus the `repeat_row` table function (the first advances update times to a function of the data, and the second multiplies multiplicities by a function of the data).
The "there in the first place" operator is harder to express in SQL I think, as we don't present a great way to introspect on times (intentionally, as we don't want folks writing time-sensitive logic if we want to be able to maintain the queries efficiently). At the same time, the operator is very easy to implement in timely/differential dataflow.
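As a rough illustration of the two directions, the following sketch models both operators over in-memory vectors in plain Rust. It is not written against the timely or differential dataflow crates, and it does not claim to match the exact semantics of temporal filters or `repeat_row`; `from_log` and the type aliases are hypothetical names, and taking the later of the two times in `from_log` is an assumption about how a temporal filter would behave here.

```rust
// Illustrative stand-ins for the real row, timestamp, and difference types.
type Row = String;
type Time = u64;
type Diff = i64;
type Update = (Row, Time, Diff);
type LogUpdate = ((Row, Time, Diff), Time, Diff);

/// "There in the first place": hoists each update's time and diff into the
/// data, producing an insertion (+1) of the triple at the same time.
fn to_log(updates: &[Update]) -> Vec<LogUpdate> {
    updates
        .iter()
        .map(|(row, time, diff)| ((row.clone(), *time, *diff), *time, 1))
        .collect()
}

/// "Back again": moves each update to the time recorded in its data (the
/// temporal filter's role) and scales its multiplicity by the recorded diff
/// (the `repeat_row` role).
fn from_log(log: &[LogUpdate]) -> Vec<Update> {
    log.iter()
        .map(|((row, time, diff), log_time, count)| {
            // Assumption: an update cannot take effect before it exists,
            // so use the later of the recorded time and the log's own time.
            (row.clone(), (*time).max(*log_time), *diff * *count)
        })
        .collect()
}
```

On an input that has not been compacted, `from_log(&to_log(&updates))` returns the original updates unchanged.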
Such an operator comes with some perils. One is that for as long as we are in the "changing collections" representation, concepts like "compaction" make a lot of sense. We can advance updates forward to some time, and be sure that if we start running a dataflow we will see the outputs that correspond to starting from scratch and then playing forward. We lose that property if we hoist times and differences up to data. So, perhaps this operator (should it even exist) should only exist in certain privileged spots in the dataflow.
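The compaction concern can be sketched with a small model in plain Rust. Here `compact` is a hypothetical helper that advances every update to at least some frontier and merges updates that now coincide; it is a simplification for illustration, not the compaction logic Materialize or differential dataflow actually uses. The point is that compaction preserves the collection at times at or beyond the frontier, but changes what a `to_log`-style operator would output.

```rust
use std::collections::BTreeMap;

// Illustrative stand-ins for the real row, timestamp, and difference types.
type Row = String;
type Time = u64;
type Diff = i64;
type Update = (Row, Time, Diff);

/// Advances all update times to at least `frontier`, then merges updates
/// for the same (row, time) and drops those that cancel to zero.
fn compact(updates: &[Update], frontier: Time) -> Vec<Update> {
    let mut merged: BTreeMap<(Row, Time), Diff> = BTreeMap::new();
    for (row, time, diff) in updates {
        let advanced = (*time).max(frontier);
        *merged.entry((row.clone(), advanced)).or_insert(0) += *diff;
    }
    merged
        .into_iter()
        .filter(|(_, diff)| *diff != 0)
        .map(|((row, time), diff)| (row, time, diff))
        .collect()
}

/// Hoists time and diff into the data, as in the proposed operator.
fn to_log(updates: &[Update]) -> Vec<((Row, Time, Diff), Time, Diff)> {
    updates
        .iter()
        .map(|(row, time, diff)| ((row.clone(), *time, *diff), *time, 1))
        .collect()
}
```

With updates `("a", 1, +1)`, `("a", 2, -1)`, `("b", 1, +1)` and a frontier of 5, compaction leaves only `("b", 5, +1)`. The collection from time 5 onward is the same either way, but the log of the original has three entries while the log of the compacted input has one, with a different time in its data.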
Another peril is that the "append-only update log" representation is append-only. If you pass it to a `join` it will likely result in unbounded memory use, unless we are smart enough to handle it some other way. It is not unreasonable that queries expressed with this `to_log` operator might be meaningful, but we might need to be smarter about how we handle them.
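A back-of-the-envelope model of the memory concern, again in plain Rust with hypothetical helper names: a single row that is inserted and deleted repeatedly leaves nothing behind in the changing collection, while its log view gains two rows per round and never loses any. This only counts rows; it says nothing about how an actual join would index them.

```rust
use std::collections::BTreeMap;

// Illustrative stand-ins for the real row, timestamp, and difference types.
type Row = String;
type Time = u64;
type Diff = i64;
type Update = (Row, Time, Diff);

/// Distinct rows with non-zero accumulated multiplicity: roughly what a
/// fully compacted index over the changing collection has to retain.
fn live_rows(updates: &[Update]) -> usize {
    let mut counts: BTreeMap<&Row, Diff> = BTreeMap::new();
    for (row, _time, diff) in updates {
        *counts.entry(row).or_insert(0) += *diff;
    }
    counts.values().filter(|count| **count != 0).count()
}

/// Rows in the log view: one per update, none of them ever retracted.
fn log_rows(updates: &[Update]) -> usize {
    updates.len()
}

/// Inserts and then deletes the same row `rounds` times.
fn toggle(rounds: u64) -> Vec<Update> {
    let mut updates = Vec::new();
    for round in 0..rounds {
        updates.push(("a".to_string(), 2 * round, 1));
        updates.push(("a".to_string(), 2 * round + 1, -1));
    }
    updates
}
```

After `toggle(1000)`, `live_rows` reports 0 while `log_rows` reports 2000, and the gap keeps growing with the number of rounds.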