Description
At some point we'd like huey to support "models" consisting of multiple datasources / files / tables.
In a minimal setup, a model would be a Kimball star schema:
- a collection of datasets that are deemed "dimensions"
  - dimensions have attributes, which are primarily meant to be placed on the axes
  - dimensions have a key. (This is traditionally a single integer column, but that is only a convention; there is no reason to shun non-integer key columns or even composite keys.)
- a central fact dataset.
  - the fact dataset has measures. These are aggregate expressions that can be placed on the cells
  - the fact dataset has dimension keys, which are used to connect fact data to the respective dimension
  - the fact dataset may have "normal" attributes (degenerate dimensions)
- physically, the datasets can be a file, table, view - any tabular construct will do.
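The elements listed above could be captured in a small metadata structure. The following is only a sketch of one possible shape, not a proposed design for huey: every class, field and table name in it (`StarModel`, `DimensionLink`, `sales.parquet` and so on) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dimension:
    name: str                    # logical name shown to the user
    source: str                  # file, table or view that holds the rows
    key: tuple[str, ...]         # one or more key columns (composite keys allowed)
    attributes: tuple[str, ...]  # columns that can be placed on an axis


@dataclass(frozen=True)
class Measure:
    name: str
    expression: str              # aggregate SQL expression, e.g. "sum(amount)"


@dataclass(frozen=True)
class DimensionLink:
    dimension: str                 # name of the linked Dimension
    fact_columns: tuple[str, ...]  # fact columns matching the dimension key, in order


@dataclass
class StarModel:
    fact_source: str
    measures: list[Measure]
    links: list[DimensionLink]
    dimensions: dict[str, Dimension]
    degenerate_attributes: list[str] = field(default_factory=list)

    def check(self) -> list[str]:
        """Return structural problems; an empty list means the model is consistent."""
        problems = []
        for link in self.links:
            dim = self.dimensions.get(link.dimension)
            if dim is None:
                problems.append(f"unknown dimension: {link.dimension}")
            elif len(dim.key) != len(link.fact_columns):
                problems.append(f"key arity mismatch for dimension: {link.dimension}")
        return problems


# Hypothetical example model: one fact file and one dimension file.
product = Dimension("product", "product.parquet", ("product_key",), ("name", "brand"))
model = StarModel(
    fact_source="sales.parquet",
    measures=[Measure("total_amount", "sum(amount)")],
    links=[DimensionLink("product", ("product_key",))],
    dimensions={"product": product},
    degenerate_attributes=["order_number"],
)
```

Keys are stored as tuples so that single-column and composite keys are handled the same way.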
For analysis, the model would be used as a "Cube".
Conceptually, the cube is like a database view, that combines the fact table and all its dimensions in a so-called "star join".
From the user's perspective, it looks a lot like a table, but the metadata in the model controls and constrains the items, so that measures would only be usable as cell aggregates, and dimension attributes only as axis items.
(In a strict concept, items from one dimension would only be usable on just one axis at the same time)
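The placement constraints described here could be checked with a small validation function. This is a minimal sketch under assumed inputs: the function name, the role labels `"measure"` / `"attribute"` and the axis names `rows` / `columns` / `cells` are all hypothetical and chosen only for illustration.

```python
def validate_placement(item_roles, item_dimensions, layout, strict=True):
    """Check a query layout against the model's constraints.

    item_roles:      item name -> "measure" or "attribute"
    item_dimensions: attribute item name -> name of the dimension it belongs to
    layout:          {"rows": [...], "columns": [...], "cells": [...]}
    strict:          if True, a dimension may appear on one axis only
    Returns a list of error messages (empty when the layout is valid).
    """
    errors = []

    # Cells accept measures only.
    for item in layout.get("cells", []):
        if item_roles.get(item) != "measure":
            errors.append(f"{item}: only measures may be placed on cells")

    # Axes accept dimension attributes only.
    axis_of_dimension = {}
    for axis in ("rows", "columns"):
        for item in layout.get(axis, []):
            if item_roles.get(item) != "attribute":
                errors.append(f"{item}: only dimension attributes may be placed on {axis}")
                continue
            # Attributes without a dimension entry are treated as degenerate (fact) attributes.
            dim = item_dimensions.get(item, "(fact)")
            first_axis = axis_of_dimension.setdefault(dim, axis)
            if strict and first_axis != axis:
                errors.append(f"{item}: dimension {dim} is already used on {first_axis}")
    return errors


# Hypothetical items for illustration.
roles = {"year": "attribute", "month": "attribute", "brand": "attribute", "total": "measure"}
dims = {"year": "date", "month": "date", "brand": "product"}
```

With `strict=False` the one-dimension-per-axis rule is skipped, while the measure/attribute rules still apply.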
The really interesting and hard part would be the query engine.
Rather than always querying the full conceptual view, the engine would generate SQL that joins only those dimensions that the query actually uses.
This would become even more useful if the dimensions could be "snowflakes", i.e. multi-dataset, normalized, typically hierarchically organized dimensions.
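To illustrate the idea of joining only what a query needs, here is a rough sketch of a SQL generator that walks from each referenced table back to the fact table, which also covers snowflaked dimensions. The table names, join conditions and function names are hypothetical. It uses inner joins, which assumes every fact row has a matching dimension row; without that guarantee, leaving a join out could change the result, and outer joins would be the safer choice.

```python
FACT = "sales"  # hypothetical fact table

# Hypothetical join metadata: child table -> (parent table, join condition).
# "category" hangs off "product", making the product dimension a snowflake.
JOINS = {
    "date_dim": ("sales", "sales.date_key = date_dim.date_key"),
    "product": ("sales", "sales.product_key = product.product_key"),
    "category": ("product", "product.category_key = category.category_key"),
}


def required_tables(referenced):
    """Return the referenced tables plus all tables on their path to the fact, parents first."""
    ordered = []

    def visit(table):
        if table == FACT or table in ordered:
            return
        parent, _condition = JOINS[table]
        visit(parent)          # make sure the parent is joined before the child
        ordered.append(table)

    for table in referenced:
        visit(table)
    return ordered


def build_query(attributes, measures):
    """Build a SELECT statement.

    attributes: list of (table, column) pairs placed on the axes
    measures:   list of (alias, aggregate expression) pairs placed on the cells
    """
    axis_columns = [f"{table}.{column}" for table, column in attributes]
    select_items = axis_columns + [f"{expr} AS {alias}" for alias, expr in measures]
    lines = ["SELECT " + ", ".join(select_items), f"FROM {FACT}"]
    for table in required_tables([table for table, _ in attributes]):
        lines.append(f"JOIN {table} ON {JOINS[table][1]}")
    if axis_columns:
        lines.append("GROUP BY " + ", ".join(axis_columns))
    return "\n".join(lines)


query = build_query([("category", "name")], [("total", "sum(sales.amount)")])
print(query)
```

In this example the query references only `category`, so the generated statement joins `product` and `category` and leaves `date_dim` out entirely.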
Things to consider:
- how to represent models?
  - we could invent a data structure / metadata tables to capture the model
  - we could in theory also represent models as SQL queries or SQL view definitions; we would then be able to derive the conceptual model by parsing the SQL
- how to build/design models?
  - regardless of how we represent the models and model elements, one would need UIs to at least view a model, and preferably to design one as well.