@schoi839 notes that some models might want to write "raw" data back to the datastore - for example, gap-filling models. In theory, the raw data should be independent observations; in practice, we will need to use simulation outputs. Should we allow models to write to the tabular data store during calculation?
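A minimal sketch (not an actual API in this codebase) of what such a write-back path could look like, assuming a hypothetical `TabularDatastore` that tags rows by provenance, so that simulated values from gap-filling models stay distinguishable from independent observations:

```python
from enum import Enum

import pandas as pd


class Provenance(Enum):
    OBSERVED = "observed"
    SIMULATED = "simulated"  # e.g. output of a gap-filling model


class TabularDatastore:
    """Hypothetical datastore that tags every row with its provenance."""

    def __init__(self) -> None:
        self._frames: list[pd.DataFrame] = []

    def write(self, df: pd.DataFrame, provenance: Provenance) -> None:
        # Tag rows so downstream consumers can filter out simulated
        # values when they need independent observations.
        self._frames.append(df.assign(provenance=provenance.value))

    def read(self, include_simulated: bool = False) -> pd.DataFrame:
        if not self._frames:
            return pd.DataFrame()
        result = pd.concat(self._frames, ignore_index=True)
        if not include_simulated:
            result = result[result["provenance"] == Provenance.OBSERVED.value]
        return result
```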
In order to properly answer this question, we first need to understand:
After normalization to the glossary, the current assumption is that most data is tabular (CSVs, vector geospatial data, time series). Our basic conception is that this tabular data has columns as parameters of interest and rows as observations. Measured or modelled data that includes correlated observations should keep those correlated values in the same row, as in the sketch below.
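For instance (column names are purely illustrative), correlated electricity and heat measurements taken during the same plant run would share a row rather than being split across records:

```python
import pandas as pd

# Each row is one observation; correlated values (electricity and heat
# measured during the same plant run) stay together in that row.
observations = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2023-01-01", "2023-01-02"]),
        "electricity_kwh": [120.5, 118.2],
        "heat_mj": [430.1, 425.8],
    }
)
```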
Tabular data has a given context - the metadata that describes what the dataset is about. This would include when the observations were made (and the time interval they are valid for), what technology was measured, who produced the good or service, etc.
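Such context might be captured as a simple key-value mapping stored alongside the table; these keys are illustrative, not a proposed schema:

```python
# Illustrative dataset-level context; the key names are assumptions,
# not a settled metadata schema.
metadata = {
    "valid_from": "2023-01-01",
    "valid_until": "2023-12-31",
    "technology": "combined heat and power",
    "producer": "Example Energy Co.",
    "product": "electricity, high voltage",
}
```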
We will probably use something like Parquet, which has a lot of energy behind it, is already column-oriented, has standardized ways to describe metadata and versioning, and supports geospatial data.
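A sketch of how dataset-level context could travel inside the Parquet file itself, using pyarrow's schema-level key-value metadata (the `dataset_context` key is an assumption for illustration, not a standard):

```python
import json

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

observations = pd.DataFrame(
    {"electricity_kwh": [120.5, 118.2], "heat_mj": [430.1, 425.8]}
)
metadata = {"technology": "combined heat and power", "valid_from": "2023-01-01"}

# Attach the dataset context to the file's schema metadata so the
# table and its description travel together in a single file.
table = pa.Table.from_pandas(observations)
table = table.replace_schema_metadata(
    {**(table.schema.metadata or {}), b"dataset_context": json.dumps(metadata).encode()}
)
pq.write_table(table, "observations.parquet")

# The context is recoverable without reading the data itself.
context = json.loads(
    pq.read_schema("observations.parquet").metadata[b"dataset_context"]
)
```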
But what would an API look like? @cmutel has a very strong preference for building on simple building blocks (as these will change in the future in any case), and we aren't ready to commit ourselves and our partners to enterprise solutions like Snowflake, BigQuery, Redshift, or Databricks.
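As a strawman in the "simple building blocks" spirit, the API could be as plain as a directory of Parquet files behind a few methods; `ParquetStore` and its method names are hypothetical:

```python
from pathlib import Path

import pandas as pd


class ParquetStore:
    """Strawman datastore: one Parquet file per dataset in a local directory."""

    def __init__(self, root: str | Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, name: str, df: pd.DataFrame) -> Path:
        path = self.root / f"{name}.parquet"
        df.to_parquet(path)  # pandas delegates to pyarrow under the hood
        return path

    def get(self, name: str) -> pd.DataFrame:
        return pd.read_parquet(self.root / f"{name}.parquet")

    def list(self) -> list[str]:
        return sorted(p.stem for p in self.root.glob("*.parquet"))
```

Versioning, access control, and remote storage could be layered on later without locking into any particular vendor.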
The use cases here are relatively simple: