Predictor Arrays in one Dataset per scenario #677
Conversation
Good point - the predictor datatree had to look different than the target, which is maybe a bit confusing... How difficult/annoying is it to add a new variable to the nodes? This will have an influence on MESMER-X. The third option is to have separate datatrees.
Not so bad if it doesn't have to be generalizable/automated (as in the tests) -- i.e. if you know which ones you want and can do it by name. The much more annoying part is getting an isomorphic datatree with a subset of variables from one with all variables. Which we need if we e.g. load a datatree that has both.
Yes.
Yes. Also not a big fan of that.
Yes this is a general problem - e.g. when we wrote a function for a
Ok
Codecov Report: all modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##             main     #677      +/-   ##
==========================================
- Coverage   92.28%   92.21%   -0.07%
==========================================
  Files          48       48
  Lines        2785     2762      -23
==========================================
- Hits         2570     2547      -23
  Misses        215      215
exclude.add("tas2")
if use_hfds:
    exclude.add("hfds")
local_variability_from_global_var = lr.predict(predictors, exclude=exclude)
Makes me wonder if we should allow lr.predict(predictors, only="tas_resids")
yes that would be pretty, but not urgent.
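The suggested `only=` keyword does not exist in the API yet; a minimal sketch of how it could be translated into the existing `exclude=` semantics (the helper name, signature, and error handling below are hypothetical, not part of MESMER):

```python
def only_to_exclude(all_predictors, only):
    """Hypothetical helper: turn an `only=` selection into the `exclude=`
    set that a predict(predictors, exclude=...) call already accepts."""
    only = {only} if isinstance(only, str) else set(only)

    # fail loudly on names that are not actually predictors
    missing = only - set(all_predictors)
    if missing:
        raise ValueError(f"unknown predictors: {sorted(missing)}")

    # exclude everything that was not explicitly requested
    return set(all_predictors) - only


# e.g. keeping only "tas_resids" excludes all other predictor variables
exclude = only_to_exclude(["tas", "tas2", "hfds", "tas_resids"], "tas_resids")
```

With such a translation layer, `only=` could be added as a thin convenience wrapper without touching the existing `exclude=` code path.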
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
is #666 done?
#666 is done (I think) but I need to merge the other 2 PRs first. I'll give this another pass, and we can maybe merge all of them tomorrow.
🎉 🎉 then we'll deal with mesmer-x and the tutorials next
Thanks!
@veni-vidi-vici-dormivi can you merge main, re-create the test files and push - then we can merge
force-pushed from c1b85b5 to 6e10d08
Hmm strange that the
Yeah no, I just pushed new files where I left that file the same and it looks good. The thing is that this one fails for me locally. That already happened before this PR, but I don't know when. 🙈
Would be good to get it to pass for you and on GHA, but yes optimizations are annoying 😞
…-group#677)

* started draft to have the different predictors in one dataset per scenario
* use datatree.assign
* add datatree merge function
* merge datatrees for lr predictors in calibration test
* adjust broadcast_and_stack
* finish new datastructure implementation (fails)
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* remove cmip gen
* change merge method to use lambda
* select tas instead of dropping other vars from datatree
* edit docstring of broadcast_and_stack_scenarios
* adjust test of broadcast_and_stack_scenarios
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* merge docstring
* test merge function
* extend _skip_empty_nodes to non DataTree args in first position
* wrap LinearRegression.predict() for datatree
* adjust volc
* fix unit tests
* fix in _skip_empty_nodes for only coords datasets
* datatree wrapper for draw autoregression funcs
* remove DataTree in LinearRegression.fit()
* updated draw_realisations new (in progress)
* new data reading in draw realisations
* move tas**2 after volcanic forcing
* new emulation sets
* adjustments for other tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Update tests/integration/test_calibrate_mesmer_newcodepath.py
* Update tests/integration/test_draw_realisations_mesmer_x.py
* Update tests/integration/test_draw_realisations_mesmer_x.py
* fix merge error
* Update mesmer/core/datatree.py (Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>)
* Update mesmer/core/datatree.py (Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>)
* merge variable datasets in load
* move xarray type imports to the top
* implement suggestions in draw_realisations_mesmer_newcodepath
* precommit
* CHANGELOG
* adjust numbers in tests related to aod
* update testfiles
* new AR params test file for calibrate_mesmer_x with 2nd fit

---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathias.hauser@env.ethz.ch>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Okay, since this is maybe the last chance to change something big about the DataTree structure, here goes my idea:
Instead of collecting the different predictor variables (tas, hfds) in subtrees, with those subtrees holding the scenarios for each variable, I suggest putting the variables together into datasets and these into a tree.
This is more elegant since the different predictors actually live on the same coordinate system, so they fit into a dataset together - we would be using the data structures in their intended way. We also save on memory because we don't have all the coordinates duplicated (which is the rationale behind the intended use).
However, I feel that the code does get more complicated like this, so I need a second opinion on this.