Predictor Arrays in one Dataset per scenario #677

veni-vidi-vici-dormivi · 2025-05-06T13:37:39Z

Okay, since this is maybe the last chance to change something big about the DataTree structure, here goes my idea:

Instead of collecting the different predictor variables (tas, hfds) in subtrees, each holding the scenarios for that variable, like this:

├─ tas
│  ├─ hist
│  ├─ scen1
│  └─ ...
├─ hfds
│  ├─ hist
│  ├─ scen1
│  └─ ...
└─ ...

I suggest putting the variables together into one dataset per scenario and collecting these datasets in a tree:

├─ hist
│     data_vars: tas, hfds, ...
├─ scen1
│     data_vars: tas, hfds, ...
└─ ...

This is more elegant: since the different predictors live on the same coordinate system, they fit into a dataset together, so we would be using the data structures in their intended way. We also save memory because the coordinates are not duplicated (which is the rationale behind the intended use).
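To make the two layouts concrete, here is a minimal sketch that regroups a variable-first tree into a scenario-first one. Plain nested dictionaries stand in for DataTree nodes and Datasets, the values are toy data, and the helper name `regroup_by_scenario` is hypothetical (it is not part of MESMER or xarray).

```python
def regroup_by_scenario(by_variable):
    """Turn {variable: {scenario: data}} into {scenario: {variable: data}}."""
    by_scenario = {}
    for variable, scenarios in by_variable.items():
        for scenario, data in scenarios.items():
            # each scenario node collects all variables, like one Dataset
            by_scenario.setdefault(scenario, {})[variable] = data
    return by_scenario


# variable-first layout: one subtree per predictor (toy values)
by_variable = {
    "tas": {"hist": [0.1, 0.2], "scen1": [0.3, 0.4]},
    "hfds": {"hist": [1.0, 1.1], "scen1": [1.2, 1.3]},
}

by_scenario = regroup_by_scenario(by_variable)
print(sorted(by_scenario))          # scenario names are now the top-level nodes
print(sorted(by_scenario["hist"]))  # variables sit side by side in each node
```

With real data, each inner dictionary would correspond to one Dataset, so `tas` and `hfds` would share a single set of coordinates per scenario.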

However, I feel that the code does get more complicated like this, so I need a second opinion.

Tests added
Fully documented, including CHANGELOG.rst


mathause · 2025-05-08T09:03:23Z

Good point - the predictor datatree had to look different from the target, which is maybe a bit confusing... How difficult/annoying is it to add a new variable to the nodes?

This will have an influence on MESMER-X.

The third option is to have separate datatrees

tas:
├─ hist
├─ scen1
└─ ...

hfds:
├─ hist
├─ scen1
└─ ...

veni-vidi-vici-dormivi · 2025-05-08T09:30:49Z

> How difficult/annoying is it to add a new variable to the nodes?

Not so bad if it does not need to be generalizable/automated (as in the tests), i.e. if you know which variables you want and can add them by name.

The much more annoying part is getting an isomorphic datatree with a subset of variables from one with all variables. We need that if we e.g. load a datatree that has both tas and hfds and somewhere in the code only want one with tas.
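A rough sketch of that subsetting step, again with nested dictionaries standing in for the tree and its datasets. The helper `select_variables` is hypothetical; it raises when a node lacks a requested variable instead of dropping the node, on the assumption that the result has to stay isomorphic to the input.

```python
def select_variables(tree, names):
    """Return a tree with the same nodes but only the requested variables."""
    subset = {}
    for scenario, dataset in tree.items():
        missing = [name for name in names if name not in dataset]
        if missing:
            # keep the trees isomorphic: fail rather than drop the node
            raise KeyError(f"{scenario!r} is missing variables: {missing}")
        subset[scenario] = {name: dataset[name] for name in names}
    return subset


# toy tree with both predictors in every scenario node
predictors = {
    "hist": {"tas": [0.1, 0.2], "hfds": [1.0, 1.1]},
    "scen1": {"tas": [0.3, 0.4], "hfds": [1.2, 1.3]},
}

tas_only = select_variables(predictors, ["tas"])
print(tas_only)
```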

> This will have an influence on MESMER-X.

Yes.

> The third option is to have separate datatrees

Yes. Also not a big fan of that.

mathause · 2025-05-08T11:42:10Z

> > How difficult/annoying is it to add a new variable to the nodes?
>
> Not so bad if it does not need to be generalizable/automated (as in the tests), i.e. if you know which variables you want and can add them by name.
>
> The much more annoying part is getting an isomorphic datatree with a subset of variables from one with all variables. We need that if we e.g. load a datatree that has both tas and hfds and somewhere in the code only want one with tas.

Yes, this is a general problem: e.g. when we write a function for a DataArray, we cannot easily generalize it to Dataset or DataTree. I am not a fan of passing the variable name, but it may be unavoidable. (Related: there are some functions that check that you pass a Dataset with only one variable; we'd need to adjust those...)
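One way the "exactly one variable, or pass a name" rule could look is sketched below. A plain dictionary stands in for a Dataset, and `get_single_variable` is a hypothetical helper, not an existing MESMER function.

```python
def get_single_variable(dataset, name=None):
    """Pick one variable from a dict-like dataset.

    If `name` is None, the dataset must hold exactly one variable.
    """
    if name is not None:
        return dataset[name]
    if len(dataset) != 1:
        raise ValueError(
            f"expected exactly one variable, found {sorted(dataset)}; pass `name`"
        )
    # unpack the only value
    (data,) = dataset.values()
    return data


single = {"tas": [0.1, 0.2]}
multi = {"tas": [0.1, 0.2], "hfds": [1.0, 1.1]}

print(get_single_variable(single))              # no name needed
print(get_single_variable(multi, name="hfds"))  # name required here
```

The trade-off is the one raised above: callers with multi-variable datasets have to pass the variable name explicitly.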

> > This will have an influence on MESMER-X.
>
> Yes.

> > The third option is to have separate datatrees
>
> Yes. Also not a big fan of that

Ok


codecov · 2025-05-28T14:35:30Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.21%. Comparing base (aa049b9) to head (d16411a).
Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #677      +/-   ##
==========================================
- Coverage   92.28%   92.21%   -0.07%     
==========================================
  Files          48       48              
  Lines        2785     2762      -23     
==========================================
- Hits         2570     2547      -23     
  Misses        215      215

| Flag | Coverage Δ |
|---|---|
| unittests | `92.21% <100.00%> (-0.07%)` ⬇️ |


mathause · 2025-06-07T15:31:00Z

tests/integration/test_draw_realisations_mesmer_newcodepath.py

+        exclude.add("tas2")
+    if use_hfds:
+        exclude.add("hfds")
+    local_variability_from_global_var = lr.predict(predictors, exclude=exclude)


Makes me wonder if we should allow lr.predict(predictors, only="tas_resids").

yes that would be pretty, but not urgent.
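A minimal sketch of how `only` and `exclude` could resolve to the same predictor list. `exclude` appears in the diff above and `only` is the keyword floated in this comment; the helper `resolve_predictors` and the list of available names are hypothetical and do not reflect how `LinearRegression.predict` is implemented.

```python
def _as_set(names):
    """Accept None, a single name, or an iterable of names."""
    if names is None:
        return set()
    if isinstance(names, str):
        return {names}
    return set(names)


def resolve_predictors(available, only=None, exclude=None):
    """Return the predictor names to use, in the order of `available`."""
    if only is not None and exclude is not None:
        raise ValueError("pass either `only` or `exclude`, not both")

    # reject names that are not among the available predictors
    requested = _as_set(only) | _as_set(exclude)
    unknown = requested - set(available)
    if unknown:
        raise ValueError(f"unknown predictors: {sorted(unknown)}")

    if only is not None:
        keep = _as_set(only)
    else:
        keep = set(available) - _as_set(exclude)
    return [name for name in available if name in keep]


# illustrative predictor names
available = ["tas", "tas2", "hfds", "tas_resids"]
via_exclude = resolve_predictors(available, exclude={"tas", "tas2", "hfds"})
via_only = resolve_predictors(available, only="tas_resids")
print(via_exclude, via_only)
```

Both calls select the same single predictor; `only` just states the intent directly instead of listing everything else.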


veni-vidi-vici-dormivi · 2025-06-11T16:00:36Z

is #666 done?

mathause · 2025-06-11T16:16:39Z

#666 is done (I think) but I need to merge the other 2 PRs first. I'll give this another pass, and we can maybe merge all of them tomorrow.

veni-vidi-vici-dormivi · 2025-06-11T16:18:03Z

and we can maybe merge all of them tomorrow.

🎉 🎉

then we'll deal with mesmer-x and the tutorials next

mathause

Thanks!

mathause · 2025-06-12T12:33:08Z

@veni-vidi-vici-dormivi can you merge main, re-create the test files and push - then we can merge

mathause · 2025-06-12T14:05:14Z

Hmm strange that the power_transformer numbers failed. I think that should not have changed 🤔

veni-vidi-vici-dormivi · 2025-06-12T14:08:49Z

> Hmm strange that the power_transformer numbers failed. I think that should not have changed 🤔

Yeah no, I just pushed new files where I left that file the same and it looks good. The thing is that this one fails for me locally. That already happened before this PR, but I don't know when. 🙈

mathause · 2025-06-12T14:24:42Z

> > Hmm strange that the power_transformer numbers failed. I think that should not have changed 🤔
>
> Yeah no, I just pushed new files where I left that file the same and it looks good. The thing is that this one fails for me locally. That already happened before this PR, but I don't know when. 🙈

Would be good to get it to pass for you and on GHA, but yes optimizations are annoying 😞

…-group#677)

* started draft to have the different predictors in one dataset per scenario
* use datatree.assign
* add datatree merge function
* merge datatrees for lr predictors in calibration test
* adjust broadcast_and_stack
* finish new datastructure implementation (fails)
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* remove cmip gen
* change merge method to use lambda
* select tas instead of dropping other vars from datatree
* edit docstring of broadcast_and_stack_scenarios
* adjust test of broadcast_and_stack_scenarios
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* merge docstring
* test merge function
* extend _skip_empty_nodes to non DataTree args in first position
* wrap LinearRegression.predict() for datatree
* adjust volc
* fix unit tests
* fix in _skip_empty_nodes for only coords datasets
* datatree wrapper for draw autoregression funcs
* remove DataTree in LinearRegression.fit()
* updated draw_realisations new (in progress)
* new data reading in draw realisations
* move tas**2 after volcanic forcing
* new emulation sets
* adjustments for other tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Update tests/integration/test_calibrate_mesmer_newcodepath.py
* Update tests/integration/test_draw_realisations_mesmer_x.py
* Update tests/integration/test_draw_realisations_mesmer_x.py
* fix merge error
* Update mesmer/core/datatree.py
  Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
* Update mesmer/core/datatree.py
  Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
* merge variable datasets in load
* move xarray type imports to the top
* implement suggestions in draw_realisations_mesmer_newcodepath
* precommit
* CHANGELOG
* adjust numbers in tests related to aod
* update testfiles
* new AR params test file for calibrate_mesmer_x with 2nd fit

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathias.hauser@env.ethz.ch>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>

veni-vidi-vici-dormivi added 8 commits May 5, 2025 21:24

started draft to have the different predictors in one dataset per sce…

3f80627

…nario

use datatree.assign

3d47779

add datatree merge function

9296517

merge datatrees for lr predictors in calibration test

0b089d1

Merge branch 'main' into multiple_preds

500b536

adjust broadcast_and_stack

13a6da0

finish new datastructure implementation (fails)

694468e

fix

954c9e8

veni-vidi-vici-dormivi marked this pull request as draft May 6, 2025 13:38

[pre-commit.ci] auto fixes from pre-commit.com hooks

c62b1b9

for more information, see https://pre-commit.ci

veni-vidi-vici-dormivi requested a review from mathause May 6, 2025 13:38

remove cmip gen

587faa0

veni-vidi-vici-dormivi and others added 5 commits May 28, 2025 15:42

change merge method to use lambda

302a344

select tas instead of dropping other vars from datatree

e3bfd0b

edit docstring of broadcast_and_stack_scenarios

c5b17fe

adjust test of broadcast_and_stack_scenarios

265a611

[pre-commit.ci] auto fixes from pre-commit.com hooks

1a55e29

for more information, see https://pre-commit.ci

veni-vidi-vici-dormivi added 9 commits May 28, 2025 17:18

merge docstring

e17c4f3

test merge function

fc49c8e

extend _skip_empty_nodes to non DataTree args in first position

d47bfec

wrap LinearRegression.predict() for datatree

64edfc7

adjust volc

e87b9a9

fix unit tests

c2c47d9

fix in _skip_empty_nodes for only coords datasets

afae8e4

datatree wrapper for draw autoregression funcs

0851c7e

remove DataTree in LinearRegression.fit()

0afb2a9

mathause mentioned this pull request Jun 7, 2025

datatree.merge function #701

Merged

3 tasks

Merge branch 'main' into multiple_preds

c18d85f

mathause reviewed Jun 7, 2025

mathause mentioned this pull request Jun 7, 2025

LinearRegression.predict allow to select predictors #702

Closed

veni-vidi-vici-dormivi and others added 6 commits June 11, 2025 16:44

Update mesmer/core/datatree.py

c81c494

Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>

Update mesmer/core/datatree.py

2479b21

Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>

merge variable datasets in load

86c81de

move xarray type imports to the top

b9a2af0

implement suggestions in draw_realisations_mesmer_newcodepath

b1bc6ad

precommit

703dd63

mathause approved these changes Jun 12, 2025

veni-vidi-vici-dormivi added 4 commits June 12, 2025 14:41

Merge branch 'main' into multiple_preds

0a282f1

CHANGELOG

a0eacea

adjust numbers in tests related to aod

315b934

update testfiles

6e10d08

veni-vidi-vici-dormivi force-pushed the multiple_preds branch from c1b85b5 to 6e10d08 Compare June 12, 2025 14:03

veni-vidi-vici-dormivi closed this Jun 12, 2025

veni-vidi-vici-dormivi reopened this Jun 12, 2025

new AR params test file for calibrate_mesmer_x with 2nd fit

d16411a

veni-vidi-vici-dormivi merged commit af526b1 into MESMER-group:main Jun 12, 2025
11 checks passed

veni-vidi-vici-dormivi deleted the multiple_preds branch June 12, 2025 14:46
