Collaborator

@bnb32 bnb32 commented Aug 2, 2025

Added a parameter for scaling obs data in the obs rasterizer.
Small refactor of arguments in ExoDataHandler.
Enabled aggregation of 3D data in ExoRasterizer.

bnb32 added 14 commits August 1, 2025 07:34
…ng the systematic negative bias of madis data.
…er for this to be explicit so that users know where data is stored.
… - replaced with exo_rasterizer_kwargs parameter.
…rectly and remove redundant target and shape parameters from tests.
…utfile' to 'out_fille' in multiple files for consistency.
@bnb32 bnb32 requested a review from grantbuster August 5, 2025 23:32
Member

@grantbuster grantbuster left a comment

Minor comment, but generally I'm a bit nervous about this. Have we figured out why the MADIS obs are negatively biased? How are you determining a scale factor?

- ds.to_netcdf(outfile, format='NETCDF4', engine='h5netcdf')
- return outfile
+ ds.to_netcdf(out_fille, format='NETCDF4', engine='h5netcdf')
+ return out_fille
Member

why the extra L?

Collaborator Author

oops, bad string replace


return df[self.feature].values.reshape(self.hr_shape[:-1])

def _get_data_3d(self):
Member

Recommend adding a docstring describing the output shapes/dimensions here.
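
As an illustration of the kind of docstring being requested, here is a minimal sketch. The class `ToyObsRasterizer` is a hypothetical stand-in, not the real ExoRasterizer, and the body of the real `_get_data_3d` is not shown in this thread. The `(lat, lon, time)` layout of `hr_shape` is an assumption, inferred only from the 2D method above reshaping to `self.hr_shape[:-1]`.

```python
import numpy as np


class ToyObsRasterizer:
    """Hypothetical stand-in used only to illustrate the docstring style."""

    def __init__(self, values, hr_shape):
        # Flat array of feature values; assumed length is lat * lon * time.
        self.values = np.asarray(values)
        # Assumed layout: (lat, lon, time). Not confirmed by the PR.
        self.hr_shape = tuple(hr_shape)

    def _get_data_3d(self):
        """Get the feature as a 3D array on the high-resolution grid.

        Returns
        -------
        data : np.ndarray
            Array of shape (lat, lon, time), i.e. ``self.hr_shape``.
            The last axis is assumed to be temporal.
        """
        return self.values.reshape(self.hr_shape)


rasterizer = ToyObsRasterizer(np.arange(24), hr_shape=(2, 3, 4))
data = rasterizer._get_data_3d()
print(data.shape)
```

The point is only that the docstring names each output axis explicitly, so callers do not have to infer the layout from the reshape call.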

Collaborator Author

@bnb32 bnb32 left a comment

> Minor comment, but generally I'm a bit nervous about this. Have we figured out why the MADIS obs are negatively biased? How are you determining a scale factor?

I don't have a complete explanation for the bias of MADIS vs. WTK, but that's how I'm determining the scale factor (average of WTK (at 10 m) / MADIS over multiple sites). I haven't been using this right now, but I think something like this makes sense so that the relationship between near-surface and hub-height wind speed learned by the model is preserved during inference.
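
A minimal sketch of one way to read "average of WTK (at 10 m) / MADIS over multiple sites": take the ratio of time-mean speeds at each site, then average those ratios across sites. This is an assumption about the averaging order, not the method used in the PR. The function name, the dict-of-lists layout, and the numbers are all made up for illustration.

```python
from statistics import mean


def global_scale_factor(wtk_10m, madis):
    """Average the per-site ratio of mean WTK 10 m speed to mean MADIS speed.

    Both inputs are hypothetical dicts mapping a site id to a list of
    co-located wind speeds in m/s.
    """
    ratios = []
    for site, obs in madis.items():
        obs_mean = mean(obs)
        if obs_mean <= 0:
            # Skip sites where the ratio would be undefined.
            continue
        ratios.append(mean(wtk_10m[site]) / obs_mean)
    if not ratios:
        raise ValueError("no sites with a positive mean observed speed")
    return mean(ratios)


# Made-up example data, not real WTK or MADIS values.
wtk = {"site_a": [4.4, 5.5, 6.6], "site_b": [2.2, 3.3]}
obs = {"site_a": [4.0, 5.0, 6.0], "site_b": [2.0, 3.0]}

factor = global_scale_factor(wtk, obs)
# Apply the single global factor to every observation.
scaled_obs = {site: [factor * v for v in vals] for site, vals in obs.items()}
print(round(factor, 3))
```

Using the ratio of per-site means rather than the mean of per-timestep ratios avoids blowing up when an individual observation is near zero; either choice still yields a single factor applied everywhere.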

… files. doc string in _get_data methods in exo.py
@grantbuster
Member

Well, the model will preserve those relationships whether we like it or not; it's just a matter of bias at the surface and consequently at hub height, right? The model should learn to respect and assimilate the obs we give it, but if we don't like the obs, then the question is what do we do? Does the WTK have tons of bias at 10 m? This fudge factor makes sense, but is it scientifically defensible? Would the fudge factor really be constant for all locations and times?

@bnb32
Collaborator Author

bnb32 commented Aug 6, 2025

The value of near-surface ws vs. hub-height ws is part of that relationship. Either way, though, this is just following your suggestion to scale MADIS to WTK that came up in the discussion on what the true measurement height for MADIS is. Whether WTK is biased at 10 m is unknown, since the comparison would be to MADIS... Of course, the global scale factor is the simplest iteration, and it would be better to apply some location- or time-dependent correction, but that isn't feasible right now.

@grantbuster
Member

I hear ya, I'm just realizing how hacky this is. Do you know what scale factor you'd use and what kind of delta h that would imply using the power law? If you use this in published data or a manuscript, just be really careful; it will be hard to defend, I think.

@bnb32
Collaborator Author

bnb32 commented Aug 6, 2025

Yeah, it's a fair point. The scale factor would be around 1.1, which puts the delta h at ~5 meters.
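
For reference, the ~5 m figure is consistent with the standard power-law wind profile, u(z2) / u(z1) = (z2 / z1) ** alpha. The sketch below assumes a shear exponent of alpha = 1/7 (a common default that is not stated in this thread) and the 10 m WTK reference height mentioned earlier. The function name is hypothetical.

```python
def implied_obs_height(scale_factor, ref_height=10.0, alpha=1.0 / 7.0):
    """Height at which the power law gives ref-height speed / scale_factor.

    Solves scale_factor = (ref_height / z_obs) ** alpha for z_obs.
    alpha = 1/7 is an assumed shear exponent, not a value from the PR.
    """
    return ref_height / scale_factor ** (1.0 / alpha)


ref_height = 10.0  # m, WTK comparison height from the discussion
obs_height = implied_obs_height(1.1, ref_height=ref_height)
delta_h = ref_height - obs_height
print(f"effective obs height: {obs_height:.1f} m, delta h: {delta_h:.1f} m")
```

Under these assumptions a factor of 1.1 corresponds to an effective measurement height of about 5.1 m, or a delta h of about 4.9 m. The result is sensitive to alpha, so a different shear exponent would imply a noticeably different height offset.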

@bnb32 bnb32 merged commit 684bdb8 into main Aug 7, 2025
12 checks passed
@bnb32 bnb32 deleted the bnb/obs_scaling branch August 7, 2025 17:44
github-actions bot pushed a commit that referenced this pull request Aug 7, 2025