Bnb/cond obs fwp #262
Conversation
Force-pushed from ffe6dd8 to 018276a.
…et to work with three data members.
…orbed WithObs variant.
…ind model training added to examples README.
…`` class to enable passing ``xr.Dataset`` and like objects directly to ``BatchHandlers`` without needing to invoke ``.data``
…ueueing, since batches can be sampled directly if there are none in the queue).
…d dequeueing since this would cast to tensors.
…h ``max_workers > 1``.
… just sampling is, except for macos, but training is not.
… files and replacement.
… files and replacement.
… files and replacement.
…classes
fix: update tensor initialization to use tf.cast for consistency
refactor: simplify Loader initialization in ForwardPassStrategy
refactor: improve docstring handling in Loader class
test: convert observation mask to numpy array in test_fixed_wind_obs
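The Loader change above (passing ``xr.Dataset`` and similar objects directly to ``BatchHandlers`` without invoking ``.data``) might look roughly like this in use. A minimal sketch only: the import path and argument names (train_containers, sample_shape, etc.) are assumptions for illustration, not the verified signature.

import numpy as np
import xarray as xr

from sup3r.preprocessing import BatchHandler  # assumed import path

# toy Dataset standing in for real low-res training data
ds = xr.Dataset(
    {
        'u_10m': (
            ('time', 'south_north', 'west_east'),
            np.random.rand(48, 20, 20),
        )
    }
)

# before this change, something like BatchHandler([loader.data], ...) was
# needed; now the Dataset-like object can be passed directly
# (argument names here are assumptions, not the exact signature)
bh = BatchHandler(
    train_containers=[ds],
    batch_size=4,
    n_batches=2,
    sample_shape=(10, 10, 12),
)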
        logger.info(msg)
        return total_grad, loss_details


def _get_parallel_grad(
Does any of this need to be decorated with @tf.function? Am I crazy, or did you remove this? If so, why?
This is actually a new helper function that was in run_gradient_descent before. It couldn't have a tf.function decorator since it was calling np.array_split.
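A minimal sketch of the constraint being described (not the actual sup3r helper): np.array_split runs eagerly on concrete arrays, so the split has to stay outside any traced function, while the per-chunk gradient math can still be compiled.

import numpy as np
import tensorflow as tf


@tf.function  # traced: contains only TF ops
def _chunk_grad(model, x, y):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x) - y))
    return tape.gradient(loss, model.trainable_variables), loss


def _get_parallel_grad(model, x, y, n_chunks=4):
    """Eager wrapper: np.array_split needs concrete arrays, so it cannot
    live inside the traced function above."""
    grads, losses = [], []
    for xc, yc in zip(np.array_split(x, n_chunks),
                      np.array_split(y, n_chunks)):
        g, loss = _chunk_grad(model, tf.constant(xc), tf.constant(yc))
        grads.append(g)
        losses.append(loss)
    # sum gradients variable-by-variable across chunks
    total_grad = [tf.add_n(list(g)) for g in zip(*grads)]
    return total_grad, losses


# usage with a toy model
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
x = np.random.rand(16, 3).astype(np.float32)
y = np.random.rand(16, 1).astype(np.float32)
model(x[:1])  # build weights
total_grad, losses = _get_parallel_grad(model, x, y)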
sup3r/models/with_obs.py (Outdated)

class Sup3rGanWithObs(Sup3rGan):
    """Sup3r GAN model which includes mid-network observation fusion. This
    model is useful when production runs will be over a domain for which
    observation data is available."""
Somewhere, probably in this header, can you describe the nuances of training with (fake) observations vs. inference with real observations? I think the functions described with "obs for current batch" are for fake observations from training data, but it's not really clear.
I also want to know how someone would pass in real observations at inference time. You don't subclass the generate() function here, so I don't see anywhere that describes how to provide real observations in a fwp.
Yeah, thanks for this feedback. It's hard to get this perspective when you've been "in it" for a while. Obs are used in fwp the same way as other exogenous data sources, but the ObsRasterizer should be specified (this allows NaNs to make it through the rasterization), or obs feature names should include the suffix "_obs" to automatically use the ObsRasterizer.
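For example, passing real observations at inference time might look like the following sketch. The import path and keyword names (exo_handler_kwargs, source_files, etc.) are assumptions for illustration; the load-bearing detail from the comment above is the "_obs" feature-name suffix (or an explicit ObsRasterizer), which lets NaNs survive rasterization.

from sup3r.pipeline.strategy import ForwardPassStrategy  # assumed import path

# hypothetical configuration; argument names are illustrative
strategy = ForwardPassStrategy(
    file_paths='./low_res_2015_*.nc',
    model_kwargs={'model_dir': './sup3r_gan_with_obs'},
    fwp_chunk_shape=(32, 32, 48),
    exo_handler_kwargs={
        # the "_obs" suffix triggers the NaN-preserving ObsRasterizer, so
        # grid points without observations pass through as NaN
        'u_10m_obs': {'source_files': './station_obs_u_10m.h5'},
    },
)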
New docs look good, thanks!
            step and will be converted to a list
        """  # noqa : D301
        if isinstance(steps, dict):
            for feat, entry in steps.items():
This whole init statement is really convoluted and confusing, and I'm pretty sure the entire purpose of this logic is to accommodate inputs that are not correctly formatted. Do we really need to be this flexible? I think decisions like this make things way too complex. My suggestion would be to just have a ton of assert statements in this init to make sure things are formatted correctly, and then have static methods that could massage things into the correct format.
Also, nitpick: no vertical spacing for this entire block is insane. There's no way this is one logical group of statements.
Yeah, this is definitely fair. I got tired of not being able to pass exo data to generate with just a dict like {'topography': np.array(...)}, instead needing the nested dict described in the current doc. The nested dict was motivated by the need to work with multi-step models, which require a lot more complexity, but we've moved away from multi-step models a bit in favor of "multi-step" pipelines (e.g. spatial config, temporal config). Maybe this could allow us to simplify?
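For reference, the two shapes being contrasted. The nested keys ('steps', 'model', 'combine_type') follow the doc under discussion, but treat this as an illustration rather than the verified schema.

import numpy as np

topo = np.zeros((10, 10, 1))

# the simple form this comment asks for
exo_simple = {'topography': topo}

# the nested form the current doc describes, motivated by multi-step models
exo_nested = {
    'topography': {
        'steps': [{'model': 0, 'combine_type': 'input', 'data': topo}]
    }
}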
Is this a classic example of the challenge of having a single object do all the things? It's trying to handle exo data for multiple features, multiple model steps, and multiple types of concat. Would having more atomic objects help with this? Like having an exo class for each of these options, or having a class template for a single exo layer that can be stacked by another class.
This might also be a good example of where flat is better than nested. A flat list of exo objects would be really easy to parse through and could be handled the same way for a single GAN and a multi-step GAN.
What if the ExoDataSingle class were just a dataclass, ExoDataSingle(data, fname, step_ind, ctype, model_ind=0), and ExoData were just a list of these? They would be super easy to parse through: for each single in ExoData, if single.attrs == desired_attrs, use that object; raise if desired_attrs is not found. See the sketch below.
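A sketch of that flat design, using the names from the comment above; everything else (the lookup method, field types) is illustrative, not the current sup3r API.

from dataclasses import dataclass

import numpy as np


@dataclass
class ExoDataSingle:
    """One exo layer: a single feature, model step, and combine type."""

    data: np.ndarray
    fname: str          # feature name, e.g. 'topography'
    step_ind: int       # step index within the model
    ctype: str          # combine type, e.g. 'input' or 'layer'
    model_ind: int = 0  # model index, for multi-step pipelines


class ExoData(list):
    """Flat list of ExoDataSingle layers; trivial to parse the same way
    for a single GAN and a multi-step GAN."""

    def get_single(self, **desired):
        for single in self:
            if all(getattr(single, k) == v for k, v in desired.items()):
                return single
        raise KeyError(f'No exo layer matching {desired}')


# usage
exo = ExoData([ExoDataSingle(np.zeros((10, 10, 1)), 'topography', 0, 'input')])
layer = exo.get_single(fname='topography', model_ind=0)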
Sounds like maybe a good opportunity for a refactor, but probably in a separate PR, after we do a release with all of this PR's work. Right now I'm just glad you simplified the init logic, thanks!
@grantbuster Would you prefer more doc info elsewhere, or do you think these updates are sufficient? Also, let me know what you think on the ExoData response.
…added tf.function back to _get_parallel_grad.
…e / training only designation
The commit history is a mess, but the cumulative effect is almost entirely in sup3r/preprocessing/rasterizers/exo.py and sup3r/models/with_obs.py. Lots of tiny edits re: refactoring, adding docstrings, etc.