Conversation

@bnb32 bnb32 commented Apr 12, 2025

The commit history is a mess, but the cumulative effect is almost entirely in sup3r/preprocessing/rasterizers/exo.py and sup3r/models/with_obs.py. Lots of tiny edits for refactoring, adding docstrings, etc.

@bnb32 bnb32 force-pushed the bnb/cond_obs_fwp branch from 2c0c05c to 2eeb503 on April 12, 2025 13:10
@bnb32 bnb32 force-pushed the bnb/cond_obs_fwp branch 2 times, most recently from ffe6dd8 to 018276a on April 22, 2025 16:31
bnb32 added 27 commits April 24, 2025 13:19
…ind model training added to examples README.
…`` class to enable passing ``xr.Dataset`` and like objects directly to ``BatchHandlers`` without needing to invoke ``.data``
…ueueing, since batches can be sampled directly if there are none in the queue).
…d dequeueing since this would cast to tensors.
… just sampling is, except for macos, but training is not.
@bnb32 bnb32 force-pushed the bnb/cond_obs_fwp branch from 666e71e to 5046c7b on May 6, 2025 20:48
logger.info(msg)
return total_grad, loss_details

def _get_parallel_grad(
Member

Does any of this need to be decorated with @tf.function? Am I crazy, or did you remove this? If so, why?

Collaborator Author (@bnb32, May 29, 2025)

This is actually a new helper function that was in run_gradient_descent before. It couldn't have a tf.function decorator since it was calling np.array_split.
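
For context, a minimal sketch of the constraint (the body below is illustrative, not sup3r's actual implementation): tf.function traces symbolic tensors, and eager NumPy calls like np.array_split can't operate on those, so the chunking has to live in an undecorated helper while only the per-chunk gradient step gets compiled.

```python
import numpy as np
import tensorflow as tf

@tf.function  # compiled: only TF ops run inside the traced graph
def _chunk_grad(model, x, y):
    # Hypothetical per-chunk gradient step with a stand-in MSE loss.
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x) - y))
    return tape.gradient(loss, model.trainable_variables), loss

def _get_parallel_grad(model, x, y, n_chunks=4):
    # np.array_split runs eagerly on host arrays, so this helper can't
    # carry @tf.function itself -- tracing would hand np.array_split a
    # symbolic tensor and fail.
    total_grad, loss_details = None, []
    for xc, yc in zip(np.array_split(x, n_chunks),
                      np.array_split(y, n_chunks)):
        grads, loss = _chunk_grad(model, xc, yc)
        loss_details.append(float(loss))
        total_grad = (list(grads) if total_grad is None
                      else [a + b for a, b in zip(total_grad, grads)])
    return total_grad, loss_details
```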

class Sup3rGanWithObs(Sup3rGan):
"""Sup3r GAN model which includes mid network observation fusion. This
model is useful for when production runs will be over a domain for which
observation data is available."""
Member

Somewhere, probably in this header, can you describe the nuances of training with (fake) observations vs. inference with real observations? I think the functions described with "obs for current batch" are for fake observations from training data, but it's not really clear.

I also want to know how someone would pass in real observations at inference time. You don't subclass the generate() function here, so I don't see anywhere that would describe how to provide real observations in a fwp.

Collaborator Author

Yeah, thanks for this feedback. It's hard to get this perspective when you've been "in it" for a while. Obs are used in fwp the same way as other exogenous data sources, but the ObsRasterizer should be specified (this allows NaNs to make it through the rasterization), or obs feature names should include the suffix "_obs" to automatically use the ObsRasterizer.
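
To make that concrete, a hypothetical config fragment showing the pattern (feature names, paths, and the exact kwarg structure here are illustrative, not the precise sup3r API):

```python
# Hypothetical sketch: obs passed like any other exogenous source.
exo_handler_kwargs = {
    # The '_obs' suffix routes this feature through the ObsRasterizer,
    # which lets NaNs (gaps between stations) survive rasterization.
    'u_10m_obs': {
        'source_file': './u_10m_obs.h5',  # illustrative path
    },
    # Ordinary exogenous data keeps using the default rasterizer.
    'topography': {
        'source_file': './topography.h5',  # illustrative path
    },
}
```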

Member

New docs look good, thanks!

step and will be converted to a list
""" # noqa : D301
if isinstance(steps, dict):
for feat, entry in steps.items():
Member

This whole init statement is really convoluted and confusing, and I'm pretty sure the entire purpose of this logic is to accommodate inputs that are not correctly formatted. Do we really need to be this flexible? I think decisions like this make things way too complex. My suggestion would be to just have a ton of assert statements in this init to make sure things are formatted correctly, and then have static methods that could massage things into the correct format.

Also, nitpick: no vertical spacing for this entire block is insane. No way this is one logical group of statements.

Collaborator Author

Yeah, this is definitely fair. I got tired of not being able to pass exo data to generate just with a dict like {'topography': np.ndarray(...)}, instead needing the nested dict described in the current docs. The nested dict was motivated by the need to work with multi-step models, which require a lot more complexity, but we've moved away from multi-step models a bit in favor of "multi-step" pipelines (e.g. a spatial config plus a temporal config). Maybe this could allow us to simplify?
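
For reference, the two shapes being contrasted look roughly like this (a sketch; the nested-dict keys are reconstructed from the current docs and should be treated as illustrative):

```python
import numpy as np

arr = np.zeros((10, 20, 20))  # illustrative exo array

# What you'd like generate() to accept directly:
exo_flat = {'topography': arr}

# The nested form the current docs describe, motivated by multi-step
# models that need per-step routing info:
exo_nested = {
    'topography': {
        'steps': [{'model': 0, 'combine_type': 'input', 'data': arr}]
    }
}
```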

Member

Is this a classic example of the challenge of having a single object do all the things? Trying to do: exo data for multiple features, multiple model steps, and multiple types of concat? Would having more atomic objects help with this? Like having an exo class for each of these options, or a class template for a single exo layer that can be stacked by another class.

This might also be a good example of where flat is better than nested. A flat list of exo objects would be really easy to parse through and could be handled the same for a single GAN and a multi-step GAN.

What if the ExoDataSingle class was just a data class, ExoDataSingle(data, fname, step_ind, ctype, model_ind=0), and ExoData was just a list of these? They would be super easy to parse through: for single in ExoData, if single.attrs == desired_attrs, use this object; raise if desired_attrs is not found (see the sketch after this comment).

Sounds like maybe a good opportunity for a refactor, but probably in a separate PR after we do a release with all of this PR's work. Right now I'm just glad you simplified the init logic, thanks!
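
A minimal sketch of that proposal (nothing below exists in sup3r; the names just mirror the comment above, and sup3r's real ExoData is a different, dict-based structure):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ExoDataSingle:
    """One atomic exo layer, per the flat-list proposal."""
    data: np.ndarray
    fname: str          # feature name, e.g. 'topography'
    step_ind: int       # generator step this layer feeds
    ctype: str          # combine type, e.g. 'input' / 'layer'
    model_ind: int = 0  # model index for multi-step GANs

class ExoData(list):
    """A flat list of ExoDataSingle objects; trivial to scan."""

    def get(self, **desired):
        for single in self:
            if all(getattr(single, k) == v for k, v in desired.items()):
                return single
        raise KeyError(f'No exo layer matching {desired}')
```

e.g. exo.get(fname='topography', model_ind=0) would replace the nested-dict lookups, and it parses the same way for a single GAN or a multi-step GAN.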

@bnb32 bnb32 (Collaborator Author) left a comment

@grantbuster Would you prefer more doc info elsewhere, or do you think these updates are sufficient? Also, lmk what you think on the ExoData response.


@bnb32 bnb32 merged commit 285a9d4 into main Jun 2, 2025
12 checks passed
@bnb32 bnb32 deleted the bnb/cond_obs_fwp branch June 2, 2025 20:28
github-actions bot pushed a commit that referenced this pull request Jun 2, 2025