
Prototype example #1


Open · wants to merge 93 commits into base: dev-0.1

Conversation

wli51
Collaborator

@wli51 wli51 commented Feb 18, 2025

This is a mostly self-contained minimal example of the trainer, callbacks, and models, for demo purposes and to collect suggestions. The actual PRs should probably be smaller.


Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


@MattsonCam MattsonCam self-requested a review February 19, 2025 23:54
wli51 added 15 commits March 1, 2025 23:36
…ermination metric is supplied, the trainer runs for the specified number of epochs.
…ted on a subset of the target/prediction pairs.
…lected input and target patches/images from the dataset alongside model predictions. The new plot functions include plot_predictions_grid_from_eval, which operates downstream of existing inference/evaluation results to avoid redundant forward passes, and plot_predictions_grid_from_model, which performs inference and evaluation internally and visualizes results from a trained model and dataset, enabling visualization without prior inference and evaluation.
…upports both PatchDataset and standard ImageDataset
@MattsonCam MattsonCam self-requested a review March 3, 2025 22:40
@MattsonCam
Member

Hey @wli51, I responded to your responses. Let me know if you want to leave other responses. Otherwise, I will give my closing remarks (and my decision for this PR).

Member

@MattsonCam MattsonCam left a comment


This first iteration LGTM, good job @wli51! I think there are some changes we will want to make in future PRs. I added additional comments to help guide some of these changes. Let me know if you have any questions, and when you want to meet.


self.__dataset = dataset

self.__cache_size = cache_size if cache_size is not None else len(dataset)
Member


Is there a speedup only if every sample fits into memory? If len(dataset) = N and you can only fit x samples into memory, couldn't you remove and then add N - x samples to memory each epoch? I think this would improve performance substantially when there is enough memory.
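To make the rotation idea concrete, here is a minimal sketch of an epoch-wise rotating cache; the `RotatingCache` name, its eviction policy, and all parameters are illustrative assumptions, not part of this PR:

```python
import random


class RotatingCache:
    """Hold up to `capacity` sample ids in memory; each epoch, evict a
    fraction and admit fresh ids so the whole dataset cycles through
    memory over time. Hypothetical sketch, not the PR's actual code."""

    def __init__(self, sample_ids, capacity, swap_fraction=0.25, seed=0):
        self.all_ids = list(sample_ids)
        self.capacity = min(capacity, len(self.all_ids))
        self.swap = max(1, int(self.capacity * swap_fraction))
        self.rng = random.Random(seed)
        # Start with a random subset resident in memory.
        self.cached = set(self.rng.sample(self.all_ids, self.capacity))

    def rotate(self):
        """Evict `swap` cached ids and admit uncached ones, so over
        epochs every sample spends time in memory."""
        uncached = [i for i in self.all_ids if i not in self.cached]
        if not uncached:
            return  # everything already fits; nothing to rotate
        out = self.rng.sample(sorted(self.cached), min(self.swap, len(uncached)))
        into = self.rng.sample(uncached, len(out))
        self.cached.difference_update(out)
        self.cached.update(into)
```

Calling `rotate()` at each epoch boundary keeps memory bounded at `capacity` while still exposing the trainer to the full dataset.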

raise ValueError("No current index set")

@property
def input_channel_keys(self):
Member


The user should have some idea of how the data is organized. I think option 2 would take too much time (especially going the deep learning route), so we probably want the user to perform this type of data splitting/preprocessing. I think option 1 sounds like the better choice. Also, I'm leaning towards letting the user specify which samples belong in each data split (having one folder per split is one option).
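As a sketch of the folder-per-split idea, splits could be discovered from the directory layout alone; `discover_splits`, the `root/train`, `root/val`, `root/test` layout, and the `*.tiff` pattern are assumptions for illustration, not this PR's API:

```python
from pathlib import Path


def discover_splits(root, split_names=("train", "val", "test"), pattern="*.tiff"):
    """Map each split folder under `root` to a sorted list of image paths.
    Missing split folders map to an empty list. Hypothetical sketch."""
    root = Path(root)
    splits = {}
    for name in split_names:
        folder = root / name
        splits[name] = sorted(folder.glob(pattern)) if folder.is_dir() else []
    return splits
```

Each dataset (PatchDataset or ImageDataset) could then be constructed from one split's file list, keeping the split decision entirely in the user's hands.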


return torch.from_numpy(input_images).float(), torch.from_numpy(target_images).float()

def _cache_image(self, site_id: str) -> None:
Member


That could be an option. There are probably other ways you could combine them as well. Agreed, it's probably not a top priority, but it is worth considering.

inputs=interpolated,
grad_outputs=torch.ones_like(prob_interpolated),
create_graph=True,
retain_graph=True,
Member


Interesting. In that case I would just keep it.
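For context, the quoted snippet is the usual WGAN-GP gradient-penalty pattern, where `create_graph`/`retain_graph` keep the penalty term itself differentiable. A self-contained sketch, assuming hypothetical `critic`, `real`, and `fake` names (not this PR's interfaces):

```python
import torch


def gradient_penalty(critic, real, fake):
    """Penalize the critic's gradient norm on interpolates between real
    and fake batches (WGAN-GP style). Illustrative sketch only."""
    eps = torch.rand(real.size(0), 1)  # per-sample mixing coefficient
    interpolated = (eps * real + (1 - eps) * fake).requires_grad_(True)
    prob_interpolated = critic(interpolated)
    grads = torch.autograd.grad(
        outputs=prob_interpolated,
        inputs=interpolated,
        grad_outputs=torch.ones_like(prob_interpolated),
        create_graph=True,   # keeps the penalty differentiable for backprop
        retain_graph=True,
    )[0]
    # Drive the per-sample gradient norm toward 1.
    return ((grads.view(grads.size(0), -1).norm(2, dim=1) - 1) ** 2).mean()
```

The returned scalar is typically scaled by a lambda coefficient and added to the critic loss before its backward pass.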

self,
model: torch.nn.Module,
optimizer: torch.optim.Optimizer,
backprop_loss: Union[torch.nn.Module, List[torch.nn.Module]],
Member


Some of this could probably also be explained in the README to add clarity.

I do wonder how we would balance clarity against redundancy in the dataset-related initializations. Should we have an AbstractTrainer that branches off into AbstractGANTrainer and AbstractConvNetTrainer?

Having two branching abstract trainers may work for our purposes. This software will need to have constraints (e.g., the user won't be able to train every possible virtual staining model); otherwise, the user would need to use a framework like PyTorch to create their own models and training procedures. At the same time, we don't want a weak base class in case we want to train new models, or the same models in a different way. Sometimes the code just needs to be good enough for the cases where we want to make changes.
