boundary mask for unsupervised training #132


Merged · 6 commits merged into computational-cell-analytics:main on Jul 18, 2025

Conversation

stmartineau99 (Contributor)

  • get_unsupervised_loader now accepts a boundary mask and a sampler, so that patches are only generated inside the mask
  • the raw data and mask are stacked and written out to an .h5 file, since RawDataset expects a single data file path
  • to avoid altering the source code of RawDataset, I came up with the following solutions:
  1. the ChannelSplitterSampler class splits the stacked data back into raw and mask, which are then passed to the MinForegroundSampler (see the sketch after this list)
  2. raw_transform is updated using the ComposedTransform class to drop the mask channel once it is no longer needed
  • added a GPU ID argument to mean_teacher_adaptation
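
A minimal sketch of the splitting idea, for illustration only: ChannelSplitterSampler and MinForegroundSampler are the names used in this PR, but the constructor and the (raw, mask) call convention shown here are assumptions based on how torch_em samplers are typically invoked.

```python
class ChannelSplitterSampler:
    """Hypothetical sketch: split a stacked (raw, mask) patch again and forward
    the two channels to an inner sampler, e.g. MinForegroundSampler, which then
    accepts or rejects the patch based on the mask content."""

    def __init__(self, inner_sampler):
        self.inner_sampler = inner_sampler

    def __call__(self, stacked_patch):
        raw, mask = stacked_patch[0], stacked_patch[1]
        # the inner sampler treats the mask as foreground, so patches outside the mask are rejected
        return self.inner_sampler(raw, mask)
```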

@constantinpape (Contributor) left a comment

This looks very good, and should achieve exactly what we discussed! I only have a minor change request on supporting both kinds of samplers; see comments for details.

@@ -82,6 +84,10 @@ def mean_teacher_adaptation(
based on the patch_shape and size of the volumes used for training.
n_samples_val: The number of val samples per epoch. By default this will be estimated
based on the patch_shape and size of the volumes used for validation.
train_mask_paths: Boundary masks used by the sampler to accept or reject patches for training.
@constantinpape (Contributor)

Minor semantic comment: I think that this is not necessarily a boundary mask. I think just calling it a mask is more precise.

@stmartineau99 (Contributor, Author), Jul 18, 2025

Jonathan's lamella masker uses the term "boundary mask" so that is why I used it. It makes sense because the mask defines the boundary of the signal.

@stmartineau99 (Contributor, Author)

Since we are now using three different masks in this pipeline (gradient mask, boundary mask, membrane mask), we need different words to describe them. Correct me if there is a clearer way to refer to it.

@constantinpape (Contributor)

In the context here:

  • The "gradient mask" is computed internally only, so we don't need to expose parameters related to it here. But if you want to refer to it for some explanations then calling it "gradient mask" is good.
  • "boundary mask" I would call different, as we use this for accepting / rejecting samples. It does not necessarily have to be on the (spatial) boundary. (And I find the 'boundary of the signal' notion not so intuitive). I would call it "sample mask".
  • I would call the other mask, which you called "membrane mask", "background mask", as we use it to enforce background label in the pseudo labels. In our case this is indeed for membranes, but it could also be for other structures.

@@ -82,6 +84,10 @@ def mean_teacher_adaptation(
based on the patch_shape and size of the volumes used for training.
n_samples_val: The number of val samples per epoch. By default this will be estimated
based on the patch_shape and size of the volumes used for validation.
train_mask_paths: Boundary masks used by the sampler to accept or reject patches for training.
val_mask_paths: Boundary masks used by the sampler to accept or reject patches for validation.
sampler: Accept or reject patches based on a condition.
@constantinpape (Contributor)

The samplers for the datasets and the mean teacher trainer have slightly different meaning. See also comment below. I think the best approach here is to expose and document two different sampler arguments:

  • patch_sampler: is passed to get_unsupervised_loader
  • pseudo_label_sampler: is passed to MeanTeacherTrainer

Feel free to suggest better names ;).
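
For illustration, the wiring could look roughly like this. The two new argument names follow the suggestion above; everything else (the abridged signature, the keyword names of get_unsupervised_loader and MeanTeacherTrainer, and the omitted required trainer arguments) is an assumption, not taken from the actual code.

```python
from torch_em.self_training import MeanTeacherTrainer
# get_unsupervised_loader is this repository's loader helper; its import is omitted here.


def mean_teacher_adaptation(
    name,
    unsupervised_train_paths,
    patch_shape,
    patch_sampler=None,         # accepts / rejects patches based on the input data (and mask)
    pseudo_label_sampler=None,  # accepts / rejects pseudo-labels predicted by the teacher
):
    # the patch sampler only affects which patches the unsupervised loader yields
    train_loader = get_unsupervised_loader(
        unsupervised_train_paths, patch_shape, sampler=patch_sampler,
    )
    # the pseudo-label sampler is applied to the teacher predictions during training;
    # other required trainer arguments (model, optimizer, loss, ...) are omitted for brevity
    trainer = MeanTeacherTrainer(
        name=name,
        unsupervised_train_loader=train_loader,
        sampler=pseudo_label_sampler,
    )
    return trainer
```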

@@ -155,7 +172,7 @@ def mean_teacher_adaptation(
device=device,
reinit_teacher=reinit_teacher,
save_root=save_root,
sampler=sampler,
sampler=None, # TODO currently set to none cause I didn't want to pass the same sampler used by get_unsupervised_loader
@constantinpape (Contributor)

The sampler here is applied to the pseudo-labels predicted by the teacher, to give a criterion for rejecting pseudo labels. In contrast, the sampler passed to the loaders rejects patches based on some criterion applied to the data. It makes sense to support both and to pass them with different names; see comment above.
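
To make the distinction concrete, a hypothetical pseudo-label sampler could look like the sketch below. The class name, and the assumption that the trainer calls the sampler with the teacher's predictions and expects a boolean back, are illustrative and not taken from torch_em.

```python
class MinPseudoForegroundSampler:
    """Hypothetical example: reject a pseudo-labeled batch if the teacher's
    predictions (a tensor of foreground probabilities) contain too little
    confident foreground."""

    def __init__(self, min_fraction=0.01, threshold=0.5):
        self.min_fraction = min_fraction
        self.threshold = threshold

    def __call__(self, pseudo_labels):
        # fraction of voxels the teacher labels as foreground with sufficient confidence
        foreground_fraction = (pseudo_labels > self.threshold).float().mean().item()
        return foreground_fraction > self.min_fraction
```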


# update variables for RawDataset()
data_paths = tuple(stacked_paths)
base_transform = torch_em.transform.get_raw_transform()
@constantinpape (Contributor)

This should be adapted to only act on channel 0 (the actual raw data).
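
One possible way to do that, as a sketch: the wrapper class is hypothetical, only torch_em.transform.get_raw_transform is taken from the diff above.

```python
import torch_em


class ChannelWiseRawTransform:
    """Hypothetical wrapper: apply a raw transform (e.g. normalization) only to
    channel 0, so the stacked mask channel(s) are left untouched."""

    def __init__(self, transform, channel=0):
        self.transform = transform
        self.channel = channel

    def __call__(self, data):
        transformed = self.transform(data[self.channel])
        # cast the full stack to the transformed dtype before writing channel 0 back
        out = data.astype(transformed.dtype, copy=True)
        out[self.channel] = transformed
        return out


base_transform = ChannelWiseRawTransform(torch_em.transform.get_raw_transform())
```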

@constantinpape (Contributor)

Let's leave this for the next PR.



@constantinpape merged commit 43eff47 into computational-cell-analytics:main on Jul 18, 2025
3 checks passed
@constantinpape (Contributor)

I merged the changes, @stmartineau99. Good job on these!

For the next PR, implementing the background masking for excluding boundaries from the pseudo-labeling, you should address the following:

  • Add the new mask paths for the background masks and name these paths accordingly.
  • Refactor the code here into a separate function, so that you can also cover the case where the background masks are given, and put that case into its own function as well (a rough sketch follows below).
  • Update the base transform here so that it only acts on the first channel; otherwise the masks will be normalized as well, which doesn't make sense.
  • If the background mask is given, then update the augmentations here so that they are only applied to the first channel.
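
A rough sketch of such a stacking helper, to make the refactoring concrete. The function name, dataset keys, and the use of h5py are illustrative assumptions; only the idea of stacking raw data and masks into a single file for RawDataset comes from this PR.

```python
import h5py
import numpy as np


def stack_volumes_for_raw_dataset(raw_path, sample_mask_path, out_path,
                                  background_mask_path=None,
                                  raw_key="raw", mask_key="mask", out_key="data"):
    """Hypothetical helper: stack the raw data, the sample mask and, if given,
    the background mask along a new channel axis and write them to one .h5 file,
    so that RawDataset can read a single data path. Channel 0 stays the raw data,
    so transforms and augmentations can be restricted to it."""
    def _read(path, key):
        with h5py.File(path, "r") as f:
            return f[key][:]

    raw = _read(raw_path, raw_key)
    channels = [raw, _read(sample_mask_path, mask_key).astype(raw.dtype)]
    if background_mask_path is not None:
        channels.append(_read(background_mask_path, mask_key).astype(raw.dtype))

    with h5py.File(out_path, "w") as f:
        f.create_dataset(out_key, data=np.stack(channels, axis=0), compression="gzip")
    return out_path
```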
