Skip to content

Conversation

edyoshikun
Copy link
Contributor

This PR updates the segmentation pipline to use the Cellpose-SAM.


# Reorder channels based on channels_for_segmentation
cellpose_czyx = np.zeros(
(3, *czyx_data_to_segment.shape[1:]), dtype=czyx_data_to_segment.dtype
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the output datatype? If it's not stored in the same as input, it should always be integers whose width is determined automatically by cellpose. For example storing uint32 in a float32 container is lossy.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real solution here would be to save integer labels in a separate labels array in the zarr store. For now we save everything in one float32 array, which is not ideal. I'd suggest casting cellpose_czyx to float32 and leaving a note that we should fix that later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant if the output store is not the same as input it should be integers. There is no reason for a store that only have integer arrays to be initialized with float32 data type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And since we do a lot of 2D segmentation, they wouldn't be stored with the 3D input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can default it to uint32.

@ieivanov
Copy link
Collaborator

@edyoshikun are you happy with this PR? Is it ready to merge? I understand this introduces a breaking change in the segmentation config, which we can coordinate with @tayllatheodoro - I think currently segmentation is not part of the processing pipeline, through it probably should be.


# Estimate resources
num_cpus, gb_ram_request = estimate_resources(shape=segmentation_shape, ram_multiplier=20)
num_cpus, gb_ram_request = estimate_resources(shape=segmentation_shape, ram_multiplier=10)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a lot of funny math to estimate the resources. Does estimate_resources not do everything you need? For example, it now allows for specifying the max number of CPUs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest leaving reasonable defaults in the code and using an sbatch file to tune these parameters as needed for specific reconstructions or depending on the cluster usage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the same as before. We just dont use the CPUs that much.

input_channel_indices=[list(range(C))],
output_channel_indices=[list(range(C_segment))],
num_processes=np.min([20, int(num_cpus * 0.8)]),
num_processes=np.min([slurm_array_parallelism, int(num_cpus * 0.8)]),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of processes here should be decoupled from the slurm_array_parallelism?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's tricky because this depends on the input dataset.

@edyoshikun
Copy link
Contributor Author

I couldn't immediately figure out why when we run it locally, it doens't seem to find the GPU devices and defaults to cpu, which is super slow for cellpose.

@ieivanov
Copy link
Collaborator

ieivanov commented Aug 8, 2025

@edyoshikun let's discuss the resource allocation in the PR and then we can merge it if you're happy with it.

@edyoshikun
Copy link
Contributor Author

There is a caveat right now that this won't work with the Neuromast segmentations because there is no fine-tuned model for the Neuromast with CP4 yet.

@mattersoflight mattersoflight added this to the Advanced Analysis milestone Aug 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants