Updating the segmentation to use Cellpose SAM #101
base: main
Conversation
```python
# Reorder channels based on channels_for_segmentation
cellpose_czyx = np.zeros(
    (3, *czyx_data_to_segment.shape[1:]), dtype=czyx_data_to_segment.dtype
)
```
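For context, here is a runnable sketch of the channel-reordering step above. The array shape and the `channels_for_segmentation` values are made up for illustration; only the `np.zeros` allocation mirrors the diff.

```python
import numpy as np

# Hypothetical input: 2 channels, 4 z-slices, 8x8 pixels.
czyx_data_to_segment = np.random.rand(2, 4, 8, 8).astype(np.float32)
channels_for_segmentation = [1, 0]  # illustrative channel order

# Cellpose-SAM expects 3 channels; copy the selected channels into a
# fresh 3-channel array and leave unused channels as zeros.
cellpose_czyx = np.zeros(
    (3, *czyx_data_to_segment.shape[1:]), dtype=czyx_data_to_segment.dtype
)
for i, c in enumerate(channels_for_segmentation):
    cellpose_czyx[i] = czyx_data_to_segment[c]
```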
Is this the output datatype? If it's not stored in the same array as the input, it should always be integers, with the width determined automatically by Cellpose. For example, storing uint32 in a float32 container is lossy.
The real solution here would be to save integer labels in a separate labels array in the zarr store. For now we save everything in one float32 array, which is not ideal. I'd suggest casting cellpose_czyx to float32 and leaving a note that we should fix that later.
I meant that if the output store is not the same as the input, it should be integers. There is no reason for a store that only has integer arrays to be initialized with a float32 data type.
And since we do a lot of 2D segmentation, they wouldn't be stored with the 3D input.
I can default it to uint32.
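To make the lossiness concrete: float32 has a 24-bit mantissa, so integer label IDs above 2**24 cannot all be represented exactly, and round-tripping them through a float32 container silently changes them. A minimal demonstration:

```python
import numpy as np

# Label IDs are exact in uint32, but float32 can only represent
# every integer up to 2**24; above that, values get rounded.
labels = np.array([2**24 + 1, 2**24 + 3], dtype=np.uint32)

# Round-trip through a float32 container, as the current single-array
# store would do.
roundtrip = labels.astype(np.float32).astype(np.uint32)

print(labels)     # [16777217 16777219]
print(roundtrip)  # [16777216 16777220]  -- both IDs corrupted
```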
@edyoshikun are you happy with this PR? Is it ready to merge? I understand this introduces a breaking change in the segmentation config, which we can coordinate with @tayllatheodoro. I think segmentation is currently not part of the processing pipeline, though it probably should be.
```diff
  # Estimate resources
- num_cpus, gb_ram_request = estimate_resources(shape=segmentation_shape, ram_multiplier=20)
+ num_cpus, gb_ram_request = estimate_resources(shape=segmentation_shape, ram_multiplier=10)
```
This seems like a lot of funny math to estimate the resources. Does estimate_resources not do everything you need? For example, it now allows for specifying the max number of CPUs.
I'd suggest leaving reasonable defaults in the code and using an sbatch file to tune these parameters as needed for specific reconstructions or depending on the cluster usage.
This is the same as before; we just don't use the CPUs that much.
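For readers outside the codebase: `estimate_resources` is biahub's helper, and its actual implementation is not shown here. A hypothetical sketch of what such an estimator might compute, matching the call signature in the diff (the `max_num_cpus` cap and the float32 assumption are mine, not biahub's):

```python
import numpy as np

def estimate_resources(shape, ram_multiplier=10, max_num_cpus=20):
    """Hypothetical resource estimator, NOT biahub's actual code.

    Assumes float32 data (4 bytes/element) and scales the RAM request
    by ram_multiplier to leave headroom for intermediates created
    during segmentation.
    """
    gb_data = np.prod(shape) * 4 / 1024**3  # array size in GiB
    gb_ram_request = int(np.ceil(gb_data * ram_multiplier))
    # e.g. one CPU per leading-axis element (position/timepoint), capped
    num_cpus = int(np.clip(shape[0], 1, max_num_cpus))
    return num_cpus, gb_ram_request

# 5 timepoints, 2 channels, 64 z, 2048x2048 -> exactly 10 GiB of float32
num_cpus, gb_ram = estimate_resources((5, 2, 64, 2048, 2048), ram_multiplier=10)
print(num_cpus, gb_ram)  # 5 100
```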
biahub/segment.py (Outdated)
```diff
  input_channel_indices=[list(range(C))],
  output_channel_indices=[list(range(C_segment))],
- num_processes=np.min([20, int(num_cpus * 0.8)]),
+ num_processes=np.min([slurm_array_parallelism, int(num_cpus * 0.8)]),
```
Should the number of processes here be decoupled from the slurm_array_parallelism?
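One way to do the decoupling being suggested is to derive the worker count from local CPU availability alone, independent of how many SLURM array tasks run concurrently. A sketch under that assumption (the helper name and the 20-worker hard cap are illustrative; the 0.8 factor is from the diff):

```python
import os

def pick_num_processes(num_cpus=None, hard_cap=20):
    """Sketch: choose a per-task worker count from local CPUs only,
    ignoring slurm_array_parallelism (which governs how many array
    tasks run at once, not how many workers each task should spawn)."""
    if num_cpus is None:
        num_cpus = os.cpu_count() or 1
    # Leave ~20% of CPUs free for I/O and the parent process,
    # and never return fewer than one worker.
    return max(1, min(hard_cap, int(num_cpus * 0.8)))

print(pick_num_processes(10))   # 8
print(pick_num_processes(100))  # 20 (capped)
```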
It's tricky because this depends on the input dataset.
I couldn't immediately figure out why, when we run it locally, it doesn't seem to find the GPU devices and defaults to CPU, which is super slow for Cellpose.
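As a diagnostic for the local CPU fallback: Cellpose runs on torch, so checking whether torch itself sees a CUDA device usually narrows it down. This is a hedged troubleshooting sketch, not part of the PR:

```python
def cellpose_will_use_gpu():
    """Best-effort check for why Cellpose might fall back to CPU locally.

    Assumption: Cellpose's backend is torch, so if torch is missing,
    built without CUDA, or cannot see a driver, Cellpose silently
    defaults to CPU.
    """
    try:
        import torch
    except ImportError:
        return False, "torch is not installed"
    if not torch.cuda.is_available():
        return False, "torch cannot see a CUDA device (CPU-only build or no driver)"
    return True, f"CUDA device: {torch.cuda.get_device_name(0)}"

ok, detail = cellpose_will_use_gpu()
print(ok, detail)
```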
@edyoshikun let's discuss the resource allocation in the PR, and then we can merge it if you're happy with it.
There is a caveat right now: this won't work with the Neuromast segmentations, because there is no fine-tuned model for the Neuromast with CP4 yet.
This PR updates the segmentation pipeline to use Cellpose-SAM.