This project is a collection of tools for analyzing light-sheet fluorescence microscopy images.
- Copy and edit a config:
cp config_template.yaml config.yaml
# edit config.yaml
- Dry run:
snakemake -n -p --profile profiles/sge --configfile config.yaml
- Run:
snakemake -p --profile profiles/sge --configfile config.yaml
You can also use the wrapper:
./run_pipeline.sh -n # dry run
./run_pipeline.sh --until rechunk_to_blocks
./run_pipeline.sh --configfile config.yaml
To run this pipeline, you will first need to clone the repo:
$ git clone https://github.com/beliveau-lab/lightsheet-analysis-pipeline
We will assume that the raw dataset is stored on disk in .h5 format and has a corresponding .xml metadata file.
Next, we will set up the config.yaml file. First, specify the input .xml path and the output directory:
Input/Output Settings (from config_template.yaml)
- input_xml: The path to the XML metadata file (e.g., /../../dataset.xml). Make sure you have write permissions on this file! If not, you can make a copy.
- output_dir: The folder where output files will be written.
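For illustration, the corresponding entries in config.yaml might look like this (the paths are placeholders; check config_template.yaml for the exact key names):

```yaml
# Placeholder paths -- replace with your own dataset and output locations.
input_xml: /path/to/dataset.xml   # BigDataViewer XML metadata for the raw .h5 dataset
output_dir: /path/to/output/      # all pipeline outputs are written here
```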
Next, specify whether you want to reorient the sample. This currently applies the "SHIFT-Y" transformation from BigDataViewer.
#TODO: add custom transform matrix as an option
- reorient_sample:
  - enabled: Whether to reorient the sample into the SHIFT-Y view. Values are True or False.
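A minimal sketch of this block, assuming the nesting described above (verify against config_template.yaml):

```yaml
reorient_sample:
  enabled: True   # apply BigDataViewer's "SHIFT-Y" reorientation; set to False to skip
```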
Next, we need to specify the BigStitcher-Spark installation directory:
BigStitcher-Spark Settings (from config.yaml)
bigstitcher_script_dir: The location of the BigStitcher-Spark scripts needed to run the alignment process.
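For example (the path is a placeholder for your local BigStitcher-Spark checkout):

```yaml
bigstitcher_script_dir: /path/to/BigStitcher-Spark   # directory containing the BigStitcher-Spark scripts
```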
The major steps of the alignment process are as follows:
- Pairwise stitching
- Global optimization
- Affine fusion
Each step relies on an Apache Spark cluster for parallel processing. While you can specify a cluster configuration per step, I find that the default setup works well for all steps. See the BigStitcher-Spark documentation for details on these steps.
Spark Cluster Configuration (from config.yaml)
- runtime: Maximum time allocated for the alignment job (format: HH:MM).
- n_nodes: Number of nodes to use in the cluster (1 node = 1 job).
- spark_log_base_dir: The directory where logs (records of the process) from Spark will be saved.
- spark_job_timeout: How long (in seconds) the system should wait for Spark job logs before timing out.
- cluster: Specific settings for how Spark distributes the work:
  - executors_per_node: Number of parallel processes (executors) to run on each node.
  - cores_per_executor: Number of processor cores assigned to each executor.
  - overhead_cores_per_worker: Extra cores reserved on each node for system tasks.
  - tasks_per_executor_core: How many small tasks each core within an executor can handle concurrently.
  - cores_driver: Number of cores dedicated to the main coordinating process (driver).
  - gb_per_slot: Amount of memory (in gigabytes) allocated per processing slot. Note: this may be specific to older cluster managers.
  - ram_per_core: Amount of memory allocated for each processor core (e.g., "12G").
  - project: The SGE project to submit jobs to.
  - queue: The specific queue to submit the job to (optional).
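The sketch below uses purely illustrative values, and the exact nesting (e.g., whether project and queue sit under cluster) may differ from config_template.yaml; tune everything for your own SGE setup:

```yaml
runtime: "08:00"                    # HH:MM wall time for the alignment job
n_nodes: 4                          # 1 node = 1 job
spark_log_base_dir: /path/to/spark_logs
spark_job_timeout: 3600             # seconds to wait for Spark job logs
cluster:
  executors_per_node: 4
  cores_per_executor: 4
  overhead_cores_per_worker: 2
  tasks_per_executor_core: 1
  cores_driver: 4
  gb_per_slot: 12
  ram_per_core: "12G"
  project: my_sge_project           # placeholder SGE project name
  queue: my_queue                   # optional; placeholder queue name
```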
Stitching Settings (from config.yaml)
This step finds the spatial relationships between adjacent image tiles.
- stitching_channel: The image channel (e.g., nuclear stain) used to find overlaps between tiles. Channel numbers usually start from 0.
- min_correlation: A threshold (0 to 1) indicating how similar overlapping regions must be to be considered a match. Higher values mean stricter matching.
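An illustrative sketch (the values, and whether these keys sit under a dedicated stitching block, are assumptions; see config_template.yaml):

```yaml
stitching_channel: 0    # channel used to find tile overlaps; channels start at 0
min_correlation: 0.7    # 0-1; higher values require closer matches between overlaps
```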
Global Optimization
This step refines the positions of all tiles simultaneously based on the stitching results to create the most accurate overall alignment.
- TODO: Implement error threshold parameters
Fusion (from config.yaml)
This step combines the aligned tiles into a single output image file.
- channels: A list of the image channels to include in the final fused image (e.g., [0, 1, 2]).
- block_size: The size of the data chunks (in pixels: X, Y, Z) used when writing the output file (e.g., "512,512,512"). This affects performance and compatibility with downstream tools.
- intensity: Defines the range of pixel brightness values in the output.
  - min: The minimum intensity value.
  - max: The maximum intensity value (e.g., 65535 for 16-bit images).
- data_type: The numerical format for storing pixel brightness (e.g., UINT16 for 16-bit unsigned integers).
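A hedged sketch of these settings (whether they live under a fusion block is an assumption; the values are examples only):

```yaml
fusion:
  channels: [0, 1, 2]          # channels to include in the fused image
  block_size: "512,512,512"    # output chunk size in pixels (X, Y, Z)
  intensity:
    min: 0
    max: 65535                 # full 16-bit range
  data_type: UINT16
```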
The fused output is saved in the input directory as a multichannel, multiresolution .n5 container.
For tasks like segmentation and feature extraction that can be parallelized, the Dask library is used to manage computation across multiple workers (potentially using GPUs for acceleration).
- dask: Container for Dask settings.
  - log_dir: Directory where Dask worker logs are saved.
  - runtime: Maximum runtime for Dask worker jobs.
  - dashboard_port: Address for the Dask dashboard.
  - gpu_worker_config: Settings for GPU workers: num_workers, processes, threads_per_worker, memory, cores, project, queue, resource_spec.
  - cpu_worker_config: Settings for CPU workers: num_workers, processes, threads_per_worker, memory, cores, project, queue, resource_spec.
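A rough sketch of the dask block; the worker counts, memory sizes, runtimes, queue names, and SGE resource strings are all placeholders (the real defaults are in config_template.yaml):

```yaml
dask:
  log_dir: /path/to/dask_logs
  runtime: "04:00"                   # placeholder worker runtime
  dashboard_port: 8787               # placeholder dashboard port
  gpu_worker_config:
    num_workers: 4
    processes: 1
    threads_per_worker: 1
    memory: "64G"
    cores: 4
    project: my_sge_project          # placeholder SGE project
    queue: gpu.q                     # placeholder queue name
    resource_spec: "gpu=1"           # placeholder SGE resource request
  cpu_worker_config:                 # same keys as gpu_worker_config
    num_workers: 8
    processes: 1
    threads_per_worker: 2
    memory: "32G"
    cores: 4
    project: my_sge_project
    queue: cpu.q
    resource_spec: "mem_free=32G"
```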
This step runs dask-distributed nuclear instance segmentation with Cellpose.
- segmentation: Container for segmentation settings.
  - output_suffix: Text added to the input filename to create the output segmentation filename (e.g., "_segmented_normalized.zarr").
  - block_size: Processing chunk size (Z, Y, X) for the segmentation algorithm.
  - eval_kwargs: Parameters passed directly to the segmentation model (e.g., the Cellpose eval function), controlling thresholds, batch sizes, etc.
  - cellpose_model_path: Path to the pre-trained Cellpose model used for segmentation.
  - n5_channel_path: The path within the fused N5/Zarr file to the channel data used as input for segmentation (e.g., "ch2/s0" means channel 2, scale 0).
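A sketch with illustrative values; in particular, the eval_kwargs shown (diameter, flow_threshold, batch_size) are only examples of Cellpose eval parameters, not the pipeline's defaults:

```yaml
segmentation:
  output_suffix: "_segmented_normalized.zarr"
  block_size: [256, 512, 512]       # Z, Y, X processing chunks (illustrative)
  eval_kwargs:                      # forwarded to the Cellpose eval call
    diameter: 30                    # example Cellpose parameters only
    flow_threshold: 0.4
    batch_size: 8
  cellpose_model_path: /path/to/cellpose_model   # placeholder path
  n5_channel_path: "ch2/s0"         # channel 2, scale 0 of the fused N5
```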
- destripe: Container for destripe settings.
  - output_suffix: Text added to the input filename to create the output destriped filename (e.g., "_destriped.zarr").
  - channels: List of channels to run destriping on (e.g., [0, 1, 2]).
  - n5_path_pattern: A template defining how to access data for different channels within the N5/Zarr file (e.g., "ch{}/s0", where {} is replaced by the channel number).
  - block_size: Output chunk size (Z, Y, X) for the resulting .zarr arrays.
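An illustrative sketch (the chunk sizes are placeholders):

```yaml
destripe:
  output_suffix: "_destriped.zarr"
  channels: [0, 1, 2]
  n5_path_pattern: "ch{}/s0"        # {} is replaced by the channel number
  block_size: [128, 512, 512]       # Z, Y, X chunks of the output .zarr (placeholder)
```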
After segmentation, this step measures various properties (features) of the segmented objects (e.g., size, shape, intensity in different channels).
- feature_extraction: Container for feature extraction settings.
  - output_suffix: Text added to the input filename to create the output feature extraction filename (e.g., "_features.csv").
  - channels: List of channels from which to extract features for each segmented object.
  - n5_path_pattern: A template defining how to access data for different channels within the N5/Zarr file (e.g., "ch{}/s0", where {} is replaced by the channel number).
  - batch_size: Number of cells to process in a single batch (to control peak memory usage).
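A sketch with placeholder values:

```yaml
feature_extraction:
  output_suffix: "_features.csv"
  channels: [0, 1, 2]
  n5_path_pattern: "ch{}/s0"
  batch_size: 1000      # cells per batch; lower this to reduce peak memory (placeholder)
```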
After configuring the .yaml, you can begin execution of the Snakemake workflow with:
$ snakemake -p --profile profiles/sge --configfile config.yaml
Environments are managed automatically via --use-conda in the profile. If you prefer manual control, create the env from workflow/envs/*.yml.
To perform a dry run of the pipeline, you can modify the submit_snakemake.sh script to run snakemake with the -n flag.
- Spark logs: {bigstitcher_script_dir}/logs/spark
- Dask logs: the directory set in dask.log_dir
- Snakemake logs: see logs/ under the output directory.