A Nextflow pipeline for Hi-C data analysis using cooltools, focusing on expected interactions, insulation scores, eigenvectors, and saddle plots.
This pipeline processes `.mcool` files to generate various Hi-C analysis outputs, including:
- Expected interaction frequencies
- Insulation scores
- Compartment analysis (eigenvectors)
- Saddle plots
Requirements:
- Nextflow (>=21.04.0)
- Docker, Singularity, or Conda
- Input `.mcool` files
Installation:
- Clone the repository:
  ```bash
  git clone https://github.com/gspracklin/puddingstone-nf
  cd puddingstone-nf
  ```
- Choose your execution environment:
  - Docker (recommended)
  - Conda environment
  - Singularity
Basic usage:
```bash
nextflow run main.nf -profile docker
```
Available profiles:
- `docker`: Runs the pipeline in Docker containers
- `standard`: Runs the pipeline locally
- `slurm`: Runs the pipeline on a SLURM cluster
- `singularity`: Runs the pipeline using Singularity
Configure your analysis in `project.yml`:
```yaml
cooler: './path/to/*.mcool'   # Input cooler files
expected:
  expectedOptions: "-p 2"     # Options for the expected calculation
resolution: 20000             # Resolution in base pairs
window: 100000                # Window size for the insulation score
threads: 4                    # Number of threads
```
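Because an `.mcool` file stores several resolutions, each step has to pick one, while the insulation `window` is given in base pairs. The plain-Python sketch below shows how the `resolution` and `window` values relate: it builds the `file.mcool::/resolutions/<N>` URI that cooler uses to address a single resolution, and converts the window into a number of bins. The helper names `resolution_uri` and `window_in_bins` and the file name `sample.mcool` are hypothetical, not part of the pipeline, and the rule that `window` must be a whole multiple of `resolution` is an assumption of this sketch.

```python
def resolution_uri(mcool_path: str, resolution: int) -> str:
    """Cooler URI for a single resolution stored inside a multi-resolution .mcool file."""
    return f"{mcool_path}::/resolutions/{resolution}"


def window_in_bins(window_bp: int, resolution: int) -> int:
    """Insulation window converted from base pairs to bins.

    Assumption of this sketch: the window must be a whole multiple of the bin size.
    """
    if window_bp % resolution != 0:
        raise ValueError(
            f"window ({window_bp} bp) is not a multiple of resolution ({resolution} bp)"
        )
    return window_bp // resolution


# Values copied from the example project.yml
resolution = 20000
window = 100000

print(resolution_uri("sample.mcool", resolution))  # sample.mcool::/resolutions/20000
print(window_in_bins(window, resolution))          # 5
```

With the example values, the insulation window spans 5 bins of 20 kb each.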
Outputs:
- `results/expected/`: Expected interaction frequencies
- `results/insulation/`: Insulation scores and boundaries
- `results/eigenvectors/`: Compartment analysis results
- `results/saddle/`: Saddle plots
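Expected interactions: Calculates the expected interaction frequency as a function of genomic distance, i.e. the average contact frequency between bins separated by a given distance.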
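The idea behind the expected step can be sketched in a few lines of plain Python: for a square, symmetric (typically balanced) contact matrix of one region, the expected value at separation d is the mean over the d-th diagonal. This is an illustrative sketch, not cooltools' implementation, which works on sparse pixels region by region and reports additional statistics; the function name `expected_by_diagonal` and its `ignore_diags` parameter (masking the diagonals closest to the main one) are our own.

```python
import math


def expected_by_diagonal(matrix, ignore_diags=0):
    """Mean contact frequency on each diagonal of a square, symmetric matrix.

    Entry d is the expected contact frequency for bins d apart. NaN pixels
    (e.g. filtered bins) are skipped; diagonals d < ignore_diags are set to NaN.
    """
    n = len(matrix)
    expected = []
    for d in range(n):
        if d < ignore_diags:
            expected.append(math.nan)
            continue
        values = [matrix[i][i + d] for i in range(n - d) if not math.isnan(matrix[i][i + d])]
        expected.append(sum(values) / len(values) if values else math.nan)
    return expected


# Toy 3x3 matrix: contacts decay with distance
m = [
    [10.0, 4.0, 1.0],
    [4.0, 10.0, 4.0],
    [1.0, 4.0, 10.0],
]
print(expected_by_diagonal(m))  # [10.0, 4.0, 1.0]
```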
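Insulation: Computes insulation scores over a sliding window (the `window` setting) and identifies domain boundaries at positions where the insulation score dips.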
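A minimal sketch of a diamond-style insulation score, assuming a dense matrix and a window given in bins: for each bin i, average the contacts between the `window` bins ending at i and the `window` bins starting at i + 1, then take the log2 ratio to the mean score so that dips become negative. Boundaries are called here as plain local minima; a real boundary caller would also filter out weak minima (cooltools, for example, reports a boundary strength for each candidate). `insulation_scores` and `call_boundaries` are hypothetical, illustrative helpers.

```python
import math


def insulation_scores(matrix, window):
    """log2-normalised insulation score for each bin of a dense contact matrix.

    Score i is the mean contact frequency between bins i-window+1..i and
    bins i+1..i+window, i.e. contacts that cross the boundary after bin i.
    Positions where that square would leave the matrix are NaN.
    """
    n = len(matrix)
    raw = []
    for i in range(n):
        rows = range(i - window + 1, i + 1)
        cols = range(i + 1, i + window + 1)
        if rows.start < 0 or cols.stop > n:
            raw.append(math.nan)
            continue
        values = [matrix[a][b] for a in rows for b in cols]
        raw.append(sum(values) / len(values))
    finite = [s for s in raw if not math.isnan(s)]
    if not finite:
        return raw
    mean = sum(finite) / len(finite)
    # log2 ratio to the mean score: negative values mark well-insulated positions
    return [
        math.nan if math.isnan(s) else (math.log2(s / mean) if s > 0 else -math.inf)
        for s in raw
    ]


def call_boundaries(scores):
    """Indices that are strict local minima of the insulation profile."""
    boundaries = []
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if any(math.isnan(x) for x in (left, mid, right)):
            continue
        if mid < left and mid < right:
            boundaries.append(i)
    return boundaries


# Two toy domains (bins 0-2 and 3-5): strong contacts within, weak across
m = [[20.0 if a == b else 10.0 if (a < 3) == (b < 3) else 1.0 for b in range(6)] for a in range(6)]
scores = insulation_scores(m, window=2)
print(call_boundaries(scores))  # [2], i.e. the boundary between bins 2 and 3
```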
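Compartments (eigenvectors): Performs A/B compartment analysis by eigenvector decomposition of the contact matrix; the sign of the leading eigenvector separates the two compartments.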
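The compartment step can be illustrated with a simplified sketch: divide each pixel by the expected value for its diagonal, centre the observed/expected matrix by subtracting 1, find the leading eigenvector by power iteration, and flip its sign so that it correlates positively with a phasing track such as GC content (cooltools' `eigs_cis` also accepts a phasing track for this purpose). The sketch assumes a dense, fully finite matrix for a single chromosome arm and a positive expected value at every separation; `leading_eigenvector` is a hypothetical name, and power iteration stands in for a proper eigensolver.

```python
import math


def leading_eigenvector(matrix, expected, phasing_track, n_iter=500, tol=1e-12):
    """Leading eigenvector of the centred observed/expected matrix, sign-oriented by a phasing track."""
    n = len(matrix)
    # observed / expected for each pixel, minus 1 so "as expected" becomes 0
    oe = [[matrix[i][j] / expected[abs(i - j)] - 1.0 for j in range(n)] for i in range(n)]

    # Power iteration: converges to the eigenvector whose eigenvalue has the largest magnitude
    v = [1.0 + i for i in range(n)]  # arbitrary start vector (must not be orthogonal to the answer)
    for _ in range(n_iter):
        w = [sum(oe[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("matrix times start vector is zero; try another start vector")
        w = [x / norm for x in w]
        # converged if w equals v up to a sign flip
        diff = min(
            max(abs(a - b) for a, b in zip(w, v)),
            max(abs(a + b) for a, b in zip(w, v)),
        )
        v = w
        if diff < tol:
            break

    # Orient: positive values should go with high phasing-track values (e.g. GC-rich, A compartment)
    mean_v = sum(v) / n
    mean_p = sum(phasing_track) / n
    covariance = sum((a - mean_v) * (b - mean_p) for a, b in zip(v, phasing_track))
    if covariance < 0:
        v = [-x for x in v]
    return v


# Toy data: alternating compartments A, B, A, B with a flat expected
labels = [1, -1, 1, -1]
m = [[1.0 + 0.5 * labels[i] * labels[j] for j in range(4)] for i in range(4)]
gc = [0.6, 0.4, 0.6, 0.4]  # hypothetical GC content per bin
print(leading_eigenvector(m, [1.0] * 4, gc))  # approximately [0.5, -0.5, 0.5, -0.5]
```

Orienting by a phasing track matters because an eigenvector is only defined up to its sign, so without it the A/B labels could flip between samples or chromosomes.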
Saddle plots: Generates saddle plots to visualize compartment strength, i.e. how strongly bins preferentially contact other bins of the same compartment.
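A saddle plot can be sketched as follows, assuming an observed/expected matrix (plain ratios, not centred) and an eigenvector for the same bins: rank the bins by eigenvector value, split them into `n_groups` equally sized groups (from B-like to A-like), and average the observed/expected values for every pair of groups. The corner ratio (AA + BB) / (AB + BA) is one common summary of compartment strength. The function names `saddle` and `compartment_strength` are hypothetical, and real implementations typically also handle masked bins and outlier quantiles.

```python
import math


def saddle(oe, eigvec, n_groups=2):
    """Average observed/expected contact between groups of bins ranked by eigenvector value.

    Group 0 holds the lowest (B-like) values, group n_groups-1 the highest (A-like).
    The main diagonal is skipped.
    """
    n = len(eigvec)
    order = sorted(range(n), key=lambda i: eigvec[i])
    group = [0] * n
    for rank, i in enumerate(order):
        group[i] = rank * n_groups // n  # equally sized groups by rank
    sums = [[0.0] * n_groups for _ in range(n_groups)]
    counts = [[0] * n_groups for _ in range(n_groups)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sums[group[i]][group[j]] += oe[i][j]
                counts[group[i]][group[j]] += 1
    return [
        [sums[g][h] / counts[g][h] if counts[g][h] else math.nan for h in range(n_groups)]
        for g in range(n_groups)
    ]


def compartment_strength(saddle_matrix):
    """(AA + BB) / (AB + BA), using the corner groups of the saddle matrix."""
    a = len(saddle_matrix) - 1  # index of the most A-like group
    aa, bb = saddle_matrix[a][a], saddle_matrix[0][0]
    ab, ba = saddle_matrix[a][0], saddle_matrix[0][a]
    return (aa + bb) / (ab + ba)


# Toy data: alternating compartments A, B, A, B
labels = [1, -1, 1, -1]
oe = [[1.0 + 0.5 * labels[i] * labels[j] for j in range(4)] for i in range(4)]
s = saddle(oe, [0.5, -0.5, 0.5, -0.5])
print(s)                        # [[1.5, 0.5], [0.5, 1.5]]
print(compartment_strength(s))  # 3.0
```

A strength above 1 means bins contact their own compartment more often than the other one.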
Resource requirements:
- CPUs: 4 by default
- Memory: scales with input file size
- Storage: roughly 2-3x the input file size
For issues and questions, please open an issue on GitHub.
If you use this pipeline, please cite:
- cooltools
- This pipeline
This project is licensed under the MIT License.