This repository is forked from markgene/chipseq, which is itself a fork of nf-core/chipseq configured for running on SJ HPC. This repo focuses on CUT&RUN instead of regular ChIP-seq.
markgene/cutnrun is a bioinformatics analysis pipeline used for CUT&RUN data.
The pipeline is built using Nextflow, a workflow tool for running tasks across multiple compute infrastructures in a portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.
- Raw read QC (FastQC)
- Adapter trimming (Trim Galore!)
- Alignment (Bowtie2)
- Mark duplicates (picard)
- Merge alignments from multiple libraries of the same sample (picard)
  - Re-mark duplicates (picard)
  - Filtering to remove:
    - reads mapping to blacklisted regions (SAMtools, BEDTools)
    - reads that are marked as duplicates (SAMtools)
    - reads that aren't marked as primary alignments (SAMtools)
    - reads that are unmapped (SAMtools)
    - reads that map to multiple locations (SAMtools)
    - reads containing > 4 mismatches (BAMTools)
    - reads that have an insert size > 2kb (BAMTools; paired-end only)
    - reads that map to different chromosomes (Pysam; paired-end only)
    - reads that aren't in FR orientation (Pysam; paired-end only)
    - reads where only one read of the pair fails the above criteria (Pysam; paired-end only)
  - Alignment-level QC and estimation of library complexity (picard, Preseq)
  - Create normalised bigWig files scaled to 1 million mapped reads (BEDTools, bedGraphToBigWig)
  - Generate gene-body meta-profile from bigWig files (deepTools)
  - Calculate genome-wide IP enrichment relative to control (deepTools)
  - Calculate strand cross-correlation peak and ChIP-seq quality measures including NSC and RSC (phantompeakqualtools)
  - Call broad/narrow peaks (MACS2)
  - Annotate peaks relative to gene features (HOMER)
  - Create consensus peakset across all samples and create tabular file to aid in the filtering of the data (BEDTools)
  - Count reads in consensus peaks (featureCounts)
  - Differential binding analysis, PCA and clustering (R, DESeq2)
- Create IGV session file containing bigWig tracks, peaks and differential sites for data visualisation (IGV)
- Present QC for raw read, alignment, peak-calling and differential binding results (MultiQC, R)
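To make the read-filtering criteria and the bigWig scaling above more concrete, here is a minimal Python sketch. It is not the pipeline's own code, which delegates these steps to SAMtools, BAMTools, Pysam and BEDTools. The `Read` record, the function names and the `min_mapq=1` cutoff (used here as a stand-in for "maps to multiple locations") are assumptions made for illustration. The flag bits come from the SAM specification. Blacklist filtering is left out, and the FR check is simplified to comparing start positions.

```python
from dataclasses import dataclass, replace

# SAM flag bits, as defined in the SAM specification.
UNMAPPED = 0x4
REVERSE = 0x10
MATE_REVERSE = 0x20
SECONDARY = 0x100
DUPLICATE = 0x400
EXCLUDE_MASK = UNMAPPED | SECONDARY | DUPLICATE


@dataclass(frozen=True)
class Read:
    """Hypothetical minimal view of one paired-end alignment record."""
    flag: int
    mapq: int
    mismatches: int   # value of the NM tag
    chrom: str
    pos: int
    mate_chrom: str
    mate_pos: int
    insert_size: int  # TLEN field, may be negative


def is_fr(read):
    """Simplified FR test: mates on opposite strands, forward mate leftmost."""
    rev = bool(read.flag & REVERSE)
    mate_rev = bool(read.flag & MATE_REVERSE)
    if rev == mate_rev:
        return False
    return read.mate_pos <= read.pos if rev else read.pos <= read.mate_pos


def passes_filters(read, max_mismatches=4, max_insert=2000, min_mapq=1):
    """Apply the per-read criteria from the list (blacklist check omitted)."""
    if read.flag & EXCLUDE_MASK:
        return False  # unmapped, non-primary or duplicate
    if read.mapq < min_mapq:
        return False  # assumed proxy for multi-mapping reads
    if read.mismatches > max_mismatches:
        return False
    if abs(read.insert_size) > max_insert:
        return False
    if read.chrom != read.mate_chrom:
        return False
    return is_fr(read)


def filter_pairs(pairs):
    """Keep a pair only if both mates pass; one failing mate drops the pair."""
    return [(r1, r2) for r1, r2 in pairs if passes_filters(r1) and passes_filters(r2)]


def scale_factor(mapped_reads):
    """Factor that rescales coverage to 1 million mapped reads."""
    return 1_000_000 / mapped_reads


# Example pair: flag 99 is the forward first mate, flag 147 the reverse second mate.
fwd = Read(99, 30, 1, "chr1", 100, "chr1", 250, 200)
rev = Read(147, 30, 0, "chr1", 250, "chr1", 100, -200)
kept = filter_pairs([(fwd, rev), (replace(fwd, flag=99 | DUPLICATE), rev)])
```

In this sketch `kept` holds only the first pair, because the second pair has a mate flagged as a duplicate. A library with 4 million mapped reads would get a scale factor of 0.25.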
i. Install nextflow
ii. Install one of docker, singularity or conda
iii. Download the pipeline and test it on a minimal dataset with a single command
    nextflow run markgene/cutnrun -profile test,<docker/singularity/conda/institute>
Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your institute. If so, you can simply use -profile institute in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
iv. Start running your own analysis!
    nextflow run markgene/cutnrun -profile <docker/singularity/conda/institute> --input design.csv --genome GRCh37
See usage docs for all of the available options when running the pipeline.
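If you launch runs from a script, the command above can be assembled programmatically. The following is a minimal sketch, not part of the pipeline: `build_command` is a hypothetical helper, and it uses only the options shown in this README (`-profile`, `--input`, `--genome`).

```python
import shlex


def build_command(pipeline, profiles, **params):
    """Return a `nextflow run` invocation as an argument list.

    `profiles` are joined with commas, as in `-profile test,docker`.
    Each keyword argument becomes a `--name value` pipeline parameter.
    """
    cmd = ["nextflow", "run", pipeline, "-profile", ",".join(profiles)]
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_command("markgene/cutnrun", ["docker"], input="design.csv", genome="GRCh37")
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell-quoting problems with file paths.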
The markgene/cutnrun pipeline comes with documentation about the pipeline, found in the docs/ directory:
- Installation
- Pipeline configuration
- Running the pipeline
- Output and how to interpret the results
- Troubleshooting
The workflow was originally forked from nf-core/chipseq. I modified the code to make it a better fit for CUT&RUN data.