Snakemake-based pipeline for processing ChIP-seq and ATAC-seq datasets, from raw data QC and alignment to visualization and peak calling.
During peak calling, chipseq-smk-pipeline automatically matches each signal file with a control file by name similarity.
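The README does not spell out how name proximity is computed, so the following is only an illustrative sketch of the idea, not the pipeline's actual code. It assumes that proximity means string similarity as scored by Python's difflib.SequenceMatcher; the function match_control and the sample names are hypothetical.

```python
from difflib import SequenceMatcher


def match_control(signal_name, control_names):
    """Return the control whose name is most similar to signal_name, or None."""
    if not control_names:
        return None
    # Score each candidate by SequenceMatcher's similarity ratio (0.0 to 1.0)
    # and keep the highest-scoring one.
    return max(
        control_names,
        key=lambda c: SequenceMatcher(None, signal_name.lower(), c.lower()).ratio(),
    )


# Hypothetical sample names, for illustration only.
controls = ["GSM1_input_rep1", "GSM2_input_rep2"]
print(match_control("GSM1_H3K27ac_rep1", controls))
```

A similarity score needs no fixed naming scheme, but it always returns some control even when the names are unrelated, so consistent file naming still matters.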
Input FASTQ files
The pipeline aligns FASTQ or gzipped FASTQ reads, as configured in config.yaml.
The reads folder is a path relative to the pipeline working directory and is set by the fastq_dir property.
The FASTQ file extension is set by the fastq_ext property, e.g. fq, fq.gz, fastq, or fastq.gz.
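As a rough illustration of how fastq_dir and fastq_ext together select the input files, the sketch below lists the matching reads. It is an assumption about the selection logic, not the pipeline's own code; find_fastq is a hypothetical helper.

```python
from pathlib import Path


def find_fastq(work_dir, fastq_dir, fastq_ext):
    """List read files in <work_dir>/<fastq_dir> whose names end with .<fastq_ext>."""
    reads_dir = Path(work_dir) / fastq_dir
    # The glob pattern must match the whole file name, so fastq_ext="fq"
    # does not pick up "sample.fq.gz".
    return sorted(p.name for p in reads_dir.glob(f"*.{fastq_ext}"))
```

For example, find_fastq("work", "reads", "fq.gz") would return only the gzipped files in work/reads, where both folder names are placeholders.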
Input BAM files
Set the start_with_bams=True config option to start from existing BAM files.
The pipeline then starts with the BAM files in the work_dir/bams folder.
Path | Description |
---|---|
config.yaml | Default pipeline options |
trimmed | Trimmed FASTQ files, if trim_reads option is True |
bams | BAMs with aligned reads, MAPQ >= 30 |
bw | BAM coverage visualization using DeepTools |
<peak_caller_name> | Peaks provided by peak caller tool <peak_caller_name> |
qc | QC reports |
multiqc | MultiQC reports for different steps |
logs | Shell command logs |
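After a run, it can be handy to check which of the folders listed above were actually created. A minimal sketch, assuming the folders sit directly under the working directory as the table suggests; missing_outputs is a hypothetical helper, and the expected list should reflect the options that were enabled.

```python
from pathlib import Path


def missing_outputs(work_dir, expected):
    """Return the expected output folders that do not exist under work_dir."""
    root = Path(work_dir)
    return [name for name in expected if not (root / name).is_dir()]


# Folder names taken from the table; adjust them to the enabled options.
print(missing_outputs(".", ["bams", "qc", "multiqc", "logs"]))
```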
The pipeline requires conda.
- If conda is not installed, follow the instructions on the Conda website.
- Navigate to the repository directory.
Create a Conda environment for snakemake:
$ conda env create --file environment.yaml --name snakemake
Activate the newly created environment:
$ source activate snakemake
On Ubuntu, please ensure that gawk is installed:
$ sudo apt-get install gawk
Run the pipeline starting from FASTQ reads:
$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
all [--cores <cores>] --use-conda --directory <work_dir> \
--config fastq_dir=<fastq_dir> genome=<genome> --rerun-incomplete
By default, the pipeline neither creates coverage visualization nor launches peak callers.
Add bw=True and <peak_caller_name>=True to create coverage bw files and call peaks with <peak_caller_name>.
See config.yaml for a complete list of parameters. Use --config to override default options from the config.yaml file.
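Conceptually, --config layers key=value pairs on top of the defaults from config.yaml. The sketch below illustrates that layering only; it is not Snakemake's implementation, apply_overrides is a hypothetical helper, and the default values shown are made up for the example.

```python
def apply_overrides(defaults, overrides):
    """Return a copy of defaults updated from 'key=value' strings."""
    merged = dict(defaults)
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        merged[key] = value  # values stay strings in this sketch
    return merged


# Illustrative defaults only; the real ones live in config.yaml.
defaults = {"bw": "False", "macs2": "False"}
print(apply_overrides(defaults, ["bw=True", "macs2=True"]))
```

Options that are not overridden keep their default values, which is why only the changed keys need to appear on the command line.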
Supported peak caller tools:
To launch MACS2 in --broad mode, use the following config:
$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
all [--cores <cores>] --use-conda --directory <work_dir> \
--config fastq_dir=<fastq_dir> genome=<genome> \
macs2=True macs2_mode=broad macs2_params="--broad --broad-cutoff 0.1" macs2_suffix=broad0.1 \
--rerun-incomplete
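Because macs2_params contains spaces, it has to be quoted in the shell. When launching the pipeline from a script instead, one option is to build the argument list directly, as in this sketch. It uses only the flags shown above; build_command is a hypothetical helper and the paths are placeholders.

```python
def build_command(snakefile, work_dir, config, cores=None):
    """Assemble a snakemake argument list from the options shown above."""
    cmd = ["snakemake", "-p", "-s", snakefile, "all"]
    if cores is not None:
        cmd += ["--cores", str(cores)]
    cmd += ["--use-conda", "--directory", work_dir, "--config"]
    # Each key=value pair is one list element, so values containing
    # spaces need no shell quoting.
    cmd += [f"{key}={value}" for key, value in config.items()]
    cmd.append("--rerun-incomplete")
    return cmd


cmd = build_command(
    "chipseq-smk-pipeline/Snakefile",  # placeholder path to the Snakefile
    "work",                            # placeholder working directory
    {
        "fastq_dir": "reads",          # placeholder reads folder
        "genome": "hg19",
        "macs2": "True",
        "macs2_mode": "broad",
        "macs2_params": "--broad --broad-cutoff 0.1",
        "macs2_suffix": "broad0.1",
    },
    cores=4,
)
print(cmd)
# To launch it: import subprocess; subprocess.run(cmd, check=True)
```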
This section contains instructions for installing peak callers manually.
- BayesPeak
  - Install R

        mamba install -c conda-forge r-base=3.6.3

  - In the R console

        if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
        BiocManager::install(version = "3.10")  # Explicitly set correct Bioconductor version
        BiocManager::install(c("IRanges", "GenomicRanges"))

  - Install BayesPeak

        wget https://www.bioconductor.org/packages//2.10/bioc/src/contrib/BayesPeak_1.8.0.tar.gz
        R CMD INSTALL BayesPeak_1.8.0.tar.gz

- Hotspot
  - Install required dependencies

        sudo apt-get install build-essential libgsl-dev

  - Download and make

        wget https://github.com/StamLab/hotspot/archive/refs/tags/v4.1.1.zip
        unzip v4.1.1.zip
        cd hotspot-4.1.1/hotspot-distr/hotspot-deploy
        make

- PeakSeq
  - Download and make

        git clone https://github.com/gersteinlab/PeakSeq.git
        cd PeakSeq
        make
To render the rules DAG as a PDF, append the following to the snakemake command line: --forceall --rulegraph | dot -Tpdf > rules.pdf
Configure a profile named cluster for the required cluster system.
$ mkdir -p ~/.config/snakemake
$ cd ~/.config/snakemake
$ cookiecutter https://github.com/iromeo/generic.git
Example of ATAC-seq processing on qsub:
$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
all --use-conda --directory <work_dir> \
--profile cluster --cluster-config cluster_config.yaml --jobs 150 \
--config fastq_dir=<fastq_dir> genome=<genome> \
bowtie2_params="-X 2000 --dovetail" \
macs2=True macs2_params="-q 0.05 -f BAMPE --nomodel --nolambda -B --call-summits" \
omnipeak=True omnipeak_fragment=0 --rerun-incomplete
P.S. Use --config to override default options from the config.yaml file.
Please download the example fastq.gz files from the CD14_chr15_fastq folder.
These files are filtered to human hg19 chr15 to reduce their size and speed up computation.
Launch chipseq-smk-pipeline:
$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
all --use-conda --cores all --directory <work_dir> \
--config fastq_ext=fastq.gz fastq_dir=<work_dir> bw=True genome=hg19 macs2=True sicer=True omnipeak=True \
--rerun-incomplete
- Learn more about Snakemake workflow management system
- Developed with SnakeCharm plugin for PyCharm IDE by JetBrains Research BioLabs