chipseq-smk-pipeline

Snakemake-based pipeline for processing ChIP-seq and ATAC-seq datasets, from raw data QC and alignment to visualization and peak calling.

During peak calling, chipseq-smk-pipeline automatically matches each signal file with a control file by name similarity.
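The exact matching rule is not described here. As a rough illustration only, the sketch below picks, for each signal file, the control whose name is most similar according to Python's `difflib.SequenceMatcher`. The function names and sample names are hypothetical, and the pipeline's own matching logic may differ.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; higher means the names share more characters in order."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_control(signal: str, controls: list) -> str:
    """Hypothetical helper: return the control name closest to the signal name."""
    if not controls:
        raise ValueError("no control files available")
    return max(controls, key=lambda control: similarity(signal, control))


# Hypothetical sample names.
signals = ["CD14_H3K4me3_rep1", "CD14_H3K27ac_rep2"]
controls = ["CD14_input_rep1", "CD14_input_rep2"]
for signal in signals:
    print(signal, "->", match_control(signal, controls))
# CD14_H3K4me3_rep1 -> CD14_input_rep1
# CD14_H3K27ac_rep2 -> CD14_input_rep2
```

A similarity score rather than exact name parsing tolerates different naming schemes, at the cost of occasionally pairing files unexpectedly, so it is worth checking the chosen pairs.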

Input

Input FASTQ files

The pipeline aligns FASTQ or gzipped FASTQ reads, as defined in config.yaml.
The reads folder is a path relative to the pipeline working directory and is defined by the fastq_dir property.
The FASTQ file extension is defined by the fastq_ext property, e.g. fq, fq.gz, fastq or fastq.gz.
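A minimal sketch of how fastq_dir and fastq_ext together select the input files, assuming every file ending in `.<fastq_ext>` is one sample and the sample name is the file name without that extension. `discover_fastq` and the file names are hypothetical; the pipeline's own discovery (e.g. for paired-end reads) may differ.

```python
import tempfile
from pathlib import Path


def discover_fastq(work_dir, fastq_dir: str, fastq_ext: str) -> dict:
    """Map sample name -> path for files in <work_dir>/<fastq_dir> ending with '.<fastq_ext>'."""
    suffix = "." + fastq_ext
    reads_dir = Path(work_dir) / fastq_dir  # fastq_dir is relative to the working directory
    samples = {}
    for path in sorted(reads_dir.iterdir()):
        if path.is_file() and path.name.endswith(suffix):
            samples[path.name[: -len(suffix)]] = path  # strip the extension
    return samples


# Demo with a throwaway working directory and hypothetical file names.
with tempfile.TemporaryDirectory() as work_dir:
    reads = Path(work_dir) / "reads"
    reads.mkdir()
    for name in ["CD14_H3K4me3.fastq.gz", "CD14_input.fastq.gz", "notes.txt"]:
        (reads / name).touch()
    found = sorted(discover_fastq(work_dir, "reads", "fastq.gz"))

print(found)  # ['CD14_H3K4me3', 'CD14_input']
```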

Input BAM files

Use the start_with_bams=True config option to start from existing BAM files.
In this case the pipeline starts with the BAM files in the work_dir/bams folder.

Files

Path	Description
`config.yaml`	Default pipeline options
`trimmed`	Trimmed FASTQ files, if the `trim_reads` option is True
`bams`	BAM files with aligned reads, filtered to `MAPQ >= 30`
`bw`	BigWig coverage tracks computed from BAMs with deepTools
`<peak_caller_name>`	Peaks called by the `<peak_caller_name>` tool
`qc`	QC reports
`multiqc`	MultiQC reports for different steps
`logs`	Shell command logs
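To make the `MAPQ >= 30` criterion concrete, here is a minimal sketch that applies it to SAM text lines, where MAPQ is the 5th tab-separated column. It is only an illustration of the rule; on the command line the same criterion is usually applied with `samtools view -q 30`, and the pipeline's actual filtering step may be implemented differently. The records are made up.

```python
MIN_MAPQ = 30  # threshold stated for the bams folder


def keep_alignment(sam_line: str, min_mapq: int = MIN_MAPQ) -> bool:
    """Keep SAM header lines and alignments whose MAPQ (5th column) is >= min_mapq."""
    if sam_line.startswith("@"):
        return True
    fields = sam_line.rstrip("\n").split("\t")
    return int(fields[4]) >= min_mapq


# Made-up records: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
sam_lines = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr15\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr15\t200\t3\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",  # unmapped read, MAPQ 0
]
kept = [line.split("\t")[0] for line in sam_lines if keep_alignment(line)]
print(kept)  # ['@HD', 'r1']
```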

Requirements

The pipeline requires conda.

If conda is not installed, follow the instructions at the Conda website.
Navigate to the repository directory.

Create a Conda environment for snakemake:

$ conda env create --file environment.yaml --name snakemake

Activate the newly created environment:

$ conda activate snakemake

On Ubuntu, please ensure that gawk is installed:

$ sudo apt-get install gawk

Launch

Run the pipeline starting from FASTQ reads:

$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
    all [--cores <cores>] --use-conda --directory <work_dir> \
    --config fastq_dir=<fastq_dir> genome=<genome> --rerun-incomplete

By default, the pipeline neither creates coverage visualization nor launches peak callers.
Add bw=True to create coverage bw files, and <peak_caller_name>=True to call peaks with <peak_caller_name>.

See config.yaml for a complete list of parameters. Use --config to override default options from the config.yaml file.
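Each `--config` item is a `KEY=VALUE` string that overrides the same key from config.yaml. Below is a simplified sketch of that merge, assuming values are converted to int, then float, then `True`/`False`, and are otherwise kept as strings. Snakemake's own parser handles more cases; `parse_value`, `apply_overrides` and the default values are hypothetical.

```python
def parse_value(text: str):
    """Convert a --config value: int, then float, then True/False, else keep the string."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    if text in ("True", "False"):
        return text == "True"
    return text


def apply_overrides(defaults: dict, overrides: list) -> dict:
    """Return a copy of defaults with KEY=VALUE overrides applied."""
    config = dict(defaults)
    for item in overrides:
        key, sep, value = item.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        config[key] = parse_value(value)
    return config


# Hypothetical defaults standing in for config.yaml.
defaults = {"fastq_ext": "fastq.gz", "bw": False, "macs2": False}
config = apply_overrides(defaults, [
    "bw=True",
    "macs2=True",
    "macs2_params=--broad --broad-cutoff 0.1",  # shell quotes keep this as one argument
    "macs2_suffix=broad0.1",
])
print(config)
```

Splitting on the first `=` only is what lets a value such as `macs2_params` contain further `=` signs or spaces.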

Peak callers

Supported peak caller tools:

To launch MACS2 in --broad mode, use the following command:

$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
    all [--cores <cores>] --use-conda --directory <work_dir> \
    --config fastq_dir=<fastq_dir> genome=<genome> \
    macs2=True macs2_mode=broad macs2_params="--broad --broad-cutoff 0.1" macs2_suffix=broad0.1 \
    --rerun-incomplete
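When the launch command is generated from a script, the main pitfall is that macs2_params must reach Snakemake as a single argument. A minimal sketch, assuming the same options as the broad-mode example; `snakemake_command`, the paths and the genome are hypothetical placeholders.

```python
import shlex
from typing import Optional


def snakemake_command(snakefile: str, work_dir: str, config: dict,
                      cores: Optional[int] = None) -> list:
    """Build the argument list for a pipeline launch like the broad-mode example."""
    cmd = ["snakemake", "-p", "-s", snakefile, "all"]
    if cores is not None:
        cmd += ["--cores", str(cores)]
    cmd += ["--use-conda", "--directory", work_dir, "--config"]
    # One argument per KEY=VALUE pair; values with spaces stay a single argument.
    cmd += [f"{key}={value}" for key, value in config.items()]
    cmd.append("--rerun-incomplete")
    return cmd


broad_config = {
    "fastq_dir": "reads",  # hypothetical reads folder
    "genome": "hg19",      # hypothetical genome
    "macs2": True,
    "macs2_mode": "broad",
    "macs2_params": "--broad --broad-cutoff 0.1",
    "macs2_suffix": "broad0.1",
}
cmd = snakemake_command("chipseq-smk-pipeline/Snakefile", "work_dir", broad_config, cores=8)
print(shlex.join(cmd))  # shell-safe form, with macs2_params single-quoted
```

Passing the list directly to `subprocess.run(cmd, check=True)` avoids shell quoting altogether; `shlex.join` is only needed to print a copy-pasteable command.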

Peak callers installation

This section contains instructions for manual installation of peak callers.

BayesPeak

Install R

mamba install -c conda-forge r-base=3.6.3

In the R console:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version = "3.10")  # Bioconductor 3.10 matches R 3.6
BiocManager::install(c("IRanges", "GenomicRanges"))

Install BayesPeak

wget https://www.bioconductor.org/packages//2.10/bioc/src/contrib/BayesPeak_1.8.0.tar.gz
R CMD INSTALL BayesPeak_1.8.0.tar.gz

Hotspot

Install required dependencies

sudo apt-get install build-essential libgsl-dev

Download and make

wget https://github.com/StamLab/hotspot/archive/refs/tags/v4.1.1.zip
unzip v4.1.1.zip
cd hotspot-4.1.1/hotspot-distr/hotspot-deploy
make

PeakSeq

Download and make

git clone https://github.com/gersteinlab/PeakSeq.git
cd PeakSeq
make

Rules

The rules DAG can be produced by appending --forceall --rulegraph to the launch command and piping the output to Graphviz: --forceall --rulegraph | dot -Tpdf > rules.pdf

Computational cluster QSUB/LSF

Configure a Snakemake profile named cluster for the required cluster system:

$ mkdir -p ~/.config/snakemake
$ cd ~/.config/snakemake
$ cookiecutter https://github.com/iromeo/generic.git

Example of ATAC-seq processing on qsub:

$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
    all --use-conda --directory <work_dir> \
    --profile cluster --cluster-config cluster_config.yaml --jobs 150 \
    --config fastq_dir=<fastq_dir> genome=<genome> \
    bowtie2_params="-X 2000 --dovetail" \
    macs2=True macs2_params="-q 0.05 -f BAMPE --nomodel --nolambda -B --call-summits" \
    omnipeak=True omnipeak_fragment=0 --rerun-incomplete

P.S.: Use --config to override default options from the config.yaml file.

Try with test data

Please download the example fastq.gz files from the CD14_chr15_fastq folder.
These files are filtered to human hg19 chr15 reads to reduce their size and speed up computations.
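Before launching, a quick sanity check of the downloaded files can catch truncated downloads. A minimal sketch, assuming plain 4-line FASTQ records; `count_fastq_reads` is a hypothetical helper, not part of the pipeline, and the demo writes its own tiny file instead of using the real test data.

```python
import gzip
import tempfile
from pathlib import Path


def count_fastq_reads(path) -> int:
    """Count records in a gzipped FASTQ file, checking the 4-line record layout."""
    n_lines = 0
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0 and not line.startswith("@"):
                raise ValueError(f"{path}: line {i + 1} should start a record with '@'")
            if i % 4 == 2 and not line.startswith("+"):
                raise ValueError(f"{path}: line {i + 1} should be the '+' separator")
            n_lines += 1
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: truncated record at end of file")
    return n_lines // 4


# Demo on a generated file; for the test data one could loop over
# Path("CD14_chr15_fastq").glob("*.fastq.gz") (assumed download location).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.fastq.gz"
    with gzip.open(demo, "wt") as out:
        out.write("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
    demo_reads = count_fastq_reads(demo)

print(demo_reads)  # 2
```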

Launch chipseq-smk-pipeline:

$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
    all --use-conda --cores all --directory <work_dir> \
    --config fastq_ext=fastq.gz fastq_dir=<work_dir> bw=True genome=hg19 macs2=True sicer=True omnipeak=True \
    --rerun-incomplete

Useful links

Learn more about the Snakemake workflow management system.
Developed by JetBrains Research BioLabs with the SnakeCharm plugin for the PyCharm IDE.
