Process for merging both technical and biological replicates. This pipeline will also analyze peaks from replicate samples of ChipSeq or Cut&Run Experiment to produce heat plots and euler plots.
Green Global Travel. (2021). Mount Everest, Nepal/Tibet [Photograph]. Retrieved from https://greenglobaltravel.com/tallest-mountains-in-the-world/
- Merge replicate treatment and input downsampled bams.
- Call peaks on merged bams. These are the merged peaks.
- Identify merged that overlap peaks in each sample.
- Make a heat plot showing coverage of all merged peaks with merged data.
- Make a heat plot showing coverage of all merged peaks for each sample.
- Make a Euler plot showing overlap of the sample peaks.
- Make an Upset plot showing overlap of the sample peaks.
- Make a report.
- List of .bam files
- Merged .bam files
- Heat map of braod peaks
- Heat map of narrow peaks
- Euler plot of peaks overlapped
# make sure no modules are already loaded
module purge
# load moduels to run snakemake
module load slurm python/3.7.0 pandas/1.0.3 numpy/1.18.2
# once you are in the directory you would like to work clone the github repo
git clone git@github.com:kevinboyd76/ReplicatePeakAnalyzer.git
# Step below to rename the folder are not necessary but can help organize your files
# rename folder with project name
mv ReplicatePeakAnalyzer/ My_Project_Folder/
# change directory into root of your project folder
# You will need to be in this directory to run the snakefile
cd My_Project_Folder/
Note. Make sure to rename sample file by removing "_template"
- Open the "samples.csv" file located in the "/config" folder.
- In the first column labeled "sample", enter a unique name for each sample. These names will be used to identify the samples and establish the naming convention for the samples going forward.
- In the second column labeled "treatmentBam", replace the existing filenames and paths with the names and paths of your desired treatment BAM files.
- In the third column labeled "inputBam", replace the existing filenames and paths with the names and paths of your desired input BAM files.
- Fill the fourth column labeled "set" with an appropriate indicator for each sample. This indicator will determine the sample's set membership. Samples with the same indicator in the "set" column will be considered part of the same set and are used to generate a consensus peak set. The desired number of samples with overlapping peaks to generate a consensus peakset is specified in the "config.yml" file.
sample | treatmentBam | inputBam | set |
---|---|---|---|
testData1A | resources/testData/test1.bam | resources/testData/input1.bam | testSet1 |
testData1B | resources/testData/test1B.bam | resources/testData/input1.bam | testSet1 |
testData2A | resources/testData/test2.bam | resources/testData/input2.bam | testSet2 |
testData2B | resources/testData/test2B.bam | resources/testData/input2B.bam | testSet2 |
CPU and memory requests for each rule in the pipeline are detailed in this file. If you are using SLURM, you may need to alter this file to fit your needs/system.
A dry run produces a text output showing exactly what commands will be executed. Look this over carefully before submitting the full job. It is normal to see warnings about changes made to the code, input, and params
snakemake -npr
sbatch --constraint=westmere \
--wrap="\
snakemake \
-R \
-p \
-j 999 \
--use-envmodules \
--latency-wait 100 \
--cluster-config config/cluster_config.yml \
--cluster '\
sbatch \
-A {cluster.account} \
-p {cluster.partition} \
--cpus-per-task {cluster.cpus-per-task} \
--mem {cluster.mem} \
--output {cluster.output} \
--error {cluster.error} \
--time {cluster.time}'"
Midpoints | PeakSummits |
---|---|
![]() |
![]() |
Visualization of calling peaks based on peaks found in either 2 or 3 replicates
Parameters:
Parameter | Description | Value |
---|---|---|
MACS_q_value | q-value (minimum FDR) cutoff for macs peak calling | (Integer) |
broad_peaks | run macs with broad peaks or not | True or False |
macs2 callpeak \
-t {input.txBam} \
-c {input.inBam} \
-f BAMPE \
-g {params.effective_genome_size} \
-n {params.sample_name}_{params.minimum_FDR_cutoff} \
-q {params.minimum_FDR_cutoff} \
--outdir results/macs2_normalPeaks/
samtools idxstats {input.txBam} | awk -F '\t' '{s+=$3}END{print s}' > {output.treatmentCountFile}
samtools idxstats {input.inBam} | awk -F '\t' '{s+=$3}END{print s}' > {output.inputCountFile}
cat