friedue edited this page Jan 18, 2014 · 16 revisions

About deepTools

Why we built deepTools

  • no suitable programs were available
  • QC, normalization, visualization
  • highly customizable images (change colours, size, labels, file format etc.)
  • enable highly individual downstream analyses (prerequisite: access to every data set that is produced along the way)
  • modular approach - compatibility, flexibility, scalability

How we use deepTools

The majority of samples that we handle within our facility come from ChIP-seq experiments, so you will find many examples from ChIP-seq analyses. This does not mean that deepTools is restricted to ChIP-seq data analysis, but some tools, such as bamFingerprint, specifically address ChIP-seq issues. (That being said, we also process quite a bit of RNA-seq, other -seq and genomic sequencing data using deepTools.)

Here are slides that we used for teaching at the University of Freiburg.

As depicted in the figure below, our work usually begins with one or more FASTQ files of deeply sequenced samples. After a first quality control using FASTQC, we align the reads to the reference genome, e.g. using bowtie2. We then use deepTools to assess the quality of the aligned reads:

  1. Correlation between BAM files (bamCorrelate). This is a very basic test to see whether the sequenced and aligned reads meet your expectations. We use this check to assess reproducibility, either between replicates or between different experiments that might have used the same antibody, the same cell type etc. For instance, replicates should correlate better than differently treated samples.
  2. GC bias check (computeGCBias). Many sequencing protocols require several rounds of PCR-based amplification of the DNA to be sequenced. Unfortunately, most DNA polymerases used for PCR introduce significant GC biases as they prefer to amplify GC-rich templates. Depending on the sample (preparation), the GC bias can vary significantly, so we routinely check its extent. If we need to compare files with different GC biases, we use the correctGCBias module to match the GC bias. See the paper by [Benjamini and Speed][] for many insights into this problem.
  3. Assessing the ChIP strength (bamFingerprint). We run this QC to get a feeling for the signal-to-noise ratio in samples from ChIP-seq experiments. It is based on the insights published by [Diaz et al.][].
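To make the first check more concrete, here is a minimal sketch of what a Pearson or Spearman correlation between read distributions amounts to. It is an illustration only, not the bamCorrelate implementation: the per-bin read counts, the sample names and the helper functions `pearson`, `ranks` and `spearman` are all assumptions made up for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long lists of per-bin read counts."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical read counts in six genomic bins (not real data).
replicate_1 = [12, 40, 7, 95, 3, 22]
replicate_2 = [10, 44, 9, 90, 5, 20]
treated = [50, 8, 30, 12, 41, 6]

print("replicates:", pearson(replicate_1, replicate_2), spearman(replicate_1, replicate_2))
print("vs treated:", pearson(replicate_1, treated), spearman(replicate_1, treated))
```

With these toy numbers the two replicates correlate far better with each other than with the differently treated sample, which is the pattern we would expect from a reproducible experiment.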

Once we're satisfied with the basic quality checks, we normally convert the large BAM files into a leaner data format, typically bigWig. bigWig files have several advantages over BAM files that mainly stem from their significantly smaller size:

  • useful for data sharing & storage
  • intuitive visualization in Genome Browsers (e.g. UCSC Genome Browser, IGV)
  • more efficient downstream analyses are possible

The deepTools modules bamCompare and bamCoverage not only allow the simple conversion from BAM to bigWig (or bedGraph, for that matter). The main reason why we developed them was that we wanted to be able to normalize the read coverages so that we could compare different samples despite differences in sequencing depth, GC biases and so on.
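As a rough illustration of why normalization matters, the following sketch scales two coverage tracks to a common sequencing depth and then combines them with a log2 ratio or a difference. This is a simplified assumption about the idea, not the actual bamCoverage or bamCompare code: the function names, the target depth of one million reads and the pseudocount are hypothetical choices for this example.

```python
from math import log2


def scale_to_depth(coverage, total_reads, target_reads=1_000_000):
    """Rescale per-bin coverage as if the sample had `target_reads` reads."""
    factor = target_reads / total_reads
    return [c * factor for c in coverage]


def log2_ratio(treatment, control, pseudocount=1.0):
    """Per-bin log2 ratio; the pseudocount avoids division by zero."""
    return [log2((t + pseudocount) / (c + pseudocount))
            for t, c in zip(treatment, control)]


def difference(treatment, control):
    """Per-bin difference between two coverage tracks."""
    return [t - c for t, c in zip(treatment, control)]


# Hypothetical per-bin coverage; the ChIP sample was sequenced twice as deeply.
chip_cov, chip_reads = [20, 80, 10, 4], 2_000_000
input_cov, input_reads = [10, 10, 5, 2], 1_000_000

chip_norm = scale_to_depth(chip_cov, chip_reads)
input_norm = scale_to_depth(input_cov, input_reads)

print("log2 ratio:", log2_ratio(chip_norm, input_norm))
print("difference:", difference(chip_norm, input_norm))
```

Without the depth scaling, every bin of the ChIP sample would look enriched simply because that sample contains twice as many reads; after scaling, only the second bin stands out.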

Finally, once all the files have passed our visual inspections, the fun of downstream analyses with heatmapper and profiler can begin!

Here's a visual summary of our typical workflow: deepTools modules are indicated in bold letters, while other software such as FASTQC and bowtie is noted in regular font. Everything written in red relates to quality control (QC) of the samples.

[Figure: flowChartI, flow chart of the workflow described above]

deepTools overview

deepTools consists of a set of modules that can be used independently to work with mapped reads. We have grouped the modules by task into quality control, normalization and visualization.

Here's a concise summary of the tools - if you would like more detailed information about the individual tools and example figures, follow the links in the table.

| tool | type | input files | main output file(s) | application |
|------|------|-------------|---------------------|-------------|
| [bamCorrelate][] | QC | 2 or more BAM | clustered heatmap | Pearson or Spearman correlation between read distributions |
| [bamFingerprint][] | QC | 2 BAM | 1 diagnostic plot | assess enrichment strength of a ChIP sample |
| [computeGCBias][] | QC | 1 BAM | 2 diagnostic plots | calculate the expected and observed GC distribution of reads |
| [bamCoverage][] | normalization | BAM | bedGraph or bigWig | obtain the normalized read coverage of a single BAM file |
| [bamCompare][] | normalization | 2 BAM | bedGraph or bigWig | normalize 2 BAM files to each other using a mathematical operation of your choice (e.g. log2ratio, difference) |
| [computeMatrix][] | visualization | 1 bigWig, 1 BED | zipped file, to be used with heatmapper or profiler | compute the values needed for heatmaps and summary plots |
| [heatmapper][] | visualization | computeMatrix output | heatmap of read coverages | visualize the read coverages for genomic regions |
| [profiler][] | visualization | computeMatrix output | summary plot ("meta-profile") | visualize the average read coverages over a group of genomic regions |
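
The summary plot ("meta-profile") listed for profiler can be thought of as a column-wise average over a matrix with one row per genomic region and one column per bin. The sketch below shows only that idea, under the assumption that such a matrix is already at hand. It does not read the computeMatrix output format, and `meta_profile` and the toy values are hypothetical.

```python
def meta_profile(matrix):
    """Average coverage per bin across all regions (column-wise mean)."""
    n_regions = len(matrix)
    return [sum(column) / n_regions for column in zip(*matrix)]


# Hypothetical coverage values: 3 genomic regions x 5 bins each.
matrix = [
    [1.0, 2.0, 6.0, 2.0, 1.0],
    [0.0, 3.0, 9.0, 3.0, 0.0],
    [2.0, 4.0, 6.0, 4.0, 2.0],
]

profile = meta_profile(matrix)
print(profile)
```

A heatmap of the same matrix would show every row individually, whereas the meta-profile collapses the rows into a single average curve.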
