WARDEN

The WARDEN (Workflow for the Analysis of RNA-Seq Differential ExpressioN) software uses RNA-Seq sequence files to perform alignment, coverage analysis, gene counts, and differential expression analysis.

WARDEN has three entry points. The start-to-end workflow begins with FastQ files, which are aligned by STAR. Alternatively, the workflow can be entered after alignment by supplying user-aligned RNA-Seq BAMs. Aligned BAMs are run through HTSeq-count to determine the number of reads mapping to features. The workflow can also be entered after counting by supplying user-derived count files. In all three cases, differential expression analysis is then performed on the defined cohorts.

For the full usage documentation, visit the St. Jude Cloud Docs.

Workflow Steps

  1. FastQ files generated by RNA-Seq are mapped to a reference genome using STAR.
  2. HTSeq-count is used to assign mapped reads to features (default feature is gene).
  3. Differential expression analysis is performed using VOOM normalization of counts and LIMMA analysis.
  4. Coverage plots of mapped reads are optionally generated as interactive visualizations.
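
As a rough illustration of how the three entry points relate to the steps above, the sketch below maps an input type to the stages that would run. The stage labels and the `stages_for` helper are hypothetical names chosen for this example, not identifiers used by WARDEN; the optional coverage plots are left out.

```python
# Illustrative stage labels paraphrasing the Workflow Steps list.
# These are NOT names used by WARDEN itself.
STAGES = ["star_alignment", "htseq_count", "differential_expression"]

# Hypothetical mapping from the kind of input supplied to the first stage run.
ENTRY_STAGE = {
    "fastq": "star_alignment",
    "bam": "htseq_count",
    "counts": "differential_expression",
}


def stages_for(input_type):
    """Return the stages that run when the workflow is entered with input_type."""
    try:
        first = ENTRY_STAGE[input_type]
    except KeyError:
        raise ValueError(f"unknown input type: {input_type!r}") from None
    # Every stage from the entry stage onward is executed.
    return STAGES[STAGES.index(first):]
```

For example, `stages_for("bam")` yields the counting and differential expression stages and skips alignment.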

Architecture

WARDEN's three entry points exist as separate apps in the directories stjude_warden_fastq, stjude_warden_bam, and stjude_warden_counts. Each app's resources/app_data/internal_source/ directory holds the source code for DNAnexus applets, which are built dynamically when the main app runs. Each app's resources/usr/bin/create_workflow.py script links those applets together into a workflow, which the main app then builds and runs.

There is a very large amount of code duplication between these three main directories because the dx build process can't handle symlinks or imports. A CI check verifies that files meant to be exact copies of one another really are identical. That check has weak points: the create_workflow.py, warden.sh, and dxapp.json files are largely duplicated without being exact copies, so they must be kept in sync manually. Similarly, the copies of the subapplet warden_genome_coverage_bed in stjude_warden_fastq and stjude_warden_bam differ slightly and also require manual maintenance. While developing for this repo, project-wide "find and replace" is your friend.
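
The kind of exact-copy check described above can be sketched as follows. This is a minimal illustration, not the repository's actual CI; the `find_mismatches` helper and the idea of passing in a list of relative paths are assumptions made for the example.

```python
import hashlib
from pathlib import Path

# App directories named in this README.
APP_DIRS = ["stjude_warden_fastq", "stjude_warden_bam", "stjude_warden_counts"]


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_mismatches(repo_root, relative_paths, app_dirs=APP_DIRS):
    """Return the relative paths whose copies differ between app directories.

    relative_paths is a hypothetical list of files expected to be identical
    wherever they exist; an app directory that lacks a file is skipped.
    """
    mismatches = []
    for rel in relative_paths:
        digests = set()
        for app in app_dirs:
            candidate = Path(repo_root) / app / rel
            if candidate.is_file():
                digests.add(file_digest(candidate))
        # More than one distinct digest means the copies have drifted apart.
        if len(digests) > 1:
            mismatches.append(rel)
    return mismatches
```

A CI job could call `find_mismatches` on the list of shared files and fail when the result is non-empty.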

For the most part, stjude_warden_counts is a subset of the code in stjude_warden_bam, which in turn is a subset of the code in stjude_warden_fastq.
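
When moving a change between apps, it can help to see where this subset relationship does not hold. The sketch below lists files present in a smaller app but absent from a larger one; the helper names are hypothetical, and it compares file paths only, not contents.

```python
from pathlib import Path


def relative_files(app_dir):
    """Return the set of file paths under app_dir, relative to app_dir."""
    root = Path(app_dir)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def files_missing_from(smaller_app, larger_app):
    """List files found in smaller_app that have no counterpart in larger_app."""
    return sorted(relative_files(smaller_app) - relative_files(larger_app))
```

Run from the repository root, `files_missing_from("stjude_warden_counts", "stjude_warden_bam")` would list the exceptions to the "for the most part" rule for that pair.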
