v1.0.0 - first stable version with complete docs

@sreichl released this 01 Apr 11:37
· 5 commits to main since this release

Features

  • Provides an end-to-end Snakemake workflow for RNA-seq analysis, from unmapped BAM (uBAM) files to gene counts and annotations.
  • Performs adapter/quality trimming using fastp.
  • Aligns reads and quantifies gene expression using STAR (--quantMode GeneCounts), handling various library strandedness types.
  • Generates a gene-by-sample count matrix (counts.csv).
  • Retrieves gene annotations (ID, symbol, biotype, description) from Ensembl using biomaRt.
  • Calculates exon-based GC content and cumulative exon length for each gene, suitable for downstream bias correction (e.g., with CQN).
  • Creates gene (gene_annotation.csv) and sample (sample_annotation.csv) annotation files ready for downstream analysis.
  • Integrates QC metrics from fastp, STAR, and RSeQC into a comprehensive MultiQC report.
  • Designed as a compatible MrBiomics module.
  • Includes input validation for read types specified in annotation vs. BAM file flags.
  • Uses data streaming and temporary files for improved efficiency and reduced disk usage.
  • Manages software dependencies using Conda environments.
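
To illustrate how strandedness affects the counts taken from STAR, here is a minimal Python sketch, not the workflow's actual code. With --quantMode GeneCounts, STAR writes a ReadsPerGene.out.tab file per sample: column 1 is the gene ID, followed by unstranded, forward-stranded, and reverse-stranded counts, and the first four rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) are summary statistics. The function names and the strandedness labels "none", "yes", and "reverse" are assumptions made for this example.

```python
import csv
import io

# Hypothetical labels mapped to the 0-based column of ReadsPerGene.out.tab:
# 0 = gene ID, 1 = unstranded, 2 = forward-stranded, 3 = reverse-stranded.
STRAND_COLUMN = {"none": 1, "yes": 2, "reverse": 3}


def read_gene_counts(handle, strandedness):
    """Return {gene_id: count} for one sample, using the matching column."""
    column = STRAND_COLUMN[strandedness]
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip empty lines and the N_* summary rows at the top of the file.
        if not row or row[0].startswith("N_"):
            continue
        counts[row[0]] = int(row[column])
    return counts


def build_count_matrix(per_sample):
    """Combine per-sample dicts into a gene-by-sample table (list of rows)."""
    samples = sorted(per_sample)
    genes = sorted({gene for counts in per_sample.values() for gene in counts})
    header = ["gene"] + samples
    rows = [[g] + [per_sample[s].get(g, 0) for s in samples] for g in genes]
    return [header] + rows


# Small in-memory example standing in for a real ReadsPerGene.out.tab file.
example = (
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t1\t20\t2\n"
    "N_ambiguous\t0\t0\t0\n"
    "ENSG01\t30\t2\t28\n"
    "ENSG02\t12\t11\t1\n"
)
sample_counts = read_gene_counts(io.StringIO(example), "reverse")
matrix = build_count_matrix({"sampleA": sample_counts})
```

Picking the wrong column for a stranded library typically shows up as very low gene counts and a large N_noFeature value, which is one reason to compare the three columns during QC.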

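The exon-based GC content and cumulative exon length listed above can be sketched as follows. This is an illustrative Python example under stated assumptions, not the workflow's implementation: it assumes exons are given as 1-based inclusive (start, end) coordinates on one chromosome sequence, and that overlapping exons from different transcripts are merged before counting so no base is counted twice. The function names are hypothetical.

```python
def merge_intervals(exons):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def exon_length_and_gc(chrom_seq, exons):
    """Return (cumulative exon length, GC fraction) for one gene."""
    merged = merge_intervals(exons)
    # Convert 1-based inclusive coordinates to Python slices.
    exonic = "".join(chrom_seq[s - 1:e] for s, e in merged).upper()
    length = len(exonic)
    gc = (exonic.count("G") + exonic.count("C")) / length if length else 0.0
    return length, gc


# Toy chromosome and three exons, two of which overlap.
toy_seq = "AAGGCCTTAAGC"
toy_exons = [(3, 6), (5, 8), (11, 12)]
length, gc = exon_length_and_gc(toy_seq, toy_exons)
```

Per-gene length and GC content computed this way are the two covariates that methods such as CQN expect alongside the count matrix.
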
Documentation & Usage

  • Includes a README.md with workflow overview, features, usage instructions, QC guidelines, and a template Methods section.
  • Provides a CITATION.cff file for standardized citation.
  • Configuration is handled via config.yaml and annotation.csv, with explanations in config/README.md.
  • Example configuration and annotation files are included.
  • Exports Conda environment specifications (envs/*.yaml) used in the run for reproducibility.

We gratefully acknowledge adaptations from the snakemake-workflows/rna-seq-star-deseq2 workflow.

Full Changelog: https://github.com/epigen/rnaseq_pipeline/commits/v1.0.0