Releases: epigen/rnaseq_pipeline

v1.1.1 - minor fix

12 Jun 14:01

v1.1.0 - minor improvements

28 May 15:35

Features

  • Exon-based gene annotation without BioMart, using a Biostars gene-annotation script (https://www.biostars.org/p/91218)
  • STAR aligner defaults set to ENCODE recommendations in config
  • Sample-annotation sanitization for R compatibility
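
As an illustration of what sanitizing sample-annotation names for R can involve, here is a minimal Python sketch that loosely mirrors the basic rules of R's make.names. It is not the pipeline's implementation: the function name sanitize_name is hypothetical, and R's handling of reserved words and duplicate names is deliberately left out.

```python
import re


def sanitize_name(name: str) -> str:
    """Return a syntactically valid R-style name (simplified sketch)."""
    # Replace every character that is not a letter, digit, dot, or underscore.
    cleaned = re.sub(r"[^A-Za-z0-9._]", ".", name)
    # A valid R name starts with a letter, or with a dot not followed by a digit.
    # Anything else (digits, underscores, empty strings) gets an "X" prefix.
    if not re.match(r"^([A-Za-z]|\.(?![0-9]))", cleaned):
        cleaned = "X" + cleaned
    return cleaned


# Hypothetical column names as they might appear in a sample annotation.
raw_columns = ["sample-name", "1st replicate", "cell_type"]
safe_columns = [sanitize_name(c) for c in raw_columns]
```

Applying the same function to both column names and sample identifiers keeps the annotation consistent with the count matrix once both are loaded in R.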

Documentation

  • QC guideline enhancements in README (updated thresholds & descriptions)

Comprehensive testing

  • Sequencing modes: single-end & paired-end
  • Species: mouse & human
  • Sample sources: primary tissues & cell lines
  • Library protocols: Smart-seq2 (unstranded), NEBNext Ultra (unstranded), Quant-seq (stranded)
  • Cell types: structural (epithelial, fibroblast, endothelial), hematopoietic (13 cell types), CAR-T cells, sorted primary immune subsets, neurons

Full Changelog: v1.0.0...v1.1.0

v1.0.0 - first stable version with complete docs

01 Apr 11:37

Features

  • Provides an end-to-end Snakemake workflow for RNA-seq analysis, from unmapped BAM (uBAM) files to gene counts and annotations.
  • Performs adapter/quality trimming using fastp.
  • Aligns reads and quantifies gene expression using STAR (--quantMode GeneCounts), handling various library strandedness types.
  • Generates a gene-by-sample count matrix (counts.csv).
  • Retrieves gene annotations (ID, symbol, biotype, description) from Ensembl using biomaRt.
  • Calculates exon-based GC content and cumulative exon length for each gene, suitable for downstream bias correction (e.g., with CQN).
  • Creates gene (gene_annotation.csv) and sample (sample_annotation.csv) annotation files ready for downstream analysis.
  • Integrates QC metrics from fastp, STAR, and RSeQC into a comprehensive MultiQC report.
  • Designed as a compatible MrBiomics module.
  • Includes input validation for read types specified in annotation vs. BAM file flags.
  • Uses data streaming and temporary files for improved efficiency and reduced disk usage.
  • Manages software dependencies using Conda environments.
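
To make the role of library strandedness concrete: with --quantMode GeneCounts, STAR writes a ReadsPerGene.out.tab file per sample in which column 2 holds unstranded counts, column 3 holds counts for reads whose first mate matches the RNA strand, and column 4 holds counts for the reverse case; the first four rows are summary lines starting with N_. The Python sketch below selects the matching column and assembles a gene-by-sample table. It is only a sketch under stated assumptions, not the pipeline's code: the strandedness labels ("none", "yes", "reverse"), the function names, and the example data are all hypothetical.

```python
import csv
import io

# Zero-based column index for each (hypothetical) strandedness label.
# Column meanings follow the STAR manual's description of ReadsPerGene.out.tab.
STRAND_COLUMN = {"none": 1, "yes": 2, "reverse": 3}


def read_gene_counts(handle, strandedness):
    """Read one ReadsPerGene.out.tab stream into a {gene_id: count} dict."""
    col = STRAND_COLUMN[strandedness]
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip empty lines and the N_unmapped / N_multimapping / ... summary rows.
        if not row or row[0].startswith("N_"):
            continue
        counts[row[0]] = int(row[col])
    return counts


def build_count_matrix(per_sample):
    """Combine {sample: {gene: count}} into a header and gene-by-sample rows."""
    samples = list(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    # Genes missing from a sample are filled with 0.
    rows = [[g] + [per_sample[s].get(g, 0) for s in samples] for g in genes]
    return ["gene"] + samples, rows


# Made-up example content for a single sample.
EXAMPLE = (
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t3\t40\t4\n"
    "N_ambiguous\t1\t0\t1\n"
    "ENSG01\t20\t1\t19\n"
    "ENSG02\t7\t0\t7\n"
)

sample_counts = read_gene_counts(io.StringIO(EXAMPLE), "reverse")
header, rows = build_count_matrix({"sample1": sample_counts})
```

Writing header and rows out with csv.writer would give a table of the same shape as counts.csv, one row per gene and one column per sample.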

Documentation & Usage

  • Includes a README.md with workflow overview, features, usage instructions, QC guidelines, and a template Methods section.
  • Provides a CITATION.cff file for standardized citation.
  • Configuration is handled via config.yaml and annotation.csv, with explanations in config/README.md.
  • Example configuration and annotation files are included.
  • Exports Conda environment specifications (envs/*.yaml) used in the run for reproducibility.

We gratefully acknowledge adaptations from the snakemake-workflows/rna-seq-star-deseq2 workflow.

Full Changelog: https://github.com/epigen/rnaseq_pipeline/commits/v1.0.0