Releases · epigen/rnaseq_pipeline

Features

Exon-based gene annotation without BioMart, using the Biostars gene-annotation script (https://www.biostars.org/p/91218)
STAR aligner defaults set to ENCODE recommendations in config
Sample-annotation sanitization for R compatibility
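
To illustrate what exon-based gene annotation computes, here is a minimal Python sketch, not the pipeline's or the Biostars script's actual code. It assumes that overlapping exons from all transcripts of a gene are merged before summing lengths and counting G/C bases. The function names, the interval convention (0-based, half-open), and the toy sequence are all hypothetical.

```python
from typing import List, Tuple

# Assumption: intervals are 0-based, half-open [start, end).
# GTF coordinates are 1-based and closed, so a GTF exon (s, e) maps to (s - 1, e).
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or adjacent exon intervals into a non-redundant set."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def exon_length_and_gc(exons: List[Interval], chrom_seq: str) -> Tuple[int, float]:
    """Return (cumulative exon length, GC fraction) over the merged exons of one gene."""
    merged = merge_intervals(exons)
    exonic = "".join(chrom_seq[s:e] for s, e in merged).upper()
    length = len(exonic)
    gc = (exonic.count("G") + exonic.count("C")) / length if length else 0.0
    return length, gc


# Toy example: a 20 bp "chromosome" and one gene whose transcripts share exons.
toy_chrom = "ATGCGCGCATATATGGCCTA"
toy_exons = [(0, 8), (4, 10), (14, 18)]  # exons pooled from all transcripts
length, gc = exon_length_and_gc(toy_exons, toy_chrom)
print(length, round(gc, 3))
```

Merging first means bases shared between transcripts are counted once, so the length reflects the non-redundant exonic footprint of the gene rather than the sum over transcripts.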

Documentation

Comprehensive testing

Sequencing modes: single-end & paired-end
Species: mouse & human
Sample sources: primary tissues & cell lines
Library protocols: Smart-seq2 (unstranded), NEBNext Ultra (unstranded), Quant-seq (stranded)
Cell types: Structural (epithelial, fibroblast, endothelial), Hematopoietic (13 cell types), CAR-T cells, sorted primary immune subsets, neurons

Full Changelog: v1.0.0...v1.1.0

Features

Provides an end-to-end Snakemake workflow for RNA-seq analysis, from unmapped BAM (uBAM) files to gene counts and annotations.
Performs adapter/quality trimming using fastp.
Aligns reads and quantifies gene expression using STAR (--quantMode GeneCounts), handling various library strandedness types.
Generates a gene-by-sample count matrix (counts.csv).
Retrieves gene annotations (ID, symbol, biotype, description) from Ensembl using biomaRt.
Calculates exon-based GC content and cumulative exon length for each gene, suitable for downstream bias correction (e.g., with CQN).
Creates gene (gene_annotation.csv) and sample (sample_annotation.csv) annotation files ready for downstream analysis.
Integrates QC metrics from fastp, STAR, and RSeQC into a comprehensive MultiQC report.
Designed as a compatible MrBiomics module.
Validates the read types specified in the annotation against the BAM file flags.
Uses data streaming and temporary files for improved efficiency and reduced disk usage.
Manages software dependencies using Conda environments.
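
The step from STAR's per-sample output to a gene-by-sample count matrix can be sketched as follows. With --quantMode GeneCounts, STAR writes a ReadsPerGene.out.tab file per sample containing the gene ID followed by three count columns (unstranded, read 1 on the sense strand, read 2 on the sense strand), preceded by four N_* summary rows. The sketch below is an illustration only, not the pipeline's implementation: the strandedness labels, function names, and toy data are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable

# Column index in ReadsPerGene.out.tab for each library type.
# Column 0 is the gene ID; columns 1-3 are STAR's three count columns.
# The label names are assumptions, not the pipeline's own vocabulary.
STRAND_COLUMN = {"unstranded": 1, "forward": 2, "reverse": 3}


def read_gene_counts(lines: Iterable[str], strandedness: str) -> Dict[str, int]:
    """Parse one sample's ReadsPerGene table, keeping the column for its library type."""
    col = STRAND_COLUMN[strandedness]
    counts: Dict[str, int] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("N_"):
            continue  # skip blank lines and the N_unmapped, N_multimapping, ... rows
        counts[row[0]] = int(row[col])
    return counts


def build_count_matrix(per_sample: Dict[str, Dict[str, int]]) -> str:
    """Return a gene-by-sample CSV; genes missing from a sample are filled with 0."""
    samples = list(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["gene"] + samples)
    for gene in genes:
        writer.writerow([gene] + [per_sample[s].get(gene, 0) for s in samples])
    return out.getvalue()


# Toy tables standing in for two samples' ReadsPerGene.out.tab files.
toy_a = [
    "N_unmapped\t10\t10\t10", "N_multimapping\t5\t5\t5",
    "N_noFeature\t3\t20\t4", "N_ambiguous\t1\t0\t1",
    "GENE1\t100\t2\t98", "GENE2\t50\t1\t49",
]
toy_b = [
    "N_unmapped\t2\t2\t2", "N_multimapping\t1\t1\t1",
    "N_noFeature\t0\t9\t1", "N_ambiguous\t0\t0\t0",
    "GENE1\t7\t0\t7", "GENE2\t30\t1\t29",
]
per_sample = {
    "sample_a": read_gene_counts(toy_a, "unstranded"),
    "sample_b": read_gene_counts(toy_b, "reverse"),
}
matrix_csv = build_count_matrix(per_sample)
print(matrix_csv)
```

Choosing the column per sample, rather than once for the whole project, is what allows libraries of different strandedness to end up in the same matrix.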

Documentation & Usage

Includes a README.md with workflow overview, features, usage instructions, QC guidelines, and a template Methods section.
Provides a CITATION.cff file for standardized citation.
Configuration is handled via config.yaml and annotation.csv, with explanations in config/README.md.
Example configuration and annotation files are included.
Exports Conda environment specifications (envs/*.yaml) used in the run for reproducibility.

We gratefully acknowledge adaptations from the snakemake-workflows/rna-seq-star-deseq2 workflow.

Full Changelog: https://github.com/epigen/rnaseq_pipeline/commits/v1.0.0
