Release v1.0.0 - first stable version with complete docs · epigen/rnaseq_pipeline

Features

Provides an end-to-end Snakemake workflow for RNA-seq analysis, from unmapped BAM (uBAM) files to gene counts and annotations.
Performs adapter/quality trimming using fastp.
Aligns reads and quantifies gene expression using STAR (--quantMode GeneCounts), supporting unstranded, forward-stranded, and reverse-stranded libraries.
Generates a gene-by-sample count matrix (counts.csv).
Retrieves gene annotations (ID, symbol, biotype, description) from Ensembl using biomaRt.
Calculates exon-based GC content and cumulative exon length for each gene, suitable for downstream bias correction (e.g., with CQN).
Creates gene (gene_annotation.csv) and sample (sample_annotation.csv) annotation files ready for downstream analysis.
Integrates QC metrics from fastp, STAR, and RSeQC into a comprehensive MultiQC report.
Designed as a module compatible with MrBiomics.
Validates that the read type (single-end or paired-end) specified in the annotation matches the flags in each BAM file.
Uses data streaming and temporary files for improved efficiency and reduced disk usage.
Manages software dependencies using Conda environments.
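This minimal Python sketch shows how STAR's --quantMode GeneCounts output can be turned into a gene-by-sample count matrix. It is not the pipeline's own code. It assumes there is one ReadsPerGene.out.tab per sample. In that file STAR writes four summary rows, followed by one row per gene with three count columns: unstranded, 1st read strand aligned with RNA, and 2nd read strand aligned with RNA. The strandedness labels ("unstranded", "forward", "reverse") and the function names are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO

# 0-based column in ReadsPerGene.out.tab for each (hypothetical) strandedness label.
# Column 0 holds the gene ID; columns 1-3 hold STAR's three count variants.
STRAND_COLUMN = {"unstranded": 1, "forward": 2, "reverse": 3}


def read_star_gene_counts(handle: TextIO, strandedness: str) -> Dict[str, int]:
    """Parse one ReadsPerGene.out.tab, keeping only the column for this library type."""
    col = STRAND_COLUMN[strandedness]
    counts: Dict[str, int] = {}
    for i, line in enumerate(handle):
        # The first four rows are summary statistics (N_unmapped, N_multimapping,
        # N_noFeature, N_ambiguous), not genes, so skip them.
        if i < 4:
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[0]] = int(fields[col])
    return counts


def write_count_matrix(per_sample: Dict[str, Dict[str, int]], out: TextIO) -> None:
    """Write a gene-by-sample CSV: one row per gene, one column per sample."""
    samples = list(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    writer = csv.writer(out)
    writer.writerow(["gene"] + samples)
    for gene in genes:
        # Genes missing from a sample's table are written as 0.
        writer.writerow([gene] + [per_sample[s].get(gene, 0) for s in samples])


if __name__ == "__main__":
    # Toy in-memory table for illustration only (not real data).
    toy = (
        "N_unmapped\t10\t10\t10\n"
        "N_multimapping\t5\t5\t5\n"
        "N_noFeature\t3\t40\t2\n"
        "N_ambiguous\t1\t0\t0\n"
        "ENSG01\t20\t1\t19\n"
        "ENSG02\t7\t0\t7\n"
    )
    counts = read_star_gene_counts(io.StringIO(toy), "reverse")
    buf = io.StringIO()
    write_count_matrix({"sample1": counts}, buf)
    print(buf.getvalue())
```

The strandedness setting has to be chosen per library. Picking the wrong column silently yields near-zero counts for most genes.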
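This minimal Python sketch computes per-gene exon-based GC content and exon length, the kind of covariates used for GC and length bias correction. It assumes each gene's exon coordinates (0-based, half-open) and its chromosome sequence are already loaded. The pipeline itself retrieves annotations with biomaRt. Overlapping exons from different transcripts are merged first, so shared bases are counted only once. Whether the pipeline merges exons this way is an assumption, and the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def merge_intervals(exons: List[Interval]) -> List[Interval]:
    """Collapse overlapping or adjacent exons so shared bases are counted once."""
    merged: List[Interval] = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def exon_length_and_gc(chrom_seq: str, exons: List[Interval]) -> Tuple[int, float]:
    """Return (total merged exon length, fraction of G/C bases within those exons)."""
    length = 0
    gc = 0
    for start, end in merge_intervals(exons):
        segment = chrom_seq[start:end].upper()
        length += end - start
        gc += segment.count("G") + segment.count("C")
    return length, (gc / length if length else 0.0)
```

Merging before counting keeps genes with many overlapping isoforms from getting inflated lengths.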
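The read-type validation against BAM file flags can be illustrated with the SAM FLAG field. Its bit 0x1 marks reads whose template has multiple segments, i.e. paired-end reads. This Python sketch is not the pipeline's implementation. It assumes the FLAG values of a sample of reads have already been extracted, for example from column 2 of `samtools view` output. It also assumes the annotation uses the hypothetical labels "single" and "paired".

```python
from typing import Iterable

# SAM FLAG bit 0x1: template having multiple segments in sequencing (paired-end).
PAIRED_FLAG = 0x1


def infer_read_type(flags: Iterable[int]) -> str:
    """Classify reads as 'paired' or 'single' from their FLAG values."""
    paired = [bool(flag & PAIRED_FLAG) for flag in flags]
    if not paired:
        raise ValueError("no reads to inspect")
    if all(paired):
        return "paired"
    if not any(paired):
        return "single"
    raise ValueError("mixed single-end and paired-end reads")


def validate_read_type(sample: str, declared: str, flags: Iterable[int]) -> None:
    """Raise if the annotated read type disagrees with the BAM flags."""
    observed = infer_read_type(flags)
    if observed != declared:
        raise ValueError(
            f"{sample}: annotation says {declared!r}, BAM flags indicate {observed!r}"
        )
```

Failing early on a mismatch avoids running trimming and alignment in the wrong mode.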

Documentation & Usage

Includes a README.md with workflow overview, features, usage instructions, QC guidelines, and a template Methods section.
Provides a CITATION.cff file for standardized citation.
Configuration is handled via config.yaml and annotation.csv, with explanations in config/README.md.
Example configuration and annotation files are included.
Exports Conda environment specifications (envs/*.yaml) used in the run for reproducibility.

We gratefully acknowledge the snakemake-workflows/rna-seq-star-deseq2 workflow, from which parts of this workflow were adapted.

Full Changelog: https://github.com/epigen/rnaseq_pipeline/commits/v1.0.0
