Releases: epigen/rnaseq_pipeline
v1.1.1 - minor fix
Full Changelog: v1.1.0...v1.1.1
v1.1.0 - minor improvements
Features
- Exon-based gene annotation without BioMart using Biostars gene-annotation script (https://www.biostars.org/p/91218)
- STAR aligner defaults set to ENCODE recommendations in `config`
- Sample-annotation sanitization for R compatibility
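The release notes do not describe how sample annotations are sanitized. As a rough illustration of the idea only, the sketch below rewrites names so that they are syntactically valid in R. The function name `sanitize_name` and the replacement rules are assumptions, loosely modeled on R's `make.names`, and are not taken from the pipeline. Reserved words and duplicate names are not handled.

```python
import re


def sanitize_name(name: str) -> str:
    """Return a name that is syntactically valid as an R column or row name.

    Hypothetical helper for illustration; not the pipeline's own code.
    """
    # Replace every character other than letters, digits, dot, and underscore.
    cleaned = re.sub(r"[^A-Za-z0-9._]", "_", name.strip())
    # R names may not be empty, start with a digit or underscore,
    # or start with a dot followed by a digit; prefix those with "X".
    if cleaned == "" or re.match(r"[0-9_]|\.[0-9]", cleaned):
        cleaned = "X" + cleaned
    return cleaned


if __name__ == "__main__":
    for raw in ["sample 1", "1-A", "T-cell/CD8+", "ok.name_2"]:
        print(raw, "->", sanitize_name(raw))
```

Sanitizing names before the annotation reaches R avoids silent renaming on import, which would otherwise break the match between count-matrix columns and annotation rows.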
Documentation
- QC guideline enhancements in README (updated thresholds & descriptions)
Comprehensive testing
- Sequencing modes: single-end & paired-end
- Species: mouse & human
- Sample sources: primary tissues & cell lines
- Library protocols: Smart-seq2 (unstranded), NEBNext Ultra (unstranded), Quant-seq (stranded)
- Cell types: Structural (epithelial, fibroblast, endothelial), Hematopoietic (13 cell types), CAR-T cells, sorted primary immune subsets, neurons
Full Changelog: v1.0.0...v1.1.0
v1.0.0 - first stable version with complete docs
Features
- Provides an end-to-end Snakemake workflow for RNA-seq analysis, from unmapped BAM (uBAM) files to gene counts and annotations.
- Performs adapter/quality trimming using `fastp`.
- Aligns reads and quantifies gene expression using STAR (`--quantMode GeneCounts`), handling various library strandedness types.
- Generates a gene-by-sample count matrix (`counts.csv`).
- Retrieves gene annotations (ID, symbol, biotype, description) from Ensembl using `biomaRt`.
- Calculates exon-based GC content and cumulative exon length for each gene, suitable for downstream bias correction (e.g., with CQN).
- Creates gene (`gene_annotation.csv`) and sample (`sample_annotation.csv`) annotation files ready for downstream analysis.
- Integrates QC metrics from `fastp`, `STAR`, and `RSeQC` into a comprehensive `MultiQC` report.
- Designed as a compatible MrBiomics module.
- Includes input validation for read types specified in annotation vs. BAM file flags.
- Uses data streaming and temporary files for improved efficiency and reduced disk usage.
- Manages software dependencies using Conda environments.
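To make the strandedness handling more concrete: with `--quantMode GeneCounts`, STAR writes a `ReadsPerGene.out.tab` file per sample with four tab-separated columns (gene ID, unstranded counts, counts for reads on the same strand as the RNA, counts for reads on the reverse strand), preceded by four summary rows starting with `N_`. The sketch below shows one way to pick the matching column and assemble a gene-by-sample matrix. The labels `unstranded`, `forward`, and `reverse` and both function names are assumptions for illustration, not the pipeline's own identifiers.

```python
import csv
import io

# Column index in ReadsPerGene.out.tab for each (hypothetical) strandedness label.
STRAND_COLUMN = {"unstranded": 1, "forward": 2, "reverse": 3}


def read_gene_counts(handle, strandedness):
    """Parse one ReadsPerGene.out.tab stream into {gene_id: count}."""
    column = STRAND_COLUMN[strandedness]
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the N_unmapped / N_multimapping /
        # N_noFeature / N_ambiguous summary rows.
        if not row or row[0].startswith("N_"):
            continue
        counts[row[0]] = int(row[column])
    return counts


def build_count_matrix(per_sample):
    """Combine {sample: {gene: count}} into rows of a gene-by-sample table."""
    samples = list(per_sample)
    genes = sorted({gene for counts in per_sample.values() for gene in counts})
    header = ["gene"] + samples
    body = [[g] + [per_sample[s].get(g, 0) for s in samples] for g in genes]
    return [header] + body


if __name__ == "__main__":
    # Made-up example content, not real data.
    example = (
        "N_unmapped\t10\t10\t10\n"
        "N_multimapping\t5\t5\t5\n"
        "N_noFeature\t1\t20\t2\n"
        "N_ambiguous\t0\t0\t0\n"
        "ENSG1\t7\t0\t7\n"
        "ENSG2\t3\t3\t0\n"
    )
    per_sample = {
        "s1": read_gene_counts(io.StringIO(example), "reverse"),
        "s2": read_gene_counts(io.StringIO(example), "unstranded"),
    }
    for row in build_count_matrix(per_sample):
        print(",".join(str(value) for value in row))
```

Choosing the wrong column for a stranded library typically shows up as a very high `N_noFeature` count in that column, which is a useful sanity check when the protocol's strandedness is uncertain.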
Documentation & Usage
- Includes a `README.md` with workflow overview, features, usage instructions, QC guidelines, and a template Methods section.
- Provides a `CITATION.cff` file for standardized citation.
- Configuration is handled via `config.yaml` and `annotation.csv`, with explanations in `config/README.md`.
- Example configuration and annotation files are included.
- Exports Conda environment specifications (`envs/*.yaml`) used in the run for reproducibility.
We gratefully acknowledge adaptations from the snakemake-workflows/rna-seq-star-deseq2 workflow.
Full Changelog: https://github.com/epigen/rnaseq_pipeline/commits/v1.0.0