Author: Eren Ada, PhD
Date: 6/3/2025
Current Version: 2.0.0
This project develops a comprehensive and modular bulk RNA-sequencing (RNA-seq) analysis tool using RShiny. The suite consists of interconnected RShiny applications, each dedicated to a specific stage of the RNA-seq analysis workflow.
The first module is fully operational and provides comprehensive quality control and pre-processing capabilities for bulk RNA-seq count data.
See the Screenshots & Interface Overview section below for a visual tour of the application's features and workflow.
Status: Complete and Production-Ready
Location: app.R (main application)
- Data Input & Validation: Flexible CSV upload with comprehensive validation
- Quality Control: Library size analysis, expression distributions, sample correlations, PCA
- Filtering & Normalization: Multiple strategies for gene filtering and data normalization
- Export Capabilities: Processed data and publication-ready visualizations
# Launch the application
shiny::runApp("app.R")
For detailed usage instructions, see the Quick Start Guide.
Feature | Screenshot | Description |
---|---|---|
Data Import | Validation | Smart CSV/TSV import with comprehensive validation |
Quality Control | PCA Analysis | Interactive 2D/3D PCA with sample grouping |
Correlation Analysis | Sample Correlation | Comprehensive correlation heatmaps |
Gene Filtering | Filtering | Advanced filtering with real-time impact visualization |
Normalization | Normalization | Multiple methods with evaluation metrics |
Documentation | About Section | Tool information and comprehensive resource access |
- Differential Gene Expression (DEG) Analysis Tool (In Development)
- Pathway & Functional Enrichment Analysis Tool (Planned)
- Advanced Visualization Tool (Planned)
- Complete User Manual - Comprehensive guide to all features
- Quick Start Guide - 15-minute walkthrough
- Technical Requirements - System requirements and setup
- Troubleshooting Guide - Common issues and solutions
- Developer Documentation - Architecture and development guidelines
- Testing Framework - Testing guidelines and examples
- Project Status - Current development status
- R Version: 4.0.0 or higher (4.3.0+ recommended)
- Operating System: Windows 10+, macOS 10.14+, or Linux (Ubuntu 18.04+)
- RAM: 4GB minimum (8GB recommended)
- Browser: Chrome, Firefox, Safari, or Edge (latest versions)
The application automatically installs required packages on first launch:
# Navigate to the APP directory
setwd("path/to/APP")
# Launch application (packages install automatically)
shiny::runApp("app.R")
For development or troubleshooting:
# Install package manager
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# Install core dependencies
BiocManager::install(c("DESeq2", "edgeR", "limma"))
install.packages(c("shiny", "DT", "plotly", "ggplot2"))
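Before launching, it can help to confirm which of the packages listed above are already present. The following is a minimal sketch using only base R; `missing_packages` is a hypothetical helper written for this illustration and is not part of the application.

```r
# Packages named in the manual installation step above
cran_pkgs <- c("shiny", "DT", "plotly", "ggplot2")
bioc_pkgs <- c("DESeq2", "edgeR", "limma")

# Hypothetical helper: return the packages that cannot be loaded
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

still_missing <- c(missing_packages(cran_pkgs), missing_packages(bioc_pkgs))

if (length(still_missing) == 0) {
  message("All core dependencies are installed.")
} else {
  message("Missing packages: ", paste(still_missing, collapse = ", "))
}
```

Anything reported as missing can then be installed with the `install.packages()` or `BiocManager::install()` calls shown above.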
Explore the application's intuitive interface and comprehensive analysis capabilities through these screenshots:
Smart data import with comprehensive validation
- Automatic format detection (CSV/TSV)
- Real-time validation feedback with detailed requirements
- Sample-metadata consistency checking
- Missing data identification and handling
The validation interface shows data summary, validation status, and library size distribution for quality assessment.
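To illustrate the kind of checks described above, here is a small base R sketch of sample-metadata consistency and missing-value detection. The toy data and the `validate_inputs` helper are assumptions made for this example; the application's own validation logic may differ.

```r
# Toy count matrix: genes in rows, samples in columns (hypothetical values)
counts <- matrix(
  c(10, 0, 5, 200, 150, 180, 3, 7, NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("S1", "S2", "S3"))
)

# Toy metadata: one row per sample, sample IDs as row names
metadata <- data.frame(
  group = c("Control", "Control", "Treated"),
  row.names = c("S1", "S2", "S4")
)

# Hypothetical helper: collect basic consistency problems in a list
validate_inputs <- function(counts, metadata) {
  gene_ids <- rownames(counts)
  list(
    samples_missing_metadata = setdiff(colnames(counts), rownames(metadata)),
    metadata_without_counts  = setdiff(rownames(metadata), colnames(counts)),
    duplicated_genes         = unique(gene_ids[duplicated(gene_ids)]),
    n_missing_values         = sum(is.na(counts)),
    n_negative_values        = sum(counts < 0, na.rm = TRUE)
  )
}

report <- validate_inputs(counts, metadata)
str(report)
```

In this toy input, sample S3 has no metadata row, S4 has no counts column, and one count is missing, so each of those entries in `report` is non-empty.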
Interactive PCA visualization for sample relationship assessment
- 2D and 3D PCA plots with customizable components
- Sample grouping by experimental factors
- Outlier detection and identification
- Variance contribution statistics
PCA plots reveal sample clustering patterns and help identify potential batch effects or outliers.
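As a rough illustration of what a PCA of samples involves, the sketch below runs `prcomp` on log-transformed simulated counts and computes the variance contribution of each component. The simulated data and the choice of a `log2(x + 1)` transform are assumptions for this example, not necessarily what the application uses internally.

```r
set.seed(1)

# Simulated counts: 200 genes x 6 samples (hypothetical data)
counts <- matrix(
  rnbinom(200 * 6, mu = 100, size = 5),
  nrow = 200,
  dimnames = list(paste0("Gene", 1:200), paste0("S", 1:6))
)

# Log-transform to dampen the influence of highly expressed genes,
# then run PCA with samples as rows
log_counts <- log2(counts + 1)
pca <- prcomp(t(log_counts), center = TRUE, scale. = FALSE)

# Percentage of total variance explained by each principal component
var_explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on the first two components, ready for plotting
scores <- as.data.frame(pca$x[, 1:2])
print(round(var_explained[1:2], 1))
print(scores)
```

Plotting `scores$PC1` against `scores$PC2`, colored by an experimental factor, gives the familiar 2D PCA view of sample clustering.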
Comprehensive correlation heatmaps for sample quality assessment
- Sample-to-sample correlation matrices
- Customizable correlation methods (Pearson, Spearman)
- Interactive filtering by experimental groups
- Visual correlation strength indicators
Correlation heatmaps help identify sample relationships and potential technical issues.
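The two correlation methods mentioned above can be compared directly in base R. This is a minimal sketch on simulated data (an assumption for illustration); it computes the sample-to-sample matrices that a heatmap would display.

```r
set.seed(2)

# Simulated counts: 100 genes x 4 samples (hypothetical data)
counts <- matrix(
  rnbinom(100 * 4, mu = 50, size = 2),
  nrow = 100,
  dimnames = list(NULL, paste0("S", 1:4))
)
log_counts <- log2(counts + 1)

# cor() correlates columns, so each entry compares two samples
cor_pearson  <- cor(log_counts, method = "pearson")
cor_spearman <- cor(log_counts, method = "spearman")

print(round(cor_pearson, 2))
print(round(cor_spearman, 2))
```

Spearman correlation is rank-based and therefore less sensitive to a few extreme genes, which is one reason to offer both methods.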
Advanced filtering with real-time impact visualization
- Multiple filtering strategies (count-based, CPM-based, group-aware)
- Before/after comparison statistics
- Interactive threshold adjustment
- Gene retention summaries and distribution plots
The filtering interface provides comprehensive options with immediate visual feedback on data impact.
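A CPM-based filter of the kind listed above can be sketched in a few lines of base R. The toy matrix and the thresholds `min_cpm` and `min_samples` are hypothetical values chosen for illustration; suitable thresholds depend on the dataset.

```r
# Toy counts: Gene5 is a highly expressed gene chosen so that every
# library totals exactly 1,000,000 reads, which makes CPM equal to the count
counts <- matrix(
  c(     0,      0,      0,      2,
       500,    620,    480,    550,
        30,     25,      0,      1,
         5,      3,      4,      6,
    999465, 999352, 999516, 999441),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:5), paste0("S", 1:4))
)

# Counts per million: divide each column by its library size
lib_sizes <- colSums(counts)
cpm <- sweep(counts, 2, lib_sizes, "/") * 1e6

# Keep genes with at least min_cpm in at least min_samples samples
min_cpm <- 1
min_samples <- 2
keep <- rowSums(cpm >= min_cpm) >= min_samples
filtered <- counts[keep, , drop = FALSE]

cat("Genes before:", nrow(counts), " after:", nrow(filtered), "\n")
```

Here Gene1 is removed because only one sample reaches the CPM threshold, while Gene3 is retained because two samples do. A group-aware variant would apply the same test within each experimental group instead of across all samples.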
Robust normalization with comprehensive evaluation metrics
- Multiple normalization methods (CPM, TMM, VST, RLE, etc.)
- Mean-variance relationship assessment
- Library size evaluation across methods
- Normalization effectiveness statistics
Normalization evaluation helps choose the most appropriate method for your data characteristics.
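To make one of the listed methods concrete, the sketch below computes median-of-ratios size factors, the idea behind RLE normalization, in base R. It is a simplified illustration on toy data, not the application's implementation; packages such as DESeq2 (`estimateSizeFactors`) provide more complete versions.

```r
# Toy counts where S2 and S3 are exactly 2x and 4x deeper than S1
counts <- matrix(
  c(100, 200, 400,
     50, 100, 200,
     10,  20,  40,
     30,  60, 120),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:4), c("S1", "S2", "S3"))
)

# Per-gene log geometric mean across samples; genes containing a zero
# give -Inf and are excluded from the size factor estimate
log_geo_means <- rowMeans(log(counts))
usable <- is.finite(log_geo_means)

# Size factor = median ratio of a sample's counts to the geometric means
size_factors <- apply(counts[usable, , drop = FALSE], 2, function(x) {
  exp(median(log(x) - log_geo_means[usable]))
})

# Divide each sample by its size factor
normalized <- sweep(counts, 2, size_factors, "/")

print(size_factors)
print(normalized)
```

Because the samples differ only in sequencing depth, the size factors come out as 0.5, 1, and 2, and the normalized columns are identical.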
Comprehensive tool information and resource access
- Detailed tool description and institutional context
- System information and session status monitoring
- Quick access to documentation and support resources
- Professional presentation with licensing information
The About section provides comprehensive tool information, system status, and easy access to documentation.
- Data Upload: Load count matrix and metadata (CSV format)
- Validation: Review data validation results and handle any issues
- Quality Control: Examine QC plots and assess data quality
- Processing: Apply filtering and normalization based on QC results
- Export: Download processed data and reports
Sample datasets are provided in the example_data/ directory for testing and learning:
- example_counts.csv - Anonymized count matrix (5,001 genes × 24 samples)
- example_metadata.csv - Corresponding sample metadata with experimental design
The example data represents a multi-factorial experimental design with:
- 2 Strains (StrainA, StrainB)
- 2 Treatments (Control, TreatmentX)
- 3 Tissue types (Tissue1, Tissue2, Tissue3)
- 2 biological replicates per condition
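The design above is fully crossed, which accounts for the 24 samples in the count matrix (2 × 2 × 3 × 2). A short base R sketch reproduces the layout; the column names used here are assumptions and may not match those in example_metadata.csv.

```r
# Enumerate every strain x treatment x tissue x replicate combination
# (column names are hypothetical)
design <- expand.grid(
  Replicate = 1:2,
  Tissue    = c("Tissue1", "Tissue2", "Tissue3"),
  Treatment = c("Control", "TreatmentX"),
  Strain    = c("StrainA", "StrainB"),
  stringsAsFactors = FALSE
)

nrow(design)  # 24 samples in total

# Each strain/treatment/tissue condition has two replicates
table(design$Strain, design$Treatment, design$Tissue)
```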
- Flexible CSV import with configurable delimiters
- Automatic duplicate gene detection and handling
- Sample consistency validation between count matrix and metadata
- Support for various gene ID formats (Ensembl, Gene Symbol, etc.)
- Library size distribution analysis
- Gene expression distribution plots
- Sample-to-sample correlation heatmaps
- Interactive 2D and 3D PCA plots
- Comprehensive QC summary reports
- Filtering Options: Count-based, CPM-based, group-aware filtering
- Normalization Methods: CPM, TMM, RLE, VST, RLOG
- Real-time impact visualization
- Before/after comparison plots
- Multiple export formats (CSV, RDS, PDF)
- Publication-ready plots
- Comprehensive analysis reports
- Compatible with downstream DEG analysis tools
- Module 1: QC & Pre-processing Tool (v2.0.0)
- Comprehensive documentation system
- Testing framework foundation
- Module 2: Differential Expression Analysis Tool
- Enhanced visualization options
- Performance optimizations
- Module 3: Pathway & Functional Enrichment Analysis
- Module 4: Advanced Visualization Tool
- Integration between modules
- Batch processing capabilities
We welcome contributions! Please see our Developer Documentation for:
- Development setup and guidelines
- Code style and standards
- Testing requirements
- Pull request process
- Fork the repository
- Create a feature branch
- Follow the coding standards in the Developer Documentation
- Add tests for new functionality
- Update documentation
- Submit pull request
- Check the Troubleshooting Guide
- Review Technical Requirements
- Try with example data to isolate issues
- Create an issue with detailed information
When reporting bugs or requesting features:
- Use the issue templates
- Include system information (R version, OS, browser)
- Provide reproducible examples
- Specify dataset characteristics if relevant
This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License - see the LICENSE file for details.
If you use this tool in your research, please cite:
@software{ada2025rna,
author = {Ada, Eren},
title = {Modular Bulk RNA-seq Analysis RShiny Tool Suite},
year = {2025},
version = {2.0.0},
url = {https://github.com/hms-immunology/RNA_QC_APP}
}
This project builds upon the excellent work of the Bioconductor community, particularly the DESeq2, edgeR, and limma packages for RNA-seq analysis.
Current Focus: The QC & Pre-processing Tool (Module 1) is complete and ready for production use. Development is now focused on Module 2 (Differential Expression Analysis) while maintaining and improving Module 1 based on user feedback.