
talhasaleemm/DEGA-Analysis


DEGA — Differential Expression & Gene Analysis

DEGA is a reproducible pipeline and interactive Jupyter notebook for differential gene expression (DGE) analysis of gene expression datasets. The repository includes a fully documented notebook (DEGA.ipynb) in the colab folder that runs the analysis, plus an outputs folder containing publication-ready tables, plots, and summaries generated by the notebook.


Latest updates

  • 2025-08-05 — Analysis run and outputs exported. This release includes:

    • publication_ready_results.csv and comprehensive_deg_results.csv.
    • Figures: volcano_plot.png, ma_plot.png, top_genes_heatmap.png, top_genes_boxplots.png, exploratory_analysis.png, quality_assessment.png, and more.
    • statistical_summary.txt summarizing the key metrics from the run.

Project summary

DEGA is intended for researchers who want a clear, reproducible workflow to go from raw or pre-processed expression tables to:

  • Quality assessment & exploratory data analysis (PCA, clustering, QC plots)
  • Differential expression testing (fold-change, adjusted p-values)
  • Visualization (volcano, MA, heatmaps, boxplots)
  • Export of publication-ready tables

The repository contains notebooks that perform the entire analysis and write output files, which can be found in the outputs folder.

--

Installation

The notebook requires a standard Python stack. The first cell installs and imports the dependencies used in the analysis. Creating an isolated environment first is recommended:

# create environment (conda recommended)
conda create -n dega python=3.10 -y
conda activate dega

# install core packages
pip install --upgrade pip
pip install jupyterlab geoquery GEOparse pandas numpy scipy matplotlib seaborn scikit-learn rpy2

# optional extras used by the notebook for exporting/figures
pip install openpyxl xlsxwriter plotly kaleido adjustText

The notebook's first cell includes pip install statements so it can be run in a fresh Colab/Binder session as well.
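After installing, a quick check can confirm that the packages are importable in the active environment. The snippet below is a minimal sketch and not part of the notebook; the helper name `missing_modules` and the module list are assumptions based on the install commands above.

```python
from importlib.util import find_spec

# Import names for the core packages listed above. The import name can
# differ from the pip name (scikit-learn is imported as sklearn).
REQUIRED_MODULES = [
    "jupyterlab", "GEOparse", "pandas", "numpy", "scipy",
    "matplotlib", "seaborn", "sklearn", "rpy2",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All core modules are importable.")
```

`find_spec` only locates the modules without importing them, so the check is fast and has no side effects.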


One-line (Colab)

Open the notebook in Google Colab (or run locally) — the setup cell will install dependencies automatically.


Notebook structure & recommended run order

DEGA.ipynb is divided into the following high-level sections (run in this order):

  1. Install and import libraries — ensures all Python/R dependencies are available.
  2. Load data & sample — load expression matrices and the sample metadata file (or download via GEO if configured).
  3. Preprocessing & filtering — low-expression filtering and optional normalization steps.
  4. Exploratory data analysis — PCA, sample QC, sample clustering, QC plots.
  5. Differential expression testing — statistical tests, p-value adjustment, fold-change calculation.
  6. Post-processing & filtering — select significant genes by p-value and log2 fold-change thresholds.
  7. Visualization — volcano plot, MA-plot, heatmaps, boxplots for top genes.
  8. Export results — write comprehensive_deg_results.csv, publication_ready_results.csv, figures, and a statistical_summary.txt.
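To illustrate steps 3 and 4, here is a minimal sketch of low-expression filtering followed by PCA on synthetic data. It is not the notebook's code: the function names, the mean-expression cutoff `min_mean`, and the SVD-based PCA are assumptions chosen for illustration.

```python
import numpy as np


def filter_low_expression(expr, min_mean=1.0):
    """Keep genes (rows) whose mean expression across samples is >= min_mean."""
    keep = expr.mean(axis=1) >= min_mean
    return expr[keep], keep


def pca_scores(expr, n_components=2):
    """Project samples onto the top principal components using SVD.

    expr is a genes x samples matrix. Returns the sample scores
    (samples x n_components) and the explained variance ratios.
    """
    samples = expr.T                              # samples x genes
    centered = samples - samples.mean(axis=0)     # center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Synthetic example: 200 genes x 6 samples, the first 20 genes barely expressed.
rng = np.random.default_rng(0)
expr = rng.normal(loc=5.0, scale=1.0, size=(200, 6))
expr[:20] = rng.normal(loc=0.1, scale=0.05, size=(20, 6))

filtered, keep = filter_low_expression(expr, min_mean=1.0)
scores, explained = pca_scores(filtered)
print("genes kept:", int(keep.sum()))
print("PCA scores shape:", scores.shape)
```

Plotting the two columns of `scores` against each other gives the usual PCA sample plot used for QC.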

Notes:

  • The notebook defines threshold variables (e.g. p_threshold, log2fc_threshold) near the DGE section — adjust them before running the visualization cells.
  • The notebook prints progress and writes outputs to the local working directory (see deg_analysis_output.zip for an example layout).
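The way the two thresholds interact with the test results can be sketched as follows. This is an illustration under stated assumptions, not the notebook's implementation: the README does not name the statistical test, so a Welch t-test with Benjamini-Hochberg adjustment is used here, and the function names and synthetic data are hypothetical. Only `p_threshold` and `log2fc_threshold` are names taken from the notebook.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    order = np.argsort(pvals)
    ranked = pvals[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


def differential_expression(expr, is_treated, p_threshold=0.05, log2fc_threshold=1.0):
    """expr: genes x samples on a log2 scale; is_treated: boolean mask over samples."""
    treated, control = expr[:, is_treated], expr[:, ~is_treated]
    log2fc = treated.mean(axis=1) - control.mean(axis=1)  # difference of log2 means
    _, pvals = stats.ttest_ind(treated, control, axis=1, equal_var=False)  # Welch t-test
    padj = benjamini_hochberg(pvals)
    significant = (padj < p_threshold) & (np.abs(log2fc) >= log2fc_threshold)
    regulation = np.where(~significant, "ns", np.where(log2fc > 0, "up", "down"))
    return log2fc, pvals, padj, regulation


# Synthetic example: 100 genes x 8 samples, the first 5 genes shifted up in treated samples.
rng = np.random.default_rng(1)
expr = rng.normal(8.0, 0.2, size=(100, 8))
is_treated = np.array([False] * 4 + [True] * 4)
expr[:5, is_treated] += 4.0

log2fc, pvals, padj, regulation = differential_expression(expr, is_treated)
print("up:", int((regulation == "up").sum()), "down:", int((regulation == "down").sum()))
```

A gene is flagged only when it passes both thresholds, so tightening either one can only shrink the list of significant genes.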

Outputs included (example files from deg_analysis_output.zip)

  • comprehensive_deg_results.csv — full results table containing expression means, standard deviations, log2 fold-change, raw and adjusted p-values, and flags for significance/regulation.
  • publication_ready_results.csv — curated table ready for inclusion in papers/supplementary material.
  • supplementary_all_genes_analysis.csv — additional summary metrics for all genes.
  • expression_filtered.csv — filtered expression matrix used for downstream analysis.
  • statistical_summary.txt — short text summary (date of analysis, number of genes analyzed, counts of up/downregulated genes, etc.).
  • Figures: volcano_plot.png, ma_plot.png, top_genes_heatmap.png, top_genes_boxplots.png, exploratory_analysis.png, quality_assessment.png, expression_clusters.png, treatment_effect_preview.png, comprehensive_treatment_validation.png.

Representative results (from the latest run)

The latest statistical_summary.txt (analysis date: 2025-08-05) reports:

  • Total genes analyzed: 1000
  • Significant genes: 1 (Percent significant: 0.10%)
  • Upregulated genes: 54
  • Downregulated genes: 77
  • Mean fold change (significant): 2.04

See deg_analysis_output/statistical_summary.txt for the full summary and top gene lists.


The notebook also demonstrates how to regenerate figures and adjust significance thresholds.
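As a rough idea of what regenerating a figure with different thresholds involves, the sketch below draws a volcano plot from arrays of log2 fold-changes and adjusted p-values. The function name, its parameters, and the toy values are assumptions for illustration; the notebook's own plotting code may differ.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np


def volcano_plot(log2fc, padj, path, p_threshold=0.05, log2fc_threshold=1.0):
    """Save a volcano plot to path and return the (up, down) gene counts."""
    log2fc = np.asarray(log2fc, dtype=float)
    padj = np.asarray(padj, dtype=float)
    neg_log_p = -np.log10(np.clip(padj, 1e-300, 1.0))  # avoid log10(0)

    sig = (padj < p_threshold) & (np.abs(log2fc) >= log2fc_threshold)
    up = sig & (log2fc > 0)
    down = sig & (log2fc < 0)

    fig, ax = plt.subplots(figsize=(6, 5))
    ax.scatter(log2fc[~sig], neg_log_p[~sig], s=10, c="grey", label="not significant")
    ax.scatter(log2fc[up], neg_log_p[up], s=14, c="firebrick", label="up")
    ax.scatter(log2fc[down], neg_log_p[down], s=14, c="steelblue", label="down")
    # Dashed guides mark the two thresholds.
    ax.axhline(-np.log10(p_threshold), ls="--", lw=0.8, c="black")
    ax.axvline(log2fc_threshold, ls="--", lw=0.8, c="black")
    ax.axvline(-log2fc_threshold, ls="--", lw=0.8, c="black")
    ax.set_xlabel("log2 fold change")
    ax.set_ylabel("-log10 adjusted p-value")
    ax.legend()
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return int(up.sum()), int(down.sum())


# Toy values; the file goes to a temporary directory to avoid overwriting real outputs.
out_path = os.path.join(tempfile.mkdtemp(), "volcano_plot.png")
counts = volcano_plot(
    log2fc=[2.0, -1.5, 0.2, 3.0, -0.1],
    padj=[0.001, 0.01, 0.5, 0.2, 0.03],
    path=out_path,
)
print("up, down:", counts)
```

Calling the function again with other `p_threshold` or `log2fc_threshold` values redraws the figure with the new cutoffs.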


File structure

repo-root/
├── colab                  # Main analysis notebook
├── notebooks              # each cell in a separate file
├── outputs                # expected results (exported CSVs and figures)
├── requirements.txt       # requirements to use the repo
└── README.md              # This file

Reproducibility & environment

  • The notebook tries to install exact Python packages at runtime (see the first cell).
  • For full reproducibility, record the output of pip freeze or export the conda environment before running the analysis.
  • If results are to be used in publications, set random seeds and record the software versions used (the notebook records the analysis date in statistical_summary.txt).
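One way to follow the points above is to fix the seeds and save the installed package versions next to the results. This is a minimal sketch using only the standard library (NumPy is seeded if present); the helper names and the package list are assumptions, not something the notebook provides.

```python
import json
import platform
import random
from importlib.metadata import PackageNotFoundError, version


def set_seeds(seed):
    """Seed Python's RNG and, if NumPy is installed, NumPy's legacy global RNG."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not available; only the stdlib RNG is seeded


def collect_versions(packages):
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = "not installed"
    return versions


set_seeds(42)
versions = collect_versions(
    ["numpy", "pandas", "scipy", "scikit-learn", "matplotlib", "seaborn"]
)
print(json.dumps(versions, indent=2))
```

Writing the resulting dictionary to a JSON file alongside statistical_summary.txt keeps the environment record together with the outputs it describes.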

Contributing

Contributions and issues are welcome. Please open an issue describing the request or submit a pull request with tests and updated notebook outputs where appropriate. Suggested improvements:

  • Add a command-line wrapper to run the pipeline headlessly.
  • Add unit tests for core pre-processing functions.
  • Add support for common normalization methods (DESeq2 via rpy2, limma-voom, edgeR).
