PatchSeq-Epilepsy: Multimodal Neuronal Excitability Analysis

Overview

This repository documents my research on neuronal excitability, combining patch-clamp electrophysiology with single-cell transcriptomics. I'm building computational tools to understand how gene expression patterns relate to the electrical properties of neurons, with a focus on epilepsy-relevant phenotypes.

Research Context

Scientific Question

How do gene-expression signatures predict neuronal excitability phenotypes?

Neurons from patients with epilepsy often exhibit abnormal firing properties. However, the molecular mechanisms linking transcriptional programs to hyperexcitability remain unclear. This project uses multimodal single-cell data to:

Identify transcriptomic features that correlate with electrophysiological traits
Build interpretable predictive models to understand genotype-phenotype relationships
Guide future functional validation and therapeutic target discovery

Data Source

This work leverages the Allen Institute Patch-seq Dataset, which provides simultaneous measurements of:

Transcriptomics: Single-cell RNA-seq (gene expression across ~20,000 genes)
Electrophysiology: Patch-clamp recordings (firing rate, rheobase, input resistance, etc.)
Morphology & Metadata: Cell type classification, cortical layer, species

What This Repository Contains

Core Workflow

The analysis pipeline is structured around Snakemake, a workflow management system that helps keep every step reproducible:

data/raw/                   ← Raw Allen Patch-seq datasets (transcriptomics, ephys, metadata)
    ├── patchseq_transcriptomics.csv    [~5 GB]
    ├── patchseq_metadata.csv
    └── patchseq_ephys_features.csv

src/patchseq_pipeline/      ← Main Python package
    ├── data/                 (Loading & preprocessing utilities)
    ├── analysis/             (Feature selection & dimensionality reduction)
    ├── models/               (Ridge regression, elastic net models)
    └── viz/                  (Plotting & figure generation)

scripts/                    ← Standalone analysis scripts
    ├── download_data.py          (Fetch data via AllenSDK)
    ├── build_features.py         (Normalize & standardize features)
    ├── train_model.py            (Fit predictive models)
    └── generate_figures.py       (Create plots)

results/                    ← Generated outputs (after running pipeline)
    ├── figures/              (PNG plot panels)
    ├── models/               (Trained model checkpoints & metrics)
    └── logs/                 (Execution logs & debugging info)
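Before any modeling, the three raw tables have to describe the same cells in the same order. The sketch below uses only the standard library and is not the repository's loader. It assumes each CSV has one row per cell and a shared identifier column, called `cell_id` here as a hypothetical name. The real Allen files may use a different column, or store genes as rows, in which case they would need transposing first.

```python
import csv
import io

def load_by_cell_id(handle, id_column="cell_id"):
    """Read a CSV into a dict keyed by a (hypothetical) cell ID column."""
    reader = csv.DictReader(handle)
    return {row[id_column]: row for row in reader}

def align_modalities(expr_rows, ephys_rows):
    """Keep only cells present in both tables, in a stable sorted order."""
    shared = sorted(set(expr_rows) & set(ephys_rows))
    return shared, [expr_rows[c] for c in shared], [ephys_rows[c] for c in shared]

# Toy in-memory tables standing in for the real CSV files.
expr_csv = io.StringIO("cell_id,Scn1a,Kcnq2\nc1,5,0\nc2,3,7\nc3,0,1\n")
ephys_csv = io.StringIO("cell_id,rheobase_pA\nc2,120\nc3,80\nc4,95\n")

cells, expr, ephys = align_modalities(load_by_cell_id(expr_csv), load_by_cell_id(ephys_csv))
print(cells)  # ['c2', 'c3']
```

An inner join like this silently drops cells missing from either modality, so it is worth logging how many cells each table loses.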

Key Scripts

Script	Purpose	Status
`download_data.py`	Fetch Patch-seq data from Allen Institute	Core
`build_features.py`	Quality control, normalization, feature engineering	Core
`train_model.py`	Fit Ridge/Elastic Net models, cross-validation	Core
`generate_figures.py`	Produce analysis visualizations	Core
`checksums.py`	Data integrity verification	Utility
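`checksums.py` is listed as a data-integrity utility. The sketch below shows one common approach, not taken from the script itself. It streams each file through SHA-256 and compares the result against a recorded digest. Streaming matters for the ~5 GB transcriptomics file. The helper names `sha256_of` and `verify` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB files never sit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_hex):
    """Return True if the file's SHA-256 matches the recorded checksum."""
    return sha256_of(path) == expected_hex.lower()

# Usage with a throwaway file whose digest is the standard SHA-256 test vector for b"abc".
with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "example.csv"
    p.write_bytes(b"abc")
    ok = verify(p, "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")
print(ok)  # True
```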

Methods (At a Glance)

1. Data Preprocessing

Load raw gene counts and electrophysiology recordings
Filter cells by QC metrics (library size, gene counts, mitochondrial content)
Log-normalize gene counts: $\log(x + 1)$
Identify and remove outliers
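The preprocessing steps can be sketched with NumPy. The thresholds (`min_genes`, `max_mito_frac`, `mad_k`) and the median-absolute-deviation rule for library-size outliers are illustrative assumptions, not the pipeline's actual settings. The transform is the plain $\log(x + 1)$ stated above, although scaling counts to counts per million before the log is also common.

```python
import numpy as np

def qc_filter(counts, mito_mask, min_genes=1000, max_mito_frac=0.1, mad_k=3.0):
    """Return a boolean mask of cells (rows) that pass QC.

    counts:    cells x genes matrix of raw counts
    mito_mask: boolean vector flagging mitochondrial genes
    Default thresholds are illustrative placeholders.
    """
    lib_size = counts.sum(axis=1)
    genes_detected = (counts > 0).sum(axis=1)
    mito_frac = counts[:, mito_mask].sum(axis=1) / np.maximum(lib_size, 1)

    # Library-size outliers: more than mad_k robust SDs from the median (log scale).
    log_lib = np.log1p(lib_size)
    dev = np.abs(log_lib - np.median(log_lib))
    robust_sd = 1.4826 * max(np.median(dev), 1e-12)
    not_outlier = dev <= mad_k * robust_sd

    return (genes_detected >= min_genes) & (mito_frac <= max_mito_frac) & not_outlier

def log_normalize(counts):
    """Elementwise log(x + 1)."""
    return np.log1p(counts)

# Toy example: 4 cells x 5 genes; the last gene is treated as mitochondrial.
counts = np.array([
    [10, 5, 0, 3, 1],
    [8, 6, 2, 0, 0],
    [0, 0, 1, 0, 0],   # too few genes detected, and a library-size outlier
    [2, 1, 0, 0, 9],   # 75% mitochondrial reads
])
mito = np.array([False, False, False, False, True])
keep = qc_filter(counts, mito, min_genes=2, max_mito_frac=0.2)
X = log_normalize(counts[keep])
print(keep)  # [ True  True False False]
```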

2. Dimensionality Reduction

Compute each gene's variance across cells; keep the top N most variable genes
Apply PCA to reduce noise and improve model generalization
(Optional) UMAP visualization for interactive exploration
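The sketch below shows variance-based gene selection followed by PCA. It uses NumPy's SVD rather than any particular library. `n_top` and `n_components` stand in for the unspecified N and the number of components, and the data are random toy counts.

```python
import numpy as np

def select_top_variance_genes(X, n_top):
    """Return column indices of the n_top genes with highest variance across cells."""
    variances = X.var(axis=0)
    return np.argsort(variances)[::-1][:n_top]

def pca(X, n_components):
    """PCA via SVD of the column-centered matrix.

    Returns (scores, loadings, mean): scores is cells x components,
    loadings is genes x components, so scores == (X - mean) @ loadings.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T
    return scores, loadings, mean

rng = np.random.default_rng(0)
X = np.log1p(rng.poisson(3.0, size=(50, 200)))  # toy log-normalized matrix
top = select_top_variance_genes(X, n_top=40)
scores, loadings, mean = pca(X[:, top], n_components=10)
print(scores.shape, loadings.shape)  # (50, 10) (40, 10)
```

In a train/test setting, the gene ranking, the centering mean, and the loadings should be computed on training cells only and then applied to test cells. Otherwise, information from the held-out set leaks into the features.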

3. Predictive Modeling

Target Variables: Firing rate, rheobase, input resistance
Model Choice: Ridge Regression ($\ell_2$ regularization)
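Training: 80/20 train/test split, with 5-fold cross-validation on the training set (e.g. to choose the regularization strength)
Metrics: R² and MSE on held-out cells; feature importance from model coefficients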
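The modeling step can be sketched with closed-form ridge regression in NumPy. The sketch assumes the features are per-cell PCA scores (or selected genes), and that the target is one electrophysiological feature such as rheobase. The candidate `alphas`, the fold assignment, and the toy data are assumptions. scikit-learn's `Ridge`/`RidgeCV` would be the usual library route.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge, beta = (Xc^T Xc + alpha I)^-1 Xc^T yc.
    Centering X and y first leaves the intercept unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return beta, intercept

def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def cv_select_alpha(X, y, alphas, k=5, seed=0):
    """Pick the alpha with the lowest mean validation MSE over k folds."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    best_alpha, best_err = None, np.inf
    for alpha in alphas:
        errs = []
        for i in range(k):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(k) if j != i])
            beta, b0 = ridge_fit(X[trn], y[trn], alpha)
            errs.append(mse(y[val], X[val] @ beta + b0))
        if np.mean(errs) < best_err:
            best_alpha, best_err = alpha, np.mean(errs)
    return best_alpha

# Toy data: 100 cells x 10 features, target driven by the first two features.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

# 80/20 train/test split; alpha chosen by 5-fold CV on the training portion only.
idx = rng.permutation(100)
train, test = idx[:80], idx[80:]
alpha = cv_select_alpha(X[train], y[train], alphas=[0.01, 0.1, 1.0, 10.0, 100.0])
beta, b0 = ridge_fit(X[train], y[train], alpha)
y_pred = X[test] @ beta + b0
print(f"alpha={alpha}, test R2={r2(y[test], y_pred):.2f}, test MSE={mse(y[test], y_pred):.2f}")
```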

4. Biological Interpretation

Extract top predictive genes (highest absolute model coefficients)
Map genes to known epilepsy risk loci (future: pathway enrichment)
Validate findings against literature
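Ranking genes by absolute coefficient is direct when the model is fit on genes. When it is fit on PCA scores, the component coefficients first have to be projected back to genes through the loadings. The sketch below does this with a hypothetical `top_genes` helper and toy coefficients. Two caveats apply. Ridge spreads weight across correlated genes, and magnitudes are only comparable when features share a scale.

```python
import numpy as np

def top_genes(coef, gene_names, n=20, loadings=None):
    """Rank genes by absolute model coefficient.

    If the model was fit on PCA scores, pass the genes x components loadings:
    scores = (X - mean) @ loadings implies y_hat = (X - mean) @ (loadings @ coef) + b0,
    so loadings @ coef gives one weight per gene.
    """
    gene_weights = coef if loadings is None else loadings @ coef
    order = np.argsort(np.abs(gene_weights))[::-1][:n]
    return [(gene_names[i], float(gene_weights[i])) for i in order]

genes = ["Scn1a", "Kcnq2", "Gabra1", "Hcn1"]

# Model fit directly on genes (toy coefficients).
print(top_genes(np.array([0.1, -0.9, 0.4, 0.05]), genes, n=2))  # [('Kcnq2', -0.9), ('Gabra1', 0.4)]

# Model fit on 2 PCs (toy loadings and coefficients), mapped back to genes.
loadings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
print(top_genes(np.array([0.5, -2.0]), genes, n=2, loadings=loadings))  # [('Kcnq2', -2.0), ('Gabra1', -1.5)]
```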

Quickstart

Prerequisites

Python 3.9+ (recommended: 3.11)
Mamba or Conda (for environment management)
~50 GB disk space (for full dataset)

Example Results

After running the pipeline, you'll get:

Feature Importance Plot – Top genes predicting firing rate
PCA Visualization – Transcriptome structure by electrophysiology phenotype
Model Performance – R² scores, residual distributions
Gene Lists – Ranked by predictive importance for neuroinflammatory follow-up

Contact & License

MIT License. See LICENSE.

Prokash21/Celltypes-Patchseq-Epilepsy

Folders and files

Latest commit

History

Repository files navigation

PatchSeq-Epilepsy: Multimodal Neuronal Excitability Analysis

Overview

Research Context

Scientific Question

Data Source

What This Repository Contains

Core Workflow

Key Scripts

Methods (At a Glance)

1. Data Preprocessing

2. Dimensionality Reduction

3. Predictive Modeling

4. Biological Interpretation

Quickstart

Prerequisites

Example Results

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Languages