devonjkohler/Causomic
Causomic

Causal inference methods for -omics research

Causomic is a Python package for causal inference on -omics data, including proteomics, transcriptomics, metabolomics, and phosphoproteomics. Its primary goal is to predict the effects of interventions (e.g., drug treatments, protein inhibitions) on biological systems by combining causal modeling techniques with protein interaction networks.

Overview

A fundamental challenge in biological experimentation is understanding how interventions (e.g., drug treatments, protein inhibitions) affect complex biological systems. Traditional machine learning approaches, particularly black box models, attempt to predict these effects without explicitly modeling the underlying causal relationships. This can be problematic when explainability is crucial (e.g., identifying disease-driving pathways) or when models incorrectly infer that downstream proteins causally influence upstream targets. Causomic addresses these limitations by:

  1. Integrating prior biological knowledge from biological network databases
  2. Building causal graphs that represent protein relationships
  3. Training deep probabilistic models with variational Bayesian inference (Pyro/PyTorch)
  4. Predicting intervention effects on downstream proteins
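To make the direction-of-causation point concrete, here is a minimal sketch using only the Python standard library. The graph, the node names, and the helper `downstream_of` are hypothetical illustrations and not part of Causomic's API. The sketch shows that, in a causal graph, an intervention on a node can only affect its descendants and never its upstream regulators.

```python
from graphlib import TopologicalSorter

# Hypothetical causal graph: each key lists the nodes it directly regulates.
edges = {
    "receptor": ["kinase"],
    "kinase": ["tf"],
    "tf": ["target_gene"],
    "target_gene": [],
    "unrelated": [],
}


def downstream_of(graph, node):
    """Return every node reachable from `node` by following directed edges."""
    seen, stack = set(), list(graph[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


# TopologicalSorter expects node -> predecessors, so invert the edge map.
parents = {node: set() for node in edges}
for parent, children in edges.items():
    for child in children:
        parents[child].add(parent)

# static_order() raises graphlib.CycleError if the graph is not a DAG.
causal_order = list(TopologicalSorter(parents).static_order())

# Inhibiting the kinase can change tf and target_gene, but not the receptor.
affected = downstream_of(edges, "kinase")
print(causal_order)
print(sorted(affected))
```

A black-box predictor has no such constraint, which is how it can end up attributing changes in `receptor` to `kinase`.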

The package is particularly useful for:

  • Drug discovery and target identification
  • Understanding protein pathway dynamics
  • Predicting off-target effects of interventions
  • Analyzing perturbation experiments in proteomics

Features

🧬 Prior Knowledge Network (PKN) Construction

  • Integration with INDRA (Integrated Network and Dynamical Reasoning Assembler)
  • Automatic extraction of protein interaction networks
  • Support for GSEA-driven pathway analysis
  • Posterior network estimation using PKN and experimental data

📊 Causal Modeling

  • Bayesian probabilistic models using Pyro
  • Latent variable models for handling missing data
  • Support for both observational and interventional data
  • Uncertainty quantification for predictions

🎯 Intervention Prediction

  • Predict effects of protein inhibitions
  • Estimate downstream pathway responses
  • Quantify prediction uncertainty
  • Validation against experimental data

🔬 MS Data Processing

  • Integration with proteomics (MSstats) output format
  • Data normalization and preprocessing utilities
  • Handling of protein-level summarized data
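For readers unfamiliar with protein-level summarization, the sketch below shows Tukey's median polish, which is what the `TMP` option in MSstats stands for, applied to a small features-by-runs matrix of log-intensities. It is a standard-library illustration under simplifying assumptions (no missing values, a hypothetical function name `median_polish`, made-up intensities) and is not Causomic's or MSstats's implementation.

```python
from statistics import median


def median_polish(matrix, max_iter=10, tol=1e-8):
    """Tukey's median polish on a list of rows (features) by columns (runs).

    Returns (overall, row_effects, col_effects, residuals).
    """
    resid = [list(row) for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols

    for _ in range(max_iter):
        # Sweep row medians out of the residuals and into the row effects.
        row_med = [median(row) for row in resid]
        for i in range(n_rows):
            resid[i] = [value - row_med[i] for value in resid[i]]
            row_eff[i] += row_med[i]
        shift = median(col_eff)
        col_eff = [effect - shift for effect in col_eff]
        overall += shift

        # Sweep column medians out of the residuals and into the column effects.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
            col_eff[j] += col_med[j]
        shift = median(row_eff)
        row_eff = [effect - shift for effect in row_eff]
        overall += shift

        # Stop once a full pass no longer moves anything.
        if max(map(abs, row_med + col_med)) < tol:
            break

    return overall, row_eff, col_eff, resid


# Three features (rows) measured in three runs (columns), log2 scale.
features_by_runs = [
    [19.0, 21.0, 23.0],
    [20.0, 22.0, 24.0],
    [21.0, 23.0, 25.0],
]
overall, _, run_effects, _ = median_polish(features_by_runs)

# The protein-level value for each run is the overall level plus the run effect.
protein_per_run = [overall + effect for effect in run_effects]
print(protein_per_run)  # [20.0, 22.0, 24.0]
```

Because medians are used instead of means, a single outlying feature has little influence on the per-run summary.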

Simulation

  • Generate example graphs that exhibit different causal structures
  • Simulate data over causal graphs using realistic data-generating processes
  • Use simulations for method validation
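As one illustration of a data-generating process a simulation might cover, the sketch below shows intensity-dependent (missing-not-at-random) dropout, where low-abundance features are more likely to go unobserved. The logistic form, the parameter values, and the names `missing_probability` and `apply_mnar` are assumptions made for this example; they are not taken from Causomic's simulator.

```python
import math
import random


def missing_probability(intensity, intercept=6.0, slope=-0.5):
    """Logistic dropout curve: lower intensities are more likely to be missing."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * intensity)))


def apply_mnar(intensities, rng):
    """Replace values with None according to their intensity-dependent dropout probability."""
    return [
        None if rng.random() < missing_probability(value) else value
        for value in intensities
    ]


rng = random.Random(1)
low = apply_mnar([8.0] * 2000, rng)    # low-abundance feature
high = apply_mnar([20.0] * 2000, rng)  # high-abundance feature

low_missing = sum(v is None for v in low) / len(low)
high_missing = sum(v is None for v in high) / len(high)
print(f"missing at intensity 8:  {low_missing:.1%}")
print(f"missing at intensity 20: {high_missing:.1%}")
```

Since the probability of being missing depends on the unobserved value itself, simply dropping incomplete rows would bias abundance estimates upward, which is one reason to validate methods on simulated data with known ground truth.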

Installation

Prerequisites

  • Python 3.9 or higher
  • PyTorch
  • Pyro-PPL
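A quick way to check these prerequisites before installing is sketched below. It uses only the standard library; the helper name `check_prerequisites` is hypothetical, and the only assumption about the packages is that PyTorch is imported as `torch` and Pyro-PPL as `pyro`.

```python
import importlib.util
import sys


def check_prerequisites(modules=("torch", "pyro")):
    """Report whether the Python version and the listed modules look usable."""
    report = {"python>=3.9": sys.version_info >= (3, 9)}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


print(check_prerequisites())
```

Any entry reported as `False` should be resolved before running the installation steps.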

Install from source

git clone https://github.com/devonjkohler/Causomic.git
cd Causomic
pip install -e .

Dependencies

The main dependencies include:

  • pyro-ppl==1.8.5 - Probabilistic programming
  • torch - Deep learning framework
  • networkx - Graph manipulation
  • pandas - Data manipulation
  • numpy - Numerical computing
  • matplotlib - Plotting
  • y0 - Network analysis utilities

Quick Start

from causomic.causal_model.LVM import LVM
from causomic.data_analysis.proteomics_data_processor import dataProcess
from causomic.simulation.example_graphs import mediator
from causomic.simulation.proteomics_simulator import simulate_data

# 1. Load your data (here we simulate it from a mediator graph)
med_graph = mediator(add_independent_nodes=False, output_node=False)

simulated_data = simulate_data(
    med_graph["Networkx"],
    coefficients=med_graph["Coefficients"],
    add_error=False,
    mnar_missing_param=[-3, 0.4],  # Missing not at random
    add_feature_var=True,
    n=100,
    seed=2,
)

# 2. Preprocess data (assuming MS proteomics data)
input_data = dataProcess(
    simulated_data["Feature_data"],
    normalization=False,
    summarization_method="TMP",
    MBimpute=False,
    sim_data=True,
)

# 3. Fit causal model
lvm = LVM(backend="pyro", num_steps=2000, verbose=True)
lvm.fit(input_data, med_graph["causomic"])

# 4. Make predictions
intervention_value = 7.0
lvm.intervention({"X": intervention_value}, "Z")
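
Conceptually, the `lvm.intervention(...)` call above asks what `Z` would look like if `X` were set to 7.0 by intervention. The sketch below illustrates that idea on a hand-written linear mediator model X → M → Z using only the standard library. The structural equations, coefficients, noise levels, and the function name `sample_z` are made up for illustration; this is not how `LVM` computes its answer.

```python
import random
import statistics


# Hypothetical structural equations for a mediator graph X -> M -> Z.
# All coefficients and noise scales are illustrative assumptions.
def sample_z(rng, do_x=None):
    """Draw one sample of Z, optionally forcing X to a fixed value (do-operator)."""
    x = rng.gauss(5.0, 1.0) if do_x is None else do_x
    m = 1.0 + 0.5 * x + rng.gauss(0.0, 0.1)
    z = 2.0 + 1.5 * m + rng.gauss(0.0, 0.1)
    return z


rng = random.Random(0)
observational = [sample_z(rng) for _ in range(5000)]
interventional = [sample_z(rng, do_x=7.0) for _ in range(5000)]

# From the equations, E[Z] = 2 + 1.5 * (1 + 0.5 * x):
# about 7.25 observationally (E[X] = 5) and about 8.75 under do(X = 7).
obs_mean = statistics.fmean(observational)
int_mean = statistics.fmean(interventional)
print(f"observational mean of Z: {obs_mean:.2f}")
print(f"mean of Z under do(X=7): {int_mean:.2f}")
```

In this toy model the effect of the intervention reaches `Z` only through the mediator `M`, which is the kind of path a causal graph makes explicit.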

Data Requirements

Input Data Format

Causomic expects data in different formats depending on where in the pipeline you start. The causal model and graph construction steps expect wide-format data, with samples as rows, genes as columns, and quantitative experimental measurements as values.
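
As a concrete illustration of that layout, the sketch below reshapes a small long-format table into wide format using pandas. The column names (`sample`, `gene`, `abundance`) and the values are invented for the example; Causomic does not necessarily require these names.

```python
import pandas as pd

# Hypothetical long-format measurements: one row per (sample, gene) pair.
long_df = pd.DataFrame(
    {
        "sample": ["s1", "s1", "s2", "s2", "s3", "s3"],
        "gene": ["EGFR", "MYC", "EGFR", "MYC", "EGFR", "MYC"],
        "abundance": [21.3, 18.7, 20.9, 19.2, 21.8, 18.1],
    }
)

# Wide format: samples as rows, genes as columns, quantitative values in cells.
wide_df = long_df.pivot(index="sample", columns="gene", values="abundance")
wide_df.columns.name = None  # drop the leftover "gene" label on the column axis

print(wide_df)
```

Any (sample, gene) pair absent from the long table would appear as `NaN` in the wide table.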

Preprocessing with MSstats (R)

If you are using MS-based proteomics data, we recommend first processing it with the dataProcess function from the MSstats pipeline. The resulting ProteinLevelData object can then be passed directly to Causomic.

A native implementation of dataProcess within Causomic is in progress.

Main Components

📈 Data Analysis (causomic.data_analysis)

  • Data normalization and preprocessing
  • Statistical utilities for proteomics data
  • Integration with MSstats workflows

🕸️ Graph Construction (causomic.graph_construction)

  • INDRA network queries and processing
  • Protein interaction network building
  • Graph filtering and validation utilities

🎯 Causal Modeling (causomic.causal_model)

  • Probabilistic models for causal inference
  • Bayesian parameter estimation
  • Intervention effect prediction
  • Latent variable models for missing data

🧪 Simulation (causomic.simulation)

  • Synthetic data generation for testing
  • Model validation utilities
  • Simulation studies for method development

Documentation

User Manual

The primary documentation is available as a Jupyter notebook:

API Reference

Detailed API documentation is available in the source code docstrings. Key modules:

  • causomic.causal_model.models - Core causal models
  • causomic.graph_construction.utils - Network utilities
  • causomic.data_analysis.normalization - Data preprocessing
  • causomic.simulation - Synthetic data generation

Contributing

We welcome contributions! Please see our contributing guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Setup

git clone https://github.com/devonjkohler/Causomic.git
cd Causomic
pip install -e ".[dev]"

Code Style

We use Black for code formatting and isort for import sorting:

black src/
isort src/

Citation

If you use Causomic in your research, please cite:

@software{kohler2024causomic,
  title={Causomic: Causal inference methods for -omics research},
  author={Kohler, Devon},
  year={2024},
  url={https://github.com/devonjkohler/Causomic},
  version={0.0.1-dev}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

Acknowledgments

  • INDRA - Integrated Network and Dynamical Reasoning Assembler
  • MSstats - Statistical tools for proteomics
  • Pyro - Probabilistic programming framework
  • NetworkX - Network analysis library
