Causal inference methods for -omics research
Causomic is a Python package for causal inference on different types of -omics data, including proteomics, transcriptomics, metabolomics, and phosphoproteomics. Its primary goal is to predict the effects of interventions (e.g., drug treatments, protein inhibitions) on biological systems by leveraging causal modeling techniques and protein interaction networks.
- Overview
- Features
- Installation
- Quick Start
- Data Requirements
- Main Components
- Documentation
- Contributing
- Citation
- License
A fundamental challenge in biological experimentation is understanding how interventions (e.g., drug treatments, protein inhibitions) affect complex biological systems. Traditional machine learning approaches, particularly black box models, attempt to predict these effects without explicitly modeling the underlying causal relationships. This can be problematic when explainability is crucial (e.g., identifying disease-driving pathways) or when models incorrectly infer that downstream proteins causally influence upstream targets. Causomic addresses these limitations by:
- Integrating prior biological knowledge from biological network databases
- Building causal graphs that represent protein relationships
- Training deep probabilistic models with variational Bayesian inference (Pyro/PyTorch)
- Predicting intervention effects on downstream proteins
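To illustrate why the direction of a causal graph matters, here is a minimal standard-library sketch, not Causomic code. It assumes a hypothetical linear chain X -> M -> Z with made-up coefficients and compares conditioning on a high value of the downstream node Z with intervening on Z.

```python
import random


def simulate(n, seed, do_z=None):
    """Draw n samples from a toy chain X -> M -> Z (linear, Gaussian noise).

    If do_z is given, Z is fixed to that value regardless of M, which is
    what an intervention on Z means in a causal graph.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        m = 0.8 * x + rng.gauss(0.0, 0.5)
        noise_z = rng.gauss(0.0, 0.5)  # always drawn so both runs share X and M
        z = 1.5 * m + noise_z if do_z is None else do_z
        rows.append((x, m, z))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


observed = simulate(n=5000, seed=0)

# Conditioning: among observed samples with high Z, X is also high,
# because Z carries information about its upstream cause X.
x_given_high_z = mean(x for x, _, z in observed if z > 1.0)

# Intervening: forcing Z to a high value leaves X untouched,
# since no arrow points from Z back to X.
intervened = simulate(n=5000, seed=0, do_z=3.0)
x_under_do_z = mean(x for x, _, _ in intervened)

print(f"E[X | Z > 1]     = {x_given_high_z:.2f}")
print(f"E[X | do(Z = 3)] = {x_under_do_z:.2f}")
```

A purely associational model sees the first quantity and may conclude that changing Z changes X. A model that respects the graph reports the second.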
The package is particularly useful for:
- Drug discovery and target identification
- Understanding protein pathway dynamics
- Predicting off-target effects of interventions
- Analyzing perturbation experiments in proteomics
- Integration with INDRA (Integrated Network and Dynamical Reasoning Assembler)
- Automatic extraction of protein interaction networks
- Support for GSEA-driven pathway analysis
- Posterior network estimation using PKN and experimental data
- Bayesian probabilistic models using Pyro
- Latent variable models for handling missing data
- Support for both observational and interventional data
- Uncertainty quantification for predictions
- Predict effects of protein inhibitions
- Estimate downstream pathway responses
- Quantify prediction uncertainty
- Validation against experimental data
- Integration with proteomics (MSstats) output format
- Data normalization and preprocessing utilities
- Handling of protein-level summarized data
- Generate example graphs which exhibit different causal structures
- Simulate data over causal graphs using real world data generating processes
- Leverage simulations for method validations
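As a rough idea of what simulating data over a causal graph involves, the sketch below samples from a small linear-Gaussian DAG using only the standard library. The graph, its coefficients, and the function name `simulate_linear_dag` are hypothetical; Causomic's own simulator (`simulate_data`) uses its own data-generating process and is not reproduced here.

```python
import random
from graphlib import TopologicalSorter  # available from Python 3.9

# Hypothetical toy graph: each node maps to {parent: coefficient}.
# C confounds X and Z; M mediates the effect of X on Z.
GRAPH = {
    "C": {},
    "X": {"C": 0.5},
    "M": {"X": 0.8},
    "Z": {"M": 1.5, "C": 0.7},
}


def simulate_linear_dag(graph, n, noise_sd=0.5, seed=0):
    """Sample n rows, generating each node from its parents plus Gaussian noise."""
    rng = random.Random(seed)
    # TopologicalSorter takes node -> predecessors, so parents are emitted first.
    predecessors = {node: set(parents) for node, parents in graph.items()}
    order = list(TopologicalSorter(predecessors).static_order())
    samples = []
    for _ in range(n):
        row = {}
        for node in order:
            parent_effect = sum(
                coef * row[parent] for parent, coef in graph[node].items()
            )
            row[node] = parent_effect + rng.gauss(0.0, noise_sd)
        samples.append(row)
    return order, samples


order, samples = simulate_linear_dag(GRAPH, n=1000)
print("generation order:", order)
print("first sample:", samples[0])
```

Because the ground-truth coefficients are known, data generated this way can be used to check whether a method recovers them.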
- Python 3.9 or higher
- PyTorch
- Pyro-PPL
git clone https://github.com/devonjkohler/Causomic.git
cd Causomic
pip install -e .
The main dependencies include:
- pyro-ppl==1.8.5 - Probabilistic programming
- torch - Deep learning framework
- networkx - Graph manipulation
- pandas - Data manipulation
- numpy - Numerical computing
- matplotlib - Plotting
- y0 - Network analysis utilities
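After installing, it can help to confirm which of these dependencies are present and at what version. The snippet below is a small sketch using the standard-library `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the distribution names are taken from the dependency list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as given in the dependency list.
DEPENDENCIES = ["pyro-ppl", "torch", "networkx", "pandas", "numpy", "matplotlib", "y0"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(DEPENDENCIES)
for name, ver in report.items():
    print(f"{name:12s} {ver or 'not installed'}")
```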
from causomic.data_analysis.proteomics_data_processor import dataProcess
from causomic.simulation.example_graphs import mediator
from causomic.simulation.proteomics_simulator import simulate_data
# 1. Load your data (we use simulation)
med_graph = mediator(add_independent_nodes=False, output_node=False)
simulated_data = simulate_data(
med_graph['Networkx'],
coefficients=med_graph['Coefficients'],
add_error=False,
mnar_missing_param=[-3, 0.4], # Missing not at random
add_feature_var=True,
n=100,
seed=2
)
# 2. Preprocess data (assuming MS proteomics data)
input_data = dataProcess(
simulated_data["Feature_data"],
normalization=False,
summarization_method="TMP",
MBimpute=False,
sim_data=True
)
# 3. Fit causal model
from causomic.causal_model.LVM import LVM
lvm = LVM(backend="pyro", num_steps=2000, verbose=True)
lvm.fit(input_data, med_graph["causomic"])
# 4. Make predictions
intervention_value = 7.0
lvm.intervention({"X": intervention_value}, "Z")
Causomic expects data in different formats depending on where in the pipeline you start. The causal model and graph construction steps expect data in wide format, with genes as the columns, samples as the rows, and quantitative experimental measurements as the values.
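If your measurements are stored in long format (one row per sample and gene), a pandas pivot produces the wide layout described above. This is a sketch with a hypothetical table; the column names `sample`, `gene`, and `abundance` are placeholders, not names that Causomic requires.

```python
import pandas as pd

# Hypothetical long-format table: one row per (sample, gene) measurement.
long_df = pd.DataFrame(
    {
        "sample": ["s1", "s1", "s2", "s2", "s3"],
        "gene": ["EGFR", "MAPK1", "EGFR", "MAPK1", "EGFR"],
        "abundance": [21.4, 18.9, 20.7, 19.3, 22.1],
    }
)

# Samples become rows, genes become columns, abundances fill the cells.
# A gene that was not measured in a sample (MAPK1 in s3) becomes NaN.
wide_df = long_df.pivot(index="sample", columns="gene", values="abundance")
print(wide_df)
```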
If you are using MS-based proteomics data, we recommend first processing it with
the MSstats dataProcess function. You can then pass the resulting
ProteinLevelData object directly to Causomic.
Implementation of dataProcess directly in Causomic is ongoing.
- Data normalization and preprocessing
- Statistical utilities for proteomics data
- Integration with MSstats workflows
- INDRA network queries and processing
- Protein interaction network building
- Graph filtering and validation utilities
- Probabilistic models for causal inference
- Bayesian parameter estimation
- Intervention effect prediction
- Latent variable models for missing data
- Synthetic data generation for testing
- Model validation utilities
- Simulation studies for method development
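To make "intervention effect prediction" with uncertainty concrete, the sketch below summarizes an effect from two sets of posterior draws using only the standard library. The draws are stand-ins generated from a random number generator, and `summarize` is a hypothetical helper. In practice the draws would come from a fitted model, and the format Causomic returns may differ.

```python
import random
import statistics


def summarize(samples, level=0.9):
    """Return the mean and a central credible interval of the samples."""
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0
    lo = ordered[int(tail * (len(ordered) - 1))]
    hi = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return statistics.fmean(ordered), (lo, hi)


# Stand-in posterior draws of a downstream protein's abundance
# without and with an upstream inhibition (values are made up).
rng = random.Random(1)
baseline = [rng.gauss(10.0, 0.4) for _ in range(2000)]
after_intervention = [rng.gauss(8.5, 0.6) for _ in range(2000)]

# Effect per draw, then its summary.
effect = [a - b for a, b in zip(after_intervention, baseline)]
effect_mean, (effect_lo, effect_hi) = summarize(effect)

print(f"mean effect: {effect_mean:.2f}")
print(f"90% interval: [{effect_lo:.2f}, {effect_hi:.2f}]")
```

Reporting the interval alongside the mean shows whether a predicted change is distinguishable from zero.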
The primary documentation is available as a Jupyter notebook:
- User Manual - Complete workflow and API reference
Detailed API documentation is available in the source code docstrings. Key modules:
- causomic.causal_model.models - Core causal models
- causomic.graph_construction.utils - Network utilities
- causomic.data_analysis.normalization - Data preprocessing
- causomic.simulation - Synthetic data generation
We welcome contributions! Please see our contributing guidelines:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
git clone https://github.com/devonjkohler/Causomic.git
cd Causomic
pip install -e ".[dev]"
We use Black for code formatting and isort for import sorting:
black src/
isort src/
If you use Causomic in your research, please cite:
@software{kohler2024causomic,
title={Causomic: Causal inference methods for -omics research},
author={Kohler, Devon},
year={2024},
url={https://github.com/devonjkohler/Causomic},
version={0.0.1-dev}
}
This project is licensed under the MIT License - see the LICENSE file for details.
- Author: Devon Kohler
- Email: kohler.d@northeastern.edu
- Institution: Northeastern University
- GitHub: @devonjkohler