This repository contains code and data to accompany the paper "Network-based representation learning reveals the impact of age and diet on the gut microbial and metabolomic environment of U.S. infants in a randomized controlled feeding trial" (doi.org/10.1101/2024.11.01.621627). The code covers preprocessing the original microbial and metabolomic count data, creating a sample-by-feature edge list in which the weight of the edge between a sample node and a feature node is that feature's normalized count in the sample, generating node2vec+ embeddings, selecting embedding spaces, and using the embeddings to train diet and time point classifiers.
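As a rough illustration of the edge list step described above, the following sketch turns a small count table into weighted sample-feature edges. It is not the code in src/: the function names, the toy sample and feature names, and the choice of normalization (dividing each count by the sample's total) are all assumptions made for the example, as is the tab-separated output layout.

```python
import csv
import io


def build_edge_list(counts, feature_names):
    """Return (sample, feature, weight) edges from raw counts.

    counts maps a sample ID to a list of raw counts aligned with
    feature_names. The weight is the count divided by the sample total,
    which is an ASSUMED normalization, not necessarily the paper's.
    """
    edges = []
    for sample_id, row in counts.items():
        total = sum(row)
        if total == 0:
            continue  # a sample with no counts contributes no edges
        for feature, count in zip(feature_names, row):
            if count > 0:  # zero counts are treated as "no edge"
                edges.append((sample_id, feature, count / total))
    return edges


def write_edge_list(edges, handle):
    """Write one tab-separated 'sample, feature, weight' row per edge."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for sample_id, feature, weight in edges:
        writer.writerow([sample_id, feature, f"{weight:.6f}"])


# Hypothetical toy data: two samples, three -omics features.
feature_names = ["microbe_a", "microbe_b", "metabolite_x"]
counts = {
    "sample_1": [30, 10, 0],
    "sample_2": [5, 15, 20],
}

edges = build_edge_list(counts, feature_names)

buffer = io.StringIO()
write_edge_list(edges, buffer)
edge_list_text = buffer.getvalue()
```

Because samples connect only to features, the resulting graph is bipartite, and two samples become close in the embedding space only through the features they share.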
All Python package dependencies can be installed using conda. If you do not already have conda installed, see here for installation instructions.
Then run the following:
git clone git@github.com:krishnanlab/multiomics-embedding.git
cd multiomics-embedding
conda env create -f environment.yml
For ease of use, run/ contains shell scripts that call the Python code in src/.
All run scripts should be invoked from the project root.
Each script’s header includes its required arguments and flags.
Run bash run/<name>.sh --help for details.
run_initial_sweep.sh was used to evaluate the effect of node2vec+ embedding parameters on time point classifiers.
run_joint_sweep.sh was used to evaluate embedding parameters on both time point and diet classifiers.
run_all.sh was used to compare all unique embedding spaces generated during the two sweeps.
run_baseline.sh was used to train logistic regression models using the processed -omics counts directly as features.
run_deployment.sh was used to train logistic regression models using embedding features and to find -omics features that are predicted to be associated with a diet or time point phenotype.
├── data/ # raw and processed data
├── notebooks/ # exploratory analysis
├── src/ # main code
├── results/ # all results for top performing embedding spaces
├── run/ # shell scripts to call run code
├── environment.yml # conda environment
In this repository, we include data and results only for the top-performing embedding spaces that were used in the paper. The performance of other embedding spaces can be seen in our public wandb project. Variation across all models is explored in src/2024-12-13_model_variance.ipynb.
This repository and all its contents are released under the BSD 3-Clause License; see LICENSE.
Adelle Price, Sakaiza Rasolofomanana-Rajery, Keenan Manpearl, Charles E. Robertson, Nancy F. Krebs, Daniel N. Frank, Arjun Krishnan*, Audrey E. Hendricks*, Minghua Tang*
*These authors contributed equally.
NIH (NIDDK) 1K01DK111665-01, 1R01DK126710, the Beef Checkoff through the National Cattlemen’s Beef Association, and the National Pork Board.