A Python implementation of our manuscript "Predicting evolutionary dynamics in microbial communities using deep learning".
This work introduces a unified, self-supervised pretraining framework to model the complex dynamics of evolutionary microbial communities. Evolution unfolds across individual (species traits), community (species interactions), and generational (temporal sequences) scales, creating high-dimensional and stochastic processes that are challenging to analyze.
This repository contains the source code for preprocessing the data and reproducing the experimental results.
- Install Anaconda: https://docs.anaconda.com/anaconda/install/index.html
- Create an experimental environment and install the required packages (this takes about 0.5 hours):

      conda create --name ecopretrain python=3.9
      conda activate ecopretrain
      pip install omegaconf tqdm scikit-learn torchdiffeq torch

- The requirements.txt file lists the versions of all software packages.
- Hardware requirements: 32 GB RAM, Intel i5-14600KF CPU, 16 GB GPU (NVIDIA RTX 4060 Ti)
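After installation, it can help to confirm that the packages are visible from the activated environment. The snippet below is a minimal sketch and is not part of the repository: the function names `check_modules` and `report` are hypothetical, and the module list simply mirrors the `pip install` line (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names for the packages from the pip install step.
# The scikit-learn distribution is imported under the name "sklearn".
REQUIRED_MODULES = ["omegaconf", "tqdm", "sklearn", "torchdiffeq", "torch"]


def check_modules(names):
    """Return {module_name: bool}, True when the module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def report(names=REQUIRED_MODULES):
    """Print one status line per module and return the missing ones."""
    status = check_modules(names)
    for name, found in status.items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
    # Only query the GPU when torch itself was found.
    if status.get("torch"):
        import torch
        print("CUDA available:", torch.cuda.is_available())
    return [name for name, found in status.items() if not found]
```

Calling `report()` inside the activated `ecopretrain` environment should return an empty list when every package is installed; the exact versions used by the authors are those pinned in requirements.txt.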
- Landscape: The Landscape/data folder provides the original data, preprocessing scripts, and processed data for the three experimental adaptive landscape systems discussed in the manuscript:
  - E. coli toxin-antitoxin system
  - Saccharomyces cerevisiae transfer RNA (tRNA) system
  - Entacmaea quadricolor fluorescent protein system
- MacArthur: The MacArthur/ folder provides the simulation scripts and data preprocessing code for the resource competition model, adapted from the framework developed by Tikhonov et al.
- Coli: The Coli/data folder provides the original data, preprocessing scripts, and processed data for the real-world E. coli laboratory evolution system with multi-antibiotic resistance.
This repository provides the source code for all experiments in the main text and supplementary materials. The specific scripts are organized as follows:
- Landscape/: Contains the experimental scripts to reproduce Figures 2, 3, and 4 of the main text and Supplementary Figure 2.
- MacArthur/: Contains the experimental scripts to reproduce Figure 5 of the main text and Supplementary Figure 1.
- Coli/: Contains the experimental scripts to reproduce Figure 6 of the main text.
Taking the E. coli experiment (Figure 6) as an example, the Coli/data folder contains the experimental data and evolutionary trajectories obtained from publicly available sources. Readers can reproduce the results by running the following Jupyter notebooks (this takes about 0.5 hours):
- coli_Encode.ipynb: Performs multiscale encoding
- coli_Evolution.ipynb: Performs evolutionary prediction
- coli_Fitness.ipynb: Performs fitness prediction
Each notebook contains the expected output and its visualization. To run the notebooks on your own data, replace the data files in the corresponding folder.
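For readers who prefer to execute the notebooks without opening them interactively, the sketch below builds `jupyter nbconvert` commands for the three files. It rests on several assumptions that are not confirmed by the repository: that Jupyter and nbconvert are installed in the environment (they are not in the `pip install` line), that the notebooks sit directly under `Coli/`, that they should run in the order listed, and that an output folder named `executed` is acceptable. The function names are hypothetical.

```python
from pathlib import Path
import subprocess

# Order taken from the list of notebooks; running them in this
# order is an assumption, not something the repository states.
NOTEBOOKS = ["coli_Encode.ipynb", "coli_Evolution.ipynb", "coli_Fitness.ipynb"]


def build_commands(notebook_dir="Coli", output_dir="executed"):
    """Build one `jupyter nbconvert` command per notebook.

    notebook_dir and output_dir are hypothetical locations.
    """
    commands = []
    for name in NOTEBOOKS:
        commands.append([
            "jupyter", "nbconvert",
            "--to", "notebook",          # write the result as a notebook
            "--execute",                 # run all cells before writing
            "--output-dir", output_dir,  # keep the original files untouched
            str(Path(notebook_dir) / name),
        ])
    return commands


def run_all(notebook_dir="Coli", output_dir="executed", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(notebook_dir, output_dir):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run_all()` only prints the commands; `run_all(dry_run=False)` executes them and stops at the first failure. Writing to a separate output folder leaves the expected outputs stored in the original notebooks available for comparison.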
The real-world and simulated datasets used in this paper were collected and adapted from the following open-source studies and projects:
- CCU: Saccharomyces cerevisiae tRNA system
- eqFP611: Entacmaea quadricolor fluorescent protein system
- ParD3: E. coli toxin-antitoxin system
- MacArthur: Resource-competition model
- E. coli: Multi-antibiotic resistance E. coli evolution experiment
This repository is released under the MIT License.
