A Python implementation of our manuscript “Multiscale structure-encoded pretraining for modeling evolutionary microbial communities”.
This work introduces a unified, self-supervised pretraining framework to model the complex dynamics of evolutionary microbial communities. Evolution unfolds across individual (species traits), community (species interactions), and generational (temporal sequences) scales, creating high-dimensional and stochastic processes that are challenging to analyze.
This repository contains the Python source code for preprocessing the data and reproducing the experimental results.
- Install Anaconda: https://docs.anaconda.com/anaconda/install/index.html
- Create an experimental environment and install the required packages (this takes about 30 minutes):
    conda create --name ecopretrain python=3.9
    conda activate ecopretrain
    pip install omegaconf tqdm scikit-learn torchdiffeq torch
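After activating the environment, a quick sanity check can confirm the interpreter version and that each package from the install command resolves. This is a minimal sketch that is not part of the repository; the `check_environment` helper and the `REQUIRED_MODULES` table are hypothetical names. It relies on `importlib.util.find_spec`, so it detects whether a module is installed without actually importing `torch`.

```python
import importlib.util
import sys

# Hypothetical table: pip package name -> importable module name.
# Note that the pip package "scikit-learn" is imported as "sklearn".
REQUIRED_MODULES = {
    "omegaconf": "omegaconf",
    "tqdm": "tqdm",
    "scikit-learn": "sklearn",
    "torchdiffeq": "torchdiffeq",
    "torch": "torch",
}


def check_environment(expected_python=(3, 9)):
    """Return (python_ok, missing) without importing the packages themselves.

    python_ok: True if the running interpreter's (major, minor) equals expected_python.
    missing:   pip names of packages whose top-level module cannot be located.
    """
    python_ok = sys.version_info[:2] == tuple(expected_python)
    missing = [
        pip_name
        for pip_name, module_name in REQUIRED_MODULES.items()
        if importlib.util.find_spec(module_name) is None
    ]
    return python_ok, missing


if __name__ == "__main__":
    python_ok, missing = check_environment()
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} matches 3.9: {python_ok}")
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

An empty missing list and `python_ok == True` indicate the environment matches the setup described above.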
- Landscape: The Landscape/data folder provides the original data, preprocessing scripts, and processed data for the three experimental adaptive landscape systems discussed in the manuscript:
  - E. coli toxin-antitoxin system
  - Saccharomyces cerevisiae transfer RNA (tRNA) system
  - Entacmaea quadricolor fluorescent protein system
- MacArthur: The MacArthur/ folder provides the simulation scripts and data preprocessing code for the resource competition model, adapted from the framework developed by Tikhonov et al.
- Coli: The Coli/data folder provides the original data, preprocessing scripts, and processed data for the real-world E. coli laboratory evolution system with multi-antibiotic resistance.
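For readers unfamiliar with the resource competition setting, the textbook MacArthur consumer-resource model couples species abundances N_i to resource levels R_a: dN_i/dt = N_i (Σ_a w_a c_ia R_a − m_i) and dR_a/dt = (r_a / K_a) R_a (K_a − R_a) − Σ_i N_i c_ia R_a, where c_ia are consumption preferences, w_a resource values, m_i maintenance costs, r_a resource growth rates, and K_a carrying capacities. The sketch below is a generic, stdlib-only forward-Euler integration of that textbook form. It is not the repository's simulation code and does not reproduce the Tikhonov et al. variant; the function names (`macarthur_rhs`, `simulate`) and all parameter values are hypothetical.

```python
import random


def macarthur_rhs(N, R, c, w, m, r, K):
    """Right-hand side of the textbook MacArthur consumer-resource model.

    dN_i/dt = N_i * (sum_a w_a * c[i][a] * R_a - m_i)
    dR_a/dt = (r_a / K_a) * R_a * (K_a - R_a) - sum_i N_i * c[i][a] * R_a
    """
    S, M = len(N), len(R)
    dN = [N[i] * (sum(w[a] * c[i][a] * R[a] for a in range(M)) - m[i]) for i in range(S)]
    dR = [
        (r[a] / K[a]) * R[a] * (K[a] - R[a]) - sum(N[i] * c[i][a] * R[a] for i in range(S))
        for a in range(M)
    ]
    return dN, dR


def simulate(N0, R0, c, w, m, r, K, dt=0.01, steps=5000):
    """Integrate with fixed-step forward Euler, clipping abundances at zero."""
    N, R = list(N0), list(R0)
    for _ in range(steps):
        dN, dR = macarthur_rhs(N, R, c, w, m, r, K)
        N = [max(0.0, n + dt * d) for n, d in zip(N, dN)]
        R = [max(0.0, x + dt * d) for x, d in zip(R, dR)]
    return N, R


if __name__ == "__main__":
    rng = random.Random(0)
    S, M = 4, 3  # hypothetical numbers of species and resources
    c = [[rng.uniform(0.0, 1.0) for _ in range(M)] for _ in range(S)]  # consumption preferences
    w = [1.0] * M  # resource values
    m = [0.3] * S  # maintenance costs
    r = [1.0] * M  # resource growth rates
    K = [2.0] * M  # carrying capacities
    N, R = simulate([0.1] * S, [1.0] * M, c, w, m, r, K)
    print("species:", [round(n, 3) for n in N])
    print("resources:", [round(x, 3) for x in R])
```

Fixed-step Euler keeps the sketch dependency-free; for real simulations an adaptive solver such as `torchdiffeq.odeint` (torchdiffeq is among the listed dependencies) would usually be preferable.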
This repository provides the source code for all experiments in the main text and supplementary materials. The specific scripts are organized as follows:
- Landscape/: Contains the experimental scripts to reproduce Figures 2, 3, and 4 of the main text and Supplementary Figure 2.
- MacArthur/: Contains the experimental scripts to reproduce Figure 5 of the main text and Supplementary Figure 1.
- Coli/: Contains the experimental scripts to reproduce Figure 6 of the main text.
The real-world and simulated datasets used in this paper were collected and adapted from the following open-source studies and projects:
- CCU: Saccharomyces cerevisiae tRNA system
- eqFP611: Entacmaea quadricolor fluorescent protein system
- ParD3: E. coli toxin-antitoxin system
- MacArthur: Resource-competition model
- E. coli: Multi-antibiotic-resistant E. coli evolution experiment
This repository is released under the MIT License.