ESM-Ezy

Dataset and checkpoint

To get the dataset and model checkpoint, please refer to the project's Zenodo record.

Download the data.zip file and extract it to the data directory.

Download the ckpt.zip file and extract it to the ckpt directory.
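After extracting both archives, a quick check can confirm that the files used by the commands in this README are where the scripts expect them. The following is a minimal sketch, not part of the repository: the helper name `missing_files` is hypothetical, and the path list is copied from the commands in this README. Whether every listed file ships inside data.zip or ckpt.zip is an assumption; the ESM-1b weights, for instance, are downloaded separately in the Training section.

```python
from pathlib import Path

# Relative paths copied from the commands in this README.
# Assumption: not every file is necessarily included in data.zip / ckpt.zip
# (the ESM-1b weights are fetched separately with wget).
EXPECTED_FILES = [
    "data/train/train_positive.fa",
    "data/train/train_negative.fa",
    "data/train/test_positive.fa",
    "data/train/test_negative.fa",
    "data/inference/uniref50.fasta",
    "data/retrieval/candidate.fa",
    "data/retrieval/fitness.fa",
    "ckpt/model_laccase.pkl",
    "ckpt/esm1b_t33_650M_UR50S.pt",
]


def missing_files(root=".", expected=EXPECTED_FILES):
    """Return the expected relative paths that are not regular files under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_files()` from the repository root returns an empty list when everything is in place; any path it returns points to a file that still needs to be downloaded or moved.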

About the `train_positive.fa` and `train_negative.fa` files

We apologize that the train_positive.fa (the original training positive file) and train_negative.fa (the original training negative file) included in data.zip do not exactly match the description in the manuscript. The original training positive file, which initially contained 117 entries, had been only partially deduplicated. The revised train_positive.fa, located in the root directory of the Zenodo record, now contains 117 deduplicated entries. The train_negative.fa file has been updated in the same way. Note, however, that negative samples are drawn at random from train_negative.fa during training, and that file is much larger than train_positive.fa, so minor changes to the negative set have minimal impact on the training process. Even if you downloaded the earlier versions of train_positive.fa and train_negative.fa, training should still work fine.
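To check which version of a file you have, you can count its distinct sequences. The sketch below is an illustration only and does not reproduce the authors' deduplication procedure, which is not described here. It assumes duplicates are entries with identical sequences (compared case-insensitively); the function names `read_fasta` and `deduplicate` are hypothetical.

```python
def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def deduplicate(records):
    """Keep the first record for each distinct sequence.

    Assumption: two entries are duplicates when their sequences are
    identical, ignoring letter case. Headers are not compared.
    """
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique
```

For example, comparing `len(read_fasta(path))` with `len(deduplicate(read_fasta(path)))` for `data/train/train_positive.fa` shows whether the file still contains repeated sequences.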

Training

To train ESM-Ezy, follow the steps below:

Clone the repository:

git clone https://github.com/westlake-repl/ESM-Ezy.git

Install the required packages:

conda env create -f environment.yml

Download the pre-trained ESM-1b model:

wget https://dl.fbaipublicfiles.com/fair-esm/models/esm1b_t33_650M_UR50S.pt -O ckpt/esm1b_t33_650M_UR50S.pt
wget https://dl.fbaipublicfiles.com/fair-esm/regression/esm1b_t33_650M_UR50S-contact-regression.pt -O ckpt/esm1b_t33_650M_UR50S-contact-regression.pt

Train ESM-Ezy:

python scripts/train.py --train_positive_data data/train/train_positive.fa --train_negative_data data/train/train_negative.fa --test_positive_data data/train/test_positive.fa --test_negative_data data/train/test_negative.fa --model_path ckpt/esm1b_t33_650M_UR50S.pt

We also provide early stopping to decide when training should end. To enable it, pass the --patience option:

python scripts/train.py --train_positive_data data/train/train_positive.fa --train_negative_data data/train/train_negative.fa --test_positive_data data/train/test_positive.fa --test_negative_data data/train/test_negative.fa --model_path ckpt/esm1b_t33_650M_UR50S.pt --patience 10
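For readers unfamiliar with patience-based early stopping, the general idea can be sketched as follows. This is not the code in scripts/train.py. It assumes the monitored quantity is a test-set score where higher is better, and the class name `EarlyStopping` and the `min_delta` parameter are hypothetical; which metric the training script actually tracks is not stated here.

```python
class EarlyStopping:
    """Signal a stop once the metric has not improved for `patience` epochs.

    Assumption: a higher metric is better (e.g. accuracy on the test split).
    """

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience    # epochs to wait without improvement
        self.min_delta = min_delta  # smallest increase counted as improvement
        self.best = None
        self.bad_epochs = 0

    def step(self, metric):
        """Record one epoch's metric and return True if training should stop."""
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `step` would be called once per epoch with the latest evaluation score, and the loop would break when it returns True. With `--patience 10`, the analogous behavior would be to stop after ten consecutive epochs without improvement.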

Inference

Run inference on the UniRef50 database:

python scripts/inference.py --model_path ckpt/esm1b_t33_650M_UR50S.pt --checkpoint_path ckpt/model_laccase.pkl --inference_data data/inference/uniref50.fasta  --output_path data/retrieval

Search

Load the trained ESM-Ezy model and run inference on the candidate sequences:

python scripts/retrieval.py --model_path ckpt/esm1b_t33_650M_UR50S.pt --checkpoint_path ckpt/model_laccase.pkl --candidate_data data/retrieval/candidate.fa --seed_data data/retrieval/fitness.fa  --output_path data/retrieval
