Parrot

Implementation of reaction condition prediction with Parrot

Publication

Xiaorui Wang, Chang-Yu Hsieh*, Xiaodan Yin, Jike Wang, Yuquan Li, Yafeng Deng, Dejun Jiang, Zhenxing Wu, Hongyan Du, Hongming Chen, Yun Li, Huanxiang Liu, Yuwei Wang, Pei Luo, Tingjun Hou*, Xiaojun Yao*. Generic Interpretable Reaction Condition Predictions with Open Reaction Condition Datasets and Unsupervised Learning of Reaction Center. Research 2023;6:Article 0231. DOI:10.34133/research.0231

Quick Start with Gitpod

Starting the repository in Gitpod takes about 4 minutes.

OS Requirements

This repository has been tested on Linux operating systems.

Python Dependencies

Python (version >= 3.7)
PyTorch (version >= 1.10.0)
RDKit (version >= 2019)
Transformers (version == 4.18.0)
Simpletransformers (version == 0.63.6)
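Two of these pins are exact (`==`) and the rest are minimums (`>=`), so it can help to check an environment before running anything. The sketch below compares version strings against the constraints listed above. The helper names (`version_tuple`, `satisfies`, `check`) are hypothetical and not part of the repository.

```python
import re

# Constraints copied from the dependency list above: name -> (operator, version).
REQUIREMENTS = {
    "python": (">=", "3.7"),
    "torch": (">=", "1.10.0"),
    "rdkit": (">=", "2019"),
    "transformers": ("==", "4.18.0"),
    "simpletransformers": ("==", "0.63.6"),
}


def version_tuple(version: str) -> tuple:
    """Turn '1.10.0+cu113' or '2022.09.5' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed: str, op: str, required: str) -> bool:
    """Compare two version strings under '>=' or '=='."""
    have, need = version_tuple(installed), version_tuple(required)
    # Pad with zeros so '4.18' == '4.18.0' and (2019,) compares with (2022, 9, 5).
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    if op == "==":
        return have == need
    if op == ">=":
        return have >= need
    raise ValueError(f"unsupported operator: {op}")


def check(installed: dict) -> list:
    """Return the names whose version is missing or violates REQUIREMENTS."""
    failures = []
    for name, (op, required) in REQUIREMENTS.items():
        version = installed.get(name)
        if version is None or not satisfies(version, op, required):
            failures.append(name)
    return failures
```

To fill the `installed` dictionary you could use `platform.python_version()`, `torch.__version__`, `rdkit.__version__`, `transformers.__version__`, and `importlib.metadata.version("simpletransformers")` (Python 3.8+).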

Installation Guide

Create a virtual environment for running Parrot.
We recommend using conda to manage the virtual environment. Installation instructions for conda can be found here.
Make sure to install PyTorch with the CUDA version that matches your device.
This process usually takes a few minutes to complete.

git clone https://github.com/wangxr0526/Parrot.git
cd Parrot
conda env create -f envs.yaml
conda activate parrot_env
pip install gdown wtforms flask flask_bootstrap
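After activating `parrot_env`, you can confirm that the installed PyTorch build can see your GPU. The sketch below suggests a value for the `--gpu` flag used by the scripts in this README: `-1` is the documented CPU setting, while `0` (the first CUDA device) is simply an assumed default.

```python
def suggest_gpu_id() -> int:
    """Return a value for --gpu: 0 if PyTorch sees a CUDA device, else -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter; CPU is the only safe choice.
        return -1
    if torch.cuda.is_available():
        print("PyTorch", torch.__version__, "built for CUDA", torch.version.cuda)
        return 0
    return -1


if __name__ == "__main__":
    print("--gpu", suggest_gpu_id())
```

If this prints `--gpu -1` on a machine that has a GPU, the installed PyTorch build probably does not match the local CUDA version.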

Use Parrot

You can use Parrot to predict suitable catalysts, solvents and reagents, and temperatures for reactions.
First, download the model and dataset files with this command:

python preprocess_script/download_data.py

Each link below is saved to the zip file path shown after the arrow:

https://drive.google.com/uc?id=1aX70qzZrJ9TZ9KpqnvUVR8WBxiTwXOsI    --->    dataset/source_dataset/USPTO_condition_final.zip

https://drive.google.com/uc?id=1uEqpkF4tPTlLIPdTyWJdXows7hKQbAAc    --->    dataset/pretrain_data.zip

https://drive.google.com/uc?id=1gFV2KdVKaLCTeb3nrzopyYHXbM0G_cr_    --->    outputs/Parrot_train_in_USPTO_Condition_enhance.zip

https://drive.google.com/uc?id=1bVB89ByGkYjiUtbvEcp1mgwmoKy5Ka2b    --->    outputs/best_rcm_model_pretrain.zip

https://drive.google.com/uc?id=1DmHILXSOhUuAzqF0JmRTx1EcOOQ7Bm5O    --->    outputs/best_mlm_model_pretrain.zip
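If `download_data.py` cannot be used in your environment, you could fetch the same files manually. The sketch below is a hypothetical fallback, not a copy of `download_data.py`. It uses `gdown` (installed above) with the file IDs and paths from the list, and it assumes that each archive should be unpacked into the directory that contains it, which is consistent with the dataset layout described under Reproduce the results.

```python
import os
import zipfile

# Google Drive file IDs and destination zip paths, copied from the list above.
DOWNLOADS = {
    "1aX70qzZrJ9TZ9KpqnvUVR8WBxiTwXOsI": "dataset/source_dataset/USPTO_condition_final.zip",
    "1uEqpkF4tPTlLIPdTyWJdXows7hKQbAAc": "dataset/pretrain_data.zip",
    "1gFV2KdVKaLCTeb3nrzopyYHXbM0G_cr_": "outputs/Parrot_train_in_USPTO_Condition_enhance.zip",
    "1bVB89ByGkYjiUtbvEcp1mgwmoKy5Ka2b": "outputs/best_rcm_model_pretrain.zip",
    "1DmHILXSOhUuAzqF0JmRTx1EcOOQ7Bm5O": "outputs/best_mlm_model_pretrain.zip",
}


def drive_url(file_id: str) -> str:
    return f"https://drive.google.com/uc?id={file_id}"


def fetch_and_extract(root: str = ".") -> None:
    """Download each missing zip with gdown, then unzip it next to itself."""
    import gdown  # installed by the `pip install gdown ...` step

    for file_id, rel_path in DOWNLOADS.items():
        zip_path = os.path.join(root, rel_path)
        os.makedirs(os.path.dirname(zip_path), exist_ok=True)
        if not os.path.exists(zip_path):
            gdown.download(url=drive_url(file_id), output=zip_path, quiet=False)
        # Assumption: each archive unpacks into the directory that holds it.
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(os.path.dirname(zip_path))
```

Run `fetch_and_extract()` from the repository root. Zips that already exist are not downloaded again, but they are re-extracted.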

Parrot can be used in two ways: from the command line or through a web interface.

Command

Prepare a txt file containing the SMILES of the reactions you want to predict, then run the following command:

cd Parrot
python inference.py --config_path path/to/config_file.yaml \
                    --input_path path/to/input_file.txt \
                    --output_path path/to/output.csv \
                    --num_workers NUM_WORKERS \
                    --inference_batch_size BATCH_SIZE \
                    --gpu CUDA_ID          # use cpu: CUDA_ID=-1
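The command above reads reactions from a plain txt file. The sketch below writes such a file. It assumes one reaction SMILES per line in the `reactants>agents>products` form; check `test_files/input_demo.txt` for the exact format Parrot expects. The format check is deliberately shallow and does not validate chemistry (use RDKit for that). Both helper names are hypothetical.

```python
from pathlib import Path


def looks_like_reaction_smiles(line: str) -> bool:
    """Shallow check for 'reactants>agents>products' with reactants and products present."""
    parts = line.strip().split(">")
    return len(parts) == 3 and bool(parts[0]) and bool(parts[2])


def write_input_file(reactions, path) -> int:
    """Write one reaction SMILES per line (assumed format); return how many were written."""
    reactions = [r.strip() for r in reactions if r.strip()]
    bad = [r for r in reactions if not looks_like_reaction_smiles(r)]
    if bad:
        raise ValueError(f"Not reaction SMILES: {bad}")
    Path(path).write_text("\n".join(reactions) + "\n")
    return len(reactions)
```

For example, `write_input_file(["CCO.CC(=O)O>>CCOC(C)=O"], "my_reactions.txt")` writes a one-line input file, and blank entries are skipped.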

For example, to predict conditions with the Parrot model trained on the USPTO-Condition dataset, run:

python inference.py --config_path configs/config_inference_use_uspto.yaml \
                    --input_path test_files/input_demo.txt \
                    --output_path test_files/predicted_conditions.csv \
                    --num_workers 6 \
                    --inference_batch_size 8 \
                    --gpu 0
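To call inference from a Python workflow instead of a shell, you can assemble the same command with `subprocess`. The sketch below uses only the flags documented above. The function names are hypothetical, and the defaults mirror the example values (6 workers, batch size 8, GPU 0).

```python
import subprocess
import sys


def inference_command(config_path, input_path, output_path,
                      num_workers=6, batch_size=8, gpu=0):
    """Assemble the inference.py call using only the flags documented above."""
    return [
        sys.executable, "inference.py",
        "--config_path", str(config_path),
        "--input_path", str(input_path),
        "--output_path", str(output_path),
        "--num_workers", str(num_workers),
        "--inference_batch_size", str(batch_size),
        "--gpu", str(gpu),  # -1 selects the CPU
    ]


def run_inference(repo_dir=".", **kwargs) -> None:
    # check=True raises CalledProcessError if inference.py exits non-zero.
    subprocess.run(inference_command(**kwargs), cwd=repo_dir, check=True)
```

Using `sys.executable` runs the script with the same interpreter, so the call stays inside `parrot_env` when the calling code is launched from that environment.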

To use the Parrot model trained on the Reaxys-TotalSyn-Condition dataset instead, run:

# This model can also predict reaction temperatures.
python inference.py --config_path configs/config_inference_use_reaxys.yaml \
                    --input_path test_files/input_demo.txt \
                    --output_path test_files/predicted_conditions.csv \
                    --num_workers 6 \
                    --inference_batch_size 8 \
                    --gpu 0
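The predictions are written to the CSV file given by `--output_path`. This README does not list its columns, so the sketch below reads the header from the file instead of assuming column names. It uses only the standard library, and `preview_predictions` is a hypothetical helper.

```python
import csv
import itertools


def preview_predictions(csv_path, n_rows=3):
    """Return (header, first n_rows rows as dicts) without assuming column names."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(itertools.islice(reader, n_rows))
        return reader.fieldnames, rows
```

For instance, `preview_predictions("test_files/predicted_conditions.csv")` returns the column names of the output from the example commands, along with the first three predictions.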

Web Interface

Use these commands to start the web interface:

cd web_app
python app.py

Open a browser and go to http://127.0.0.1:8000 to reach the interface.

The interface supports three input methods:

Draw

Reaction SMILES

TXT Files
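If the page does not load, a quick way to tell whether `python app.py` is still serving is to probe the address above. The sketch below uses only `urllib` from the standard library, and `web_app_is_up` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def web_app_is_up(url="http://127.0.0.1:8000", timeout=2.0) -> bool:
    """Return True if a server answers HTTP requests at url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except OSError:  # URLError, connection refused, or timeout
        return False
```

A `False` result usually means the app is not running or is listening on a different port.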

Reproduce the results

[1] Get Dataset

After the download step in Use Parrot, the fully processed USPTO-Condition, USPTO-Suzuki-Condition, and pretraining datasets are already in dataset/source_dataset/USPTO_condition_final and dataset/pretrain_data. If you want to recreate the USPTO-Condition dataset yourself, see here.

The Reaxys-TotalSyn-Condition dataset can only be processed from scratch. We provide the Reaxys IDs of the data and the processing script; for details, see here.

The final dataset directory structure should be as follows:

dataset/
├── pretrain_data
│   ├── mlm_rxn_train.txt                # MLM pretrain dataset (train)
│   ├── mlm_rxn_val.txt                  # MLM pretrain dataset (validation)
│   ├── rxn_center_modeling.pkl          # RCM pretrain dataset (train + validation)
│   └── vocab.txt                        # Parrot reaction SMILES vocabulary
└── source_dataset
    ├── Reaxys_total_syn_condition_final
    │   ├── Reaxys_total_syn_condition.csv
    │   ├── Reaxys_total_syn_condition_alldata_idx.pkl
    │   └── Reaxys_total_syn_condition_condition_labels.pkl
    └── USPTO_condition_final
        ├── canonical_pistachio_label.json
        ├── condition_replace_dict_final.json
        ├── USPTO_condition_alldata_idx.pkl
        ├── USPTO_condition_aug_n5_alldata_idx.pkl
        ├── USPTO_condition_aug_n5_condition_labels.pkl
        ├── USPTO_condition_aug_n5.csv
        ├── USPTO_condition_condition_labels.pkl
        ├── USPTO_condition.csv
        ├── USPTO_condition_pred_category.csv
        └── USPTO_condition_pred_category_org.csv
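Before pretraining or training, you can verify that the downloaded files match the tree above. The sketch below checks the pretraining and USPTO-Condition files. It skips the Reaxys files because they only exist after you process that dataset yourself. `missing_files` is a hypothetical helper.

```python
from pathlib import Path

# Files listed in the directory tree above, relative to dataset/.
EXPECTED = {
    "pretrain_data": ["mlm_rxn_train.txt", "mlm_rxn_val.txt",
                      "rxn_center_modeling.pkl", "vocab.txt"],
    "source_dataset/USPTO_condition_final": [
        "canonical_pistachio_label.json", "condition_replace_dict_final.json",
        "USPTO_condition_alldata_idx.pkl", "USPTO_condition_aug_n5_alldata_idx.pkl",
        "USPTO_condition_aug_n5_condition_labels.pkl", "USPTO_condition_aug_n5.csv",
        "USPTO_condition_condition_labels.pkl", "USPTO_condition.csv",
        "USPTO_condition_pred_category.csv", "USPTO_condition_pred_category_org.csv",
    ],
}


def missing_files(dataset_root="dataset"):
    """List expected files (as 'subdir/name') that are absent under dataset_root."""
    root = Path(dataset_root)
    return [f"{sub}/{name}"
            for sub, names in EXPECTED.items()
            for name in names
            if not (root / sub / name).is_file()]
```

An empty list from `missing_files()` (run from the repository root) means that every file listed for these two directories is present.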

[2] Pretrain

Masked Language Modeling (MLM) pretraining:

python pretrain_mlm.py --gpu CUDA_ID --config_path configs/pretrain_mlm_config.yaml
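To illustrate what this pretraining objective does, the sketch below shows generic BERT-style masking on a reaction SMILES: tokens are hidden at random, and the model must predict the originals. This is not Parrot's implementation, since `pretrain_mlm.py` and its config control the real procedure. The tokenizer regex (the pattern widely used for reaction SMILES), the `[MASK]` token name, and the 15% rate are all assumptions. BERT's 80/10/10 replacement split is omitted for brevity.

```python
import random
import re

# Regex tokenizer commonly used for reaction SMILES; assumed for illustration.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>>?|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(rxn_smiles: str) -> list:
    """Split a reaction SMILES into tokens, failing if any character is not covered."""
    tokens = SMILES_TOKEN.findall(rxn_smiles)
    if "".join(tokens) != rxn_smiles:
        raise ValueError(f"untokenizable characters in {rxn_smiles!r}")
    return tokens


def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Hide a random subset of tokens; labels keep the originals (None = not scored)."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(mask_token)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels
```

Only the masked positions carry a label, so the training loss is computed on the hidden tokens alone.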

Masked Reaction Center Modeling (RCM) pretraining:

python pretrain_rcm.py --gpu CUDA_ID --config_path configs/pretrain_rcm_config.yaml

After pretraining, the model states are saved in the best_mlm_uspto_pretrain and best_rcm_uspto_pretrain directories under outputs.

[3] Train Parrot

Training on the USPTO-Condition dataset:

Parrot-ML

python train_parrot_model.py --gpu CUDA_ID --config_path configs/config_uspto_condition.yaml

Parrot-ML-E

python train_parrot_model.py --gpu CUDA_ID \
                             --config_path configs/config_uspto_condition_aug_n5_lr_low.yaml

Training on the Reaxys-TotalSyn-Condition dataset:

Parrot-RCM

python train_parrot_model.py --gpu CUDA_ID \
                             --config_path configs/config_reaxys_totalsyn_condition.yaml

[4] Test Parrot

Testing on the USPTO-Condition dataset:

Parrot-ML-E

python test_parrot_model.py --gpu CUDA_ID \
                            --config_path configs/config_uspto_condition_aug_n5_lr_low.yaml

Testing on the Reaxys-TotalSyn-Condition dataset:

Parrot-RCM

python test_parrot_model.py --gpu CUDA_ID \
                            --config_path configs/config_reaxys_totalsyn_condition.yaml
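The train and test steps share the same two flags, so a single driver can chain them for one model variant. The sketch below maps each variant to the config file used above and runs training followed by testing. It returns only the commands unless `dry_run=False` is passed. The test command for Parrot-ML is not listed above, and the sketch assumes it follows the same pattern. Pretraining, if needed, must be run separately beforehand.

```python
import subprocess
import sys

# Config files for each Parrot variant, as used in the train/test steps above.
CONFIGS = {
    "Parrot-ML": "configs/config_uspto_condition.yaml",
    "Parrot-ML-E": "configs/config_uspto_condition_aug_n5_lr_low.yaml",
    "Parrot-RCM": "configs/config_reaxys_totalsyn_condition.yaml",
}


def reproduce(variant: str, gpu: int = 0, dry_run: bool = True) -> list:
    """Train and then test one variant; with dry_run=True only return the commands."""
    config = CONFIGS[variant]
    commands = [
        [sys.executable, script, "--gpu", str(gpu), "--config_path", config]
        for script in ("train_parrot_model.py", "test_parrot_model.py")
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop if a step fails
    return commands
```

Calling `reproduce("Parrot-RCM", gpu=0, dry_run=False)` from the repository root would train and then evaluate the Reaxys-TotalSyn-Condition model.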

Cite Us

@article{
doi:10.34133/research.0231,
author = {Xiaorui Wang  and Chang-Yu Hsieh  and Xiaodan Yin  and Jike Wang  and Yuquan Li  and Yafeng Deng  and Dejun Jiang  and Zhenxing Wu  and Hongyan Du  and Hongming Chen  and Yun Li  and Huanxiang Liu  and Yuwei Wang  and Pei Luo  and Tingjun Hou  and Xiaojun Yao },
title = {Generic Interpretable Reaction Condition Predictions with Open Reaction Condition Datasets and Unsupervised Learning of Reaction Center},
journal = {Research},
volume = {6},
pages = {0231},
year = {2023},
doi = {10.34133/research.0231},
URL = {https://spj.science.org/doi/abs/10.34133/research.0231},
eprint = {https://spj.science.org/doi/pdf/10.34133/research.0231},
}
