Functionality to evaluate the synthetic accessibility of TMCs.
A virtual library of reference correct TMCs is used to build a dictionary of allowed chemical features. The chemical features of input molecules are compared against this dictionary to yield a familiarity score of a given TMC.
Based on the MoleculeAutoCorrect and Molpert library by: Kerstjens, A., De Winter, H. Molecule auto-correction to facilitate molecular design. J Comput Aided Mol Des 38, 10 (2024)..
Github: MoleculeAutoCorrect
- gcc / g++
$\geq$ 10.*
The MoleculeAutoCorrect and Molpert repos depend on the source C++ code from RDKit. In newer versions of RDKit, certain header files have been removed from the conda install. Therefore, these repos need to be compiled in a conda environment with RDKit version 2022.09.5. Once the binaries from the MoleculeAutoCorrect repo has been compiled, these binaries can be called from any conda env. E.g a conda env that uses an updated version of RDKit.
Additionally, the gcc and g++ compiler versions need to be version 10 or higher. Otherwise you will get C++ syntax compiler errors.
The following instructions are for GNU+Linux. For alternative operating systems you'll have to adapt these commands slightly.
To get started, first install the conda env: env.yml
conda env create --file ./env.yml
Then activate the created environment.
conda activate tmc_sa
Then install MoleculeAutoCorrect and Molpert by running the following:
./install.sh
To be able to import the library from Python add ${MOLECULE_AUTO_CORRECT}/lib
and ${MOLPERT}/lib
to your ${PYTHONPATH}
. Consider doing so in your .bash_profile
file. Otherwise you'll have to manually extend ${PYTHONPATH}
everytime you open a new shell.
export PYTHONPATH="${PYTHONPATH}:${MOLECULE_AUTO_CORRECT}/lib"
export PYTHONPATH="${PYTHONPATH}:${MOLPERT}/lib"
Now you should be able to run the commands given in Quick start.
We provide python wrapper functions that call the compiled binaries from MoleculeAutoCorrect.
Get your hands on a virtual library of molecules you would like to use as reference of correct chemistry (here tmc.smi
). Then use this library to create a dictionary of chemical features (here tmc.dict
). You can specify the radius of circular atomic environments using the --environment_radius argument (here 1
).
Important
The input SMILES need to have explicit hydrogens to get proper encoding of keys! Currently, SMILES missing explicit hydrogens are encoded incorrectly.
Creating the tmc.dict by calling the MoleculeAutoCorrect binaries with Python using the scripts highlighted below: NB! When creating the dict, it is important that you use the conda env installed above (env.yml). Otherwise you will get an error. After the .dict has been created you can switch to an environment with a newer version of RDKit.
python ./create_chemical_dictionary_from_smiles.py --smiles_data ./data/tmc.smi --dict_name ./dicts/tmc.dict --environment_radius 1
Use the tmc.dict to get familiarity scores for a given TMC SMILES:
python ./get_sa_from_tmc_smiles.py --smiles "CN(C)C=O->[Ni+2]123<-[N-](C(=O)CN->1(CC(=O)[N-]->2c1ccccc1)CC(=O)[N-]->3c1ccccc1)c1ccccc1" --reference_dict ./dicts/tmc.dict
You can also inspect the output of HighlightMoleculeErrors directly running the binary:
bin/HighlightMoleculeErrors ./dicts/tmc.dict "CCCN(C)[Mo](<-[C]1N(CC)C=CN1CC)(N(C)CCC)N(C)CCC" molecule_errors.svg
get_sa_from_tmc_smiles
contains the get_familiarity
function which returns the calculated familiarities. This function can be imported in other scripts and then used
as an SA score calculator.