-
Notifications
You must be signed in to change notification settings - Fork 15
Description
Summary
Many users probably want to follow specific atoms/bonds through the fragmentation process. However, as a result of performing canonicalization of input molecules, there's no way to map from an atom in the input molecule of a Fragmenter job to an atom in the result
's parent_molecule
or any of the output fragments.
Reproducing example
from openff.toolkit.topology import Molecule
from openff.fragmenter.fragment import WBOFragmenter
mol1 = Molecule.from_smiles('ClCCCF')
frag_engine = WBOFragmenter()
result = frag_engine.fragment(mol1)
def draw_mol_and_label_atom_index(mol):
rdmol = mol.to_rdkit()
for atom in rdmol.GetAtoms():
atom.SetAtomMapNum(atom.GetIdx())
return rdmol
mol1
has the following atom map, and the drawing below shows the atom indices (NOT map indices)
print(mol1.properties)
draw_mol_and_label_atom_index(mol1)
{'atom_map': {0: 1,
1: 2,
2: 3,
3: 4,
4: 5,
5: 6,
6: 7,
7: 8,
8: 9,
9: 10,
10: 11}}
The result's parent_molecule
has the following atom map, and the drawing below shows the atom indices (NOT map indices)
result.parent_molecule.properties
draw_mol_and_label_atom_index(result.parent_molecule)
{'atom_map': {0: 1,
1: 3,
2: 5,
3: 4,
4: 2,
5: 8,
6: 9,
7: 10,
8: 11,
9: 6,
10: 7}}
We see that the Cl switched from being atom index 0 in the input, to atom index 4 in the parent. There's no direct way to map from 0 to 4 given the information in the result
object.
Solutions
I think the root cause of this issue is that the OpenFF toolkit performs canonicalization, but it doesn't have a way to return the atom mapping that got applied. I'll open an issue on the Toolkit repo to have that mapping get returned, and once that's available we can provide it to the user here.