Code for the following publications:
- Weber, D. S.; Warren, J. J. The Interaction between Methionine and Two Aromatic Amino Acids Is an Abundant and Multifunctional Motif in Proteins. Arch. Biochem. Biophys. 2019, 672, 108053.
- Weber, D. S.; Warren, J. J. A Survey of Methionine-Aromatic Interaction Geometries in the Oxidoreductase Class of Enzymes: What Could Met-Aromatic Interactions be Doing Near Metal Sites? J. Inorg. Biochem. 2018, 186, 34-41.
- Synopsis
- Setup
- How it works
- Finding Met-aromatic pairs
- Finding "bridging interactions"
- Running jobs and MongoDB integration
- Using the MetAromatic API
This program returns a list of closely spaced methionine-aromatic residue pairs for structures in the Protein Data Bank (PDB). The program supports running queries on single PDB entries or large scale multithreaded batch jobs consisting of hundreds of thousands of queries.
Simply run:
pip install MetAromatic
Note
See my master's thesis 1 for a more indepth discussion surrounding the "Met-aromatic" algorithm.
Files in the PDB are organized using the SMCRA hierarchy: Structure, Model, Chain, Residue and Atom. For example, here is the fifth line for the entry 1RCY:
1 2 3 4 5 6 7 8 9 10 11 12
ATOM 5 CB THR A 5 -13.081 2.366 23.788 1.00 37.95 C
Each line in a file corresponds to a single atom. In the above example, we have a carbon atom that is labelled
CB
(column 3) on a threonine residue (THR
, column 4) located on the A
chain (column 5). Columns 7-9
specify the
The Met-aromatic program starts by downloading a *.pdb
file from the PDB over FTP. The file is then stripped
down to the subset of coordinates corresponding to a chain [A-Z]
of choosing. Most PDB entries consist of
A
and B
chains. The program then strips the dataset down to the residues tyrosine (TYR
), tryptophan
(TRP
), phenylalanine (PHE
), and methionine (MET
). In the last step of preprocessing, the program further
strips the dataset down to any of the following atoms:
MET: CE, SD, CG
TYR: CD1, CE1, CZ, CG, CD2, CE2
TRP: CD2, CE3, CZ2, CH2, CZ3, CE2
PHE: CD1, CE1, CZ, CG, CD2, CE2
The above atoms are the aromatic carbon atoms in the aromatic residues tyrosine (TYR
), tryptophan, (TRP
)
and phenylalanine (PHE
).
The program applies the distance condition to find methionine-aromatic pairs that are physically near each
other. To do so, the program first finds the midpoints between all neighbouring aromatic carbon atoms in all
of tyrosine, tryptophan and phenylalanine. These midpoints are denoted using a *
:
TYR: CD1*|CE1*|CZ*|CG*|CD2*|CE2*
TRP: CD2*|CE3*|CZ2*|CH2*|CZ3*|CE2*
PHE: CD1*|CE1*|CZ*|CG*|CD2*|CE2*
Next, the program finds all the vectors projecting from all methionine SD
atoms to all the midpoints *
. As
an example, a substructure consisting of one methionine and two phenylalanines would have 12 possible such
vectors, depicted here as
To apply the distance condition, the program simply banks those methionine-aromatic pairs where one or more
vectors CD1*
in TYR
is the
midpoint between the CD1
and CE1
carbon atoms. A CD1* / SD
on an aromatic / methionine pair would meet
the distance condition if the following held:
Any methionine-aromatic pairs meeting the distance condition are subjected to the angular condition. The
angular condition can be loosely interpreted as "find all the methionine-aromatic pairs where the methionine
lone pairs are pointing into or near the region(s) of highest electron density on a corresponding aromatic
moiety." To apply the angular condition, two new vectors are introducted: vector SD
lone pairs in three dimensional space. To find the lone
pairs, the program considers the SD
atom in a methionine atom to be the center of a regular tetrahedron with
vertices CE
and CG
. Solving for the position of the remaining two vertices yields vectors
Last, a methionine-aromatic pair is deemed interacting if any of
holds.
The end result is a dataset consisting of methionine-aromatic pairs whereby one or both of the methionine lone pairs are pointing into or near the region of highest electron density on the corresponding aromatic residues. A representative figure is shown below:
The easiest means of performing Met-aromatic calculations is to run jobs in a terminal session. The simplest query follows:
runner pair 1rcy
Here, the pair
argument specifies that we want to run a single aromatic interaction calculation. The query
will yield the following results:
-------------------------------------------------------------------
ARO POS MET POS NORM MET-THETA MET-PHI
-------------------------------------------------------------------
TYR 122 18 4.211 75.766 64.317
TYR 122 18 3.954 60.145 68.352
TYR 122 18 4.051 47.198 85.151
TYR 122 18 4.39 53.4 95.487
TYR 122 18 4.62 68.452 90.771
TYR 122 18 4.537 78.568 76.406
PHE 54 148 4.777 105.947 143.022
PHE 54 148 4.61 93.382 156.922
PHE 54 148 4.756 93.287 154.63
-------------------------------------------------------------------
Above we have an order VI interaction between TYR 122 and MET 18, that is, all six vectors SD
on MET 18 to the midpoints on TYR 122 meet Met-aromatic criteria. We also have an
order IV interaction between PHE 54 and MET 148. The NORM
column specifies the actual distance (in
--cutoff-distance
option:
runner --cutoff-distance 4.0 pair 1rcy
Reducing the cutoff distance yields an order I interaction between TYR 122 and MET 18.
-------------------------------------------------------------------
ARO POS MET POS NORM MET-THETA MET-PHI
-------------------------------------------------------------------
TYR 122 18 3.954 60.145 68.352
-------------------------------------------------------------------
MET-THETA
and MET-PHI
refer to --cutoff-angle
option:
runner --cutoff-distance 4.5 --cutoff-angle 60 pair 1rcy
The --cutoff-angle
option ensures that at least one of
-------------------------------------------------------------------
ARO POS MET POS NORM MET-THETA MET-PHI
-------------------------------------------------------------------
TYR 122 18 4.051 47.198 85.151
TYR 122 18 4.39 53.4 95.487
-------------------------------------------------------------------
The default lone pair interpolation model is cp
or Cross Product (a thorough description is available in my
master's thesis. 1 There exists another model, rm
or Rodrigues Method for predicting the positions of
lone pairs. This model is based on the Rodrigues' Rotation Formula. The model type can be passed as follows:
runner --cutoff-distance 4.5 --cutoff-angle 60 --model rm pair 1rcy
Which yields similar results:
-------------------------------------------------------------------
ARO POS MET POS NORM MET-THETA MET-PHI
-------------------------------------------------------------------
TYR 122 18 3.954 57.237 64.226
TYR 122 18 4.051 45.097 80.768
TYR 122 18 4.39 52.505 91.841
-------------------------------------------------------------------
Note that the Euclidean distances between TYR aromatic carbon atoms and MET remain unchanged. By default, this
program searches for "A" delimited chains. Some researchers may, however, be interested in searching for
aromatic interactions in a different chain within a multichain protein. The --chain
option can be used to
specify the chain:
runner --cutoff-distance 4.5 --cutoff-angle 60 --model rm --chain B pair 1rcy
In this case, no results are returned because the PDB entry 1rcy does not contain a "B" chain.
Bridging interactions are interactions whereby two or more aromatic residues meet the criteria of the Met-aromatic algorithm, for example, in the example below (PDB entry 6C8A):
We can specify a search for bridging interactions, instead of conventional aromatic interactions, using the
bridge
argument. For example, to search for bridging interactions with a 7.0
runner --cutoff-distance 7.0 bridge 6lu7
Which will return a list as follows:
-------------------------------------------------------------------
{MET264}-{PHE219}-{TYR209}
{MET276}-{TRP207}-{TRP218}
{PHE185}-{PHE181}-{MET165}
-------------------------------------------------------------------
Where each row corresponds to a bridge. This program treats bridging interactions as networks with a defined
set of vertices. For example, the above examples are 2-bridges with 3 vertices: ARO - MET - ARO. The
--vertices
option can be passed to search for n-bridges:
runner --cutoff-distance 6.0 bridge 6lu7 --vertices 4
Note
This section assumes a host is running MongoDB 2 and familiarity with the MongoDB suite of products.
This software is normally used for large scale Protein Data Bank mining efforts and stores the results in a MongoDB database. To get started, prepare a batch file of PDB codes to scan. A batch file can be a regular text file consisting of delimited PDB codes:
1xak, 1uw7, 2ca1, 1qz8, 2fxp, 2fyg, 2cme, 6mwm, spam
Then run:
runner batch </path/batch/file> --threads <num-threads> --database <db> --collection <collection>
The MongoDB database name is specified using the --database
option and the collection name is specified with
the --collection
option. The --threads
option specifies how many threads to use for processing the batch.
The hostname of the server hosting the MongoDB deployment can be provided using the --host
option if the
MongoDB deployment is not bound to localhost.
Important
The program assumes that authentication is enabled and will prompt for a username and password.
If no issues arise when connecting to MongoDB, the batch job will proceed. Below is an example of output corresponding to the above 9 codes:
... MainThread INFO Importing pdb codes from file /tmp/foo.txt
... MainThread INFO Splitting list of PDB codes into 3 chunks
... MainThread INFO Deploying 3 workers!
... MainThread INFO Registering SIGINT to thread terminator
... Batch_0 INFO Processing 1xak. Count: 1
... Batch_1 INFO Processing 1uw7. Count: 2
... Batch_2 INFO Processing 2ca1. Count: 3
... Batch_0 INFO Processing 1qz8. Count: 4
... Batch_1 INFO Processing 2fxp. Count: 5
... Batch_2 INFO Processing 2fyg. Count: 6
... Batch_0 INFO Processing 2cme. Count: 7
... Batch_2 INFO Processing spam. Count: 8
... Batch_1 INFO Processing 6mwm. Count: 9
... MainThread INFO Batch job complete!
... MainThread INFO Results loaded into database: test
... MainThread INFO Results loaded into collection: test
... MainThread INFO Batch job execution time: 1.746000 s
... MainThread INFO Loading:
{
"batch_job_execution_time": 1.746,
"data_acquisition_date": "...", // date truncated for clarity
"num_workers": 3,
"number_of_entries": 9,
"chain": "A",
"cutoff_angle": 109.5,
"cutoff_distance": 4.9,
"model": "cp"
}
Into collection: test_info
... MainThread INFO Unregistering SIGINT from thread terminator
And deployed with the command:
runner batch /tmp/foo.txt --threads 3 --database test --collection test
# Where foo.txt contains: 1xak, 1uw7, 2ca1, 1qz8, 2fxp, 2fyg, 2cme, 6mwm, spam
A batch job will generate a collection secondary to the collection specified by --collection
. This secondary
collection will house all the batch job parameters and other statistics and the collection name will be
suffixed with _info
. Note that the above command generated:
{
"batch_job_execution_time": 1.746,
"data_acquisition_date": "...", // date truncated for clarity
"num_workers": 3,
"number_of_entries": 9,
"chain": "A",
"cutoff_angle": 109.5,
"cutoff_distance": 4.9,
"model": "cp"
}
One may be interested in extending the Met-aromatic project into a customized workflow. The instructions
provided in the Setup section install MetAromatic source into site-packages
. Therefore, the API
can be exposed for use in a custom script.
The command:
runner --cutoff-distance=4.1 pair 1rcy
Will return:
-------------------------------------------------------------------
ARO POS MET POS NORM MET-THETA MET-PHI
-------------------------------------------------------------------
TYR 122 18 3.954 60.145 68.352
TYR 122 18 4.051 47.198 85.151
-------------------------------------------------------------------
The following snippet approximates the above output:
from json import dumps
from MetAromatic import get_pairs_from_pdb
def main() -> None:
results = get_pairs_from_pdb(
cutoff_distance=4.1, cutoff_angle=109.5, chain="A", model="cp", pdb_code="1rcy"
)
for interaction in results.interactions:
print(dumps(interaction.to_dict(), indent=4))
if __name__ == "__main__":
main()
Which will print to stdout
:
{
"aromatic_position": 122,
"aromatic_residue": "TYR",
"met_phi_angle": 68.352,
"met_theta_angle": 60.145,
"methionine_position": 18,
"norm": 3.954
}
{
"aromatic_position": 122,
"aromatic_residue": "TYR",
"met_phi_angle": 85.151,
"met_theta_angle": 47.198,
"methionine_position": 18,
"norm": 4.051
}
The command:
runner bridge 6lu7
Will return:
-------------------------------------------------------------------
{PHE134}-{TYR182}-{MET130}
-------------------------------------------------------------------
The following snippet approximates the above output:
from MetAromatic import get_bridges
def main() -> None:
results = get_bridges(
chain="A",
code="6lu7",
cutoff_angle=109.5,
cutoff_distance=4.9,
model="cp",
vertices=3,
)
for bridge in results.bridges:
print(bridge)
if __name__ == "__main__":
main()
Running this script will print:
{'MET130', 'PHE134', 'TYR182'}