mdaf3


Important

This package relies on a feature branch of MDAnalysis to parse mmCIF files and should therefore be considered experimental. If this feature is merged upstream, this package will switch to the official MDAnalysis release. For now, please validate your selections carefully and consult the tests to see what is currently validated.

AlphaFold3 (AF3) outputs a set of confidence metrics that are useful for, e.g., protein binding prediction; however, making use of these metrics requires careful and time-consuming parsing.

MDAnalysis provides an atom selection language (think SQL for molecular topologies) that makes it easy to associate confidence metrics with atomic positions, amino acid/atom types, and other topological information.

This package seeks to expose AF3 outputs (including confidence metrics) and predicted topologies via an easy-to-use interface.

How do I...

Get a Python interface to my AF3 output?

# example inference output
from mdaf3.data.files import UNCOMPRESSED_AF3_OUTPUT_PATH
from mdaf3.AF3OutputParser import AF3Output

# Equivalent to AF3Output("/path/to/inference/output")
af3_output = AF3Output(UNCOMPRESSED_AF3_OUTPUT_PATH)

Get summary information about the AF3 inference job?

summary_dict = af3_output.get_summary_metrics()

chain_pair_iptm_ndarr = summary_dict["chain_pair_iptm"]
ranking_score = summary_dict["ranking_score"]

# all 'get_' methods take optional "seed" and "sample_num" arguments;
# if they are not provided, the best model (by AF3 ranking score)
# is returned
summary_dict_seed_1 = af3_output.get_summary_metrics(seed=1)
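The "best model" fallback can be pictured as an argmax over ranking scores. A minimal sketch, assuming the scores come from AF3's ranking-scores CSV with `seed`, `sample` and `ranking_score` columns and that each sample lives in a `seed-<seed>_sample-<n>` subdirectory. `best_model` is a hypothetical helper, not mdaf3's implementation; its handling of a lone `seed=` (best sample within that seed) is one plausible reading of the comment above.

```python
import csv
import io
from typing import Optional, Tuple


def best_model(ranking_csv: str, seed: Optional[int] = None) -> Tuple[int, int, str]:
    """Pick the (seed, sample) with the highest AF3 ranking score.

    Hypothetical helper. Assumes a CSV with the header seed,sample,ranking_score.
    If `seed` is given, only samples from that seed are considered.
    Returns (seed, sample, assumed sample subdirectory name).
    """
    rows = [
        (int(r["seed"]), int(r["sample"]), float(r["ranking_score"]))
        for r in csv.DictReader(io.StringIO(ranking_csv))
    ]
    if seed is not None:
        rows = [r for r in rows if r[0] == seed]
    if not rows:
        raise ValueError("no models match the requested seed")
    best_seed, best_sample, _ = max(rows, key=lambda r: r[2])
    return best_seed, best_sample, f"seed-{best_seed}_sample-{best_sample}"


# Made-up scores for illustration
scores = """seed,sample,ranking_score
1,0,0.71
1,1,0.83
2,0,0.90
2,1,0.65
"""
print(best_model(scores))          # (2, 0, 'seed-2_sample-0')
print(best_model(scores, seed=1))  # (1, 1, 'seed-1_sample-1')
```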

Calculate the mean pLDDT of atoms in a particular protein?

u = af3_output.get_mda_universe()

# segid in MDAnalysis corresponds to the chain "id" given for each
# entry in the AF3 input JSON
# https://github.com/google-deepmind/alphafold3/blob/main/docs/input.md
selection = u.select_atoms("segid A")

# pLDDT is stored in the "tempfactors" attribute of an 
# MDAnalysis AtomGroup
mean_pLDDT_chain = selection.tempfactors.mean()
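Conceptually, the selection above is a boolean mask over per-atom arrays. A NumPy-only sketch of the same calculation, assuming you already have one chain ID and one pLDDT value per atom; the arrays are made-up illustration values, not AF3 output, and `mean_plddt_for_chain` is a hypothetical helper.

```python
import numpy as np

# Made-up per-atom data: chain ID (AF3 "id" / MDAnalysis segid) and pLDDT
chain_ids = np.array(["A", "A", "A", "B", "B"])
plddt = np.array([90.0, 80.0, 70.0, 50.0, 40.0])


def mean_plddt_for_chain(chain_ids: np.ndarray, plddt: np.ndarray, chain: str) -> float:
    """Mean per-atom pLDDT of one chain, i.e. what "segid A" + tempfactors.mean() computes."""
    mask = chain_ids == chain
    if not mask.any():
        raise ValueError(f"no atoms in chain {chain!r}")
    return float(plddt[mask].mean())


print(mean_plddt_for_chain(chain_ids, plddt, "A"))  # 80.0
```

Because this is a per-atom average, residues with more atoms carry more weight; averaging per residue first would generally give a different number.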

Find the minimum PAE among all residue pairs between two protein chains?

u = af3_output.get_mda_universe()

# move up the topology hierarchy
# from atom -> amino acid residue
# using AtomGroup.residues
protein_1_all_res = u.select_atoms("segid A").residues

protein_2_all_res = u.select_atoms("segid B").residues

pae_ndarr = af3_output.get_pae_ndarr()

# resindices are 0-indexed amino acid residue indices
# that correspond to AF3 token indices
min_pae_p1_p2 = pae_ndarr[protein_1_all_res.resindices][
    :, protein_2_all_res.resindices
].min()

# alternatively:
min_pae_p1_p2 = af3_output.get_summary_metrics()["chain_pair_pae_min"][0][1]
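The chained indexing above selects a rectangular block of the PAE matrix: rows for chain A's tokens, columns for chain B's. `np.ix_` expresses the same block in one step. A NumPy-only sketch on a made-up 4×4 matrix, assuming tokens 0 and 1 belong to chain A and tokens 2 and 3 to chain B, and following the AlphaFold convention that rows index the token used for alignment and columns the token being scored.

```python
import numpy as np

# Made-up PAE matrix (Å); row = aligned token, column = scored token
pae = np.array([
    [0.5, 1.0, 8.0, 6.0],
    [1.2, 0.4, 7.0, 9.0],
    [3.0, 4.0, 0.3, 1.1],
    [5.0, 2.5, 0.9, 0.6],
])
chain_a = np.array([0, 1])  # token indices for chain A (assumed)
chain_b = np.array([2, 3])  # token indices for chain B (assumed)

# Two equivalent ways to take the "A rows x B columns" block
block_chained = pae[chain_a][:, chain_b]
block_ix = pae[np.ix_(chain_a, chain_b)]
assert np.array_equal(block_chained, block_ix)

min_pae_a_b = block_ix.min()                       # aligned on A, scoring B
min_pae_b_a = pae[np.ix_(chain_b, chain_a)].min()  # aligned on B, scoring A
print(min_pae_a_b, min_pae_b_a)  # 6.0 2.5
```

PAE is generally not symmetric, so it matters which chain supplies the rows. Also keep in mind that AF3 tokenizes ligands and modified residues per atom rather than per residue, so the one-residue-one-token mapping is safest for standard protein chains.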

Find the max contact probability between a single residue of one protein chain and any residue in another protein chain?

u = af3_output.get_mda_universe()

protein_res_1 = u.select_atoms("segid A").residues[0]
protein_2_all_res = u.select_atoms("segid B").residues

contact_prob_ndarr = af3_output.get_contact_prob_ndarr()

max_contact_prob_res1_p2 = contact_prob_ndarr[protein_res_1.resindex][
    protein_2_all_res.resindices
].max()
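The expression above returns only the maximum probability. If you also want to know which residue in chain B attains it, `np.argmax` on the same row gives a position that maps back through the index array. A NumPy-only sketch on a made-up matrix, assuming token 0 is the chosen chain A residue and tokens 2 to 4 make up chain B.

```python
import numpy as np

# Made-up symmetric contact probability matrix (values in [0, 1])
contact_prob = np.array([
    [1.00, 0.90, 0.05, 0.40, 0.10],
    [0.90, 1.00, 0.20, 0.15, 0.70],
    [0.05, 0.20, 1.00, 0.95, 0.30],
    [0.40, 0.15, 0.95, 1.00, 0.85],
    [0.10, 0.70, 0.30, 0.85, 1.00],
])
res_a = 0                      # token index of the chosen chain A residue
chain_b = np.array([2, 3, 4])  # token indices of chain B

row = contact_prob[res_a, chain_b]   # probabilities from res_a to each chain B token
best_pos = int(np.argmax(row))       # position within chain_b
best_token = int(chain_b[best_pos])  # back to a global token index
print(row.max(), best_token)  # 0.4 3
```

With the README's variables, `protein_2_all_res[best_pos]` would then be the corresponding MDAnalysis Residue.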

Find the mean pLDDT of all atoms that are within 5 angstroms of a particular residue?

u = af3_output.get_mda_universe(seed=1)

particular_residue = u.select_atoms("segid A").residues[0].atoms

atoms_around_particular_res = u.select_atoms(
    "around 5 group pr", pr=particular_residue
)

mean_pLDDT_around_pr = atoms_around_particular_res.tempfactors.mean()
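To make the `around` selection concrete, here is a brute-force NumPy sketch of the same idea: keep atoms within the cutoff of any atom in the reference group, excluding the reference group itself (as MDAnalysis's `around` does) and assuming no periodic box, since AF3 models have none. `around_mask` is a hypothetical helper; it uses `<=` at the boundary, which may not match the library's exact convention, and its O(N·M) distance matrix is only suitable for small systems.

```python
import numpy as np


def around_mask(positions: np.ndarray, group_idx: np.ndarray, cutoff: float) -> np.ndarray:
    """Boolean mask of atoms within `cutoff` Å of any atom in `group_idx`.

    Hypothetical helper: reference atoms are excluded, no periodic box is assumed.
    """
    ref = positions[group_idx]                       # (M, 3)
    diff = positions[:, None, :] - ref[None, :, :]   # (N, M, 3)
    dist = np.linalg.norm(diff, axis=-1)             # (N, M)
    mask = (dist <= cutoff).any(axis=1)
    mask[group_idx] = False                          # drop the reference group itself
    return mask


# Made-up coordinates (Å) and pLDDT values
positions = np.array([
    [0.0, 0.0, 0.0],   # atom 0: reference residue
    [1.0, 0.0, 0.0],   # atom 1: reference residue
    [4.0, 0.0, 0.0],   # atom 2: 3 Å from atom 1
    [10.0, 0.0, 0.0],  # atom 3: far away
])
plddt = np.array([95.0, 90.0, 60.0, 30.0])

mask = around_mask(positions, np.array([0, 1]), cutoff=5.0)
print(plddt[mask].mean())  # only atom 2 is selected: 60.0
```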

Batch apply a feature extraction method to all my AF3 jobs (with job names stored in a Polars DataFrame)?

from pathlib import Path
import polars as pl
from mdaf3.AF3OutputParser import AF3Output
from mdaf3.FeatureExtraction import serial_apply, split_apply_combine

def extract_protein1_mean_pLDDT(row, af3_parent_dir):
    job_dir = Path(af3_parent_dir) / row["job_name"]
    af3_output = AF3Output(job_dir)
    u = af3_output.get_mda_universe()
    protein1_mean_pLDDT = u.select_atoms("segid A").tempfactors.mean()
    row["protein1_mean_pLDDT"] = protein1_mean_pLDDT
    return row

all_jobs = pl.DataFrame({"job_name": ["93f0240a1d2c15da9551841d22239d41"]})

af3_parent_dir = "mdaf3/data"

# serial_apply processes rows one after another; use
# split_apply_combine for process-parallel execution.
# Both convert each row into a dict, pass it to your
# extraction method along with any extra arguments,
# and then concatenate the resulting pl.DataFrames
all_job_with_feat = serial_apply(
    all_jobs, extract_protein1_mean_pLDDT, af3_parent_dir
)

feature_np = (
    all_job_with_feat.select("protein1_mean_pLDDT").to_series().to_numpy()
)
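For intuition about what process-parallel execution involves, a minimal split-apply-combine sketch over plain dict rows (for example from polars' `DataFrame.to_dicts()`; `pl.DataFrame(rows)` turns the result back into a DataFrame). `split_apply_combine_sketch` and its `n_chunks` and `executor_cls` parameters are hypothetical and not mdaf3's API. With the default process pool, the extraction function must be a picklable, top-level function; passing `ThreadPoolExecutor` is handy for quick tests.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def _apply_chunk(chunk: List[Row], func: Callable, args: tuple) -> List[Row]:
    # Copy each row so the caller's dicts are never mutated by `func`
    return [func(dict(row), *args) for row in chunk]


def split_apply_combine_sketch(rows: List[Row], func: Callable, *args,
                               n_chunks: int = 4,
                               executor_cls=ProcessPoolExecutor) -> List[Row]:
    """Split rows into contiguous chunks, apply `func` in parallel, recombine in order.

    Hypothetical stand-in for a process-parallel batch apply: `func(row, *args)`
    receives a dict and returns the (updated) dict.
    """
    if not rows:
        return []
    size = -(-len(rows) // n_chunks)  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    worker = partial(_apply_chunk, func=func, args=args)
    with executor_cls() as pool:
        # Executor.map yields results in input order, so row order is preserved
        results = list(pool.map(worker, chunks))
    return [row for chunk in results for row in chunk]
```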

Compress my AF3 output directory without losing confidence metric precision?

Note

This will delete 'TERMS_OF_USE.md' and the AF3 job's input JSON ('<job_name>_data.json'), among other files. This feature is designed with large HPC batches in mind, so if you aren't sure, read the compression code first!

af3_output = AF3Output(UNCOMPRESSED_AF3_OUTPUT_PATH)
af3_output.compress()
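"Without losing precision" means the stored floats come back bit-for-bit, as opposed to rounding or down-casting them to save space. A minimal sketch of that property, assuming NumPy's compressed `.npz` container as a stand-in; mdaf3's actual compressed format may differ, so read `compress()` for the real details. `roundtrip_lossless` is a hypothetical helper and the PAE values are made up.

```python
import io

import numpy as np


def roundtrip_lossless(arrays: dict) -> dict:
    """Compress arrays with np.savez_compressed into memory and load them back.

    Sketch only: DEFLATE compression is lossless, so values survive exactly.
    """
    buf = io.BytesIO()
    np.savez_compressed(buf, **arrays)
    buf.seek(0)
    with np.load(buf) as data:
        return {name: data[name] for name in data.files}


rng = np.random.default_rng(0)
pae = rng.uniform(0.0, 30.0, size=(300, 300)).astype(np.float32)  # made-up values
restored = roundtrip_lossless({"pae": pae})
assert restored["pae"].dtype == pae.dtype
assert np.array_equal(restored["pae"], pae)  # bit-for-bit identical
```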

Installation

Below we provide instructions both for conda and for pip.

First, clone the repo locally:

git clone https://github.com/ljwoods2/mdaf3.git
cd mdaf3

With conda

Ensure that you have conda installed.

Create a virtual environment and activate it:

conda create --name mdaf3
conda activate mdaf3

Install the dependencies:

conda env update --name mdaf3 --file devtools/conda-envs/test_env.yaml

Build this package from source:

pip install -e .

If you want to update your dependencies (which can be risky!), run:

conda update --all

And when you are finished, you can exit the virtual environment with:

conda deactivate

With pip

To build the package from source, run:

pip install .

If you want to create a development environment, install the test dependencies with:

pip install ".[test]"

mdaf3 is bound by a Code of Conduct (see CODE_OF_CONDUCT.md).

Copyright

The mdaf3 source code is hosted at https://github.com/ljwoods2/mdaf3 and is available under the GNU General Public License, version 2 (see the file LICENSE).

Acknowledgements

Project based on the MDAnalysis Cookiecutter version 0.1. Please cite MDAnalysis when using mdaf3 in published work.
