AlphaPulldown + AlphaFold3 Research Log

This README documents the complete pipeline, setup, troubleshooting, and experimental execution history for running AlphaPulldown and AlphaFold3 workflows for protein complex structure prediction. All content is designed for reproducibility and future reference.

TODO

  1. Check missing_proteins.txt for Feature Database completeness (a helper sketch follows this list).
  2. Investigate MMseqs2 errors and output inconsistencies.
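
A minimal sketch for the first TODO item: list every FASTA ID that has no corresponding feature pickle. Paths are taken from elsewhere in this log; the sed assumes headers begin with the bare protein ID, so adjust it if yours carry the full UniProt prefix.

# Write IDs with no feature pickle to missing_proteins.txt
grep "^>" protein_sequences.fasta | sed 's/^>//; s/ .*//' | while read -r id; do
  [ -e "/data7/Conny/data/JackFeaturePickleDB/${id}.pkl" ] || echo "$id"
done > missing_proteins.txt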

Reference Links

  • UniProt: https://www.uniprot.org
  • AlphaPulldown: https://github.com/KosinskiLab/AlphaPulldown
  • AF2-Multimer: https://github.com/google-deepmind/alphafold
  • AlphaFold3: https://github.com/google-deepmind/alphafold3

File Transfer & Utilities

# Transfer files (adjust paths as needed):
scp /local/path/to/file jzho349@kilimanjaro.biochem.emory.edu:/remote/path
scp -r jzho349@kilimanjaro.biochem.emory.edu:/remote/result/path ~/Downloads/

# Split protein FASTA list into 8 chunks:
split -n l/8 --suffix-length=2 --additional-suffix=.txt unique_protein_ids_Jack.txt target_chunk_Jack_

# One-liner to count completed jobs per GPU output directory
for d in gpu_*; do echo "$d: $(ls -1 "$d" | wc -l)"; done
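
For pulling back large result trees, rsync is a resumable alternative to the scp lines above (same host and path conventions):

# Resumable pull of results; -P keeps partial files and shows progress
rsync -avP jzho349@kilimanjaro.biochem.emory.edu:/remote/result/path ~/Downloads/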

Expected Data from AlphaFold3

  1. mmCIF: Predicted complex structure.
  2. pLDDT: Per-residue confidence score.
  3. Likely additional outputs such as ipTM and pairwise confidence metrics (to be explored).
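
For local AlphaFold3 runs, confidence summaries are written next to the mmCIF. The sketch below assumes the <job_name>_summary_confidences.json naming and the iptm/ptm keys used by the google-deepmind/alphafold3 repository; verify both against your installed release.

# Pull ipTM/pTM out of an AF3 summary file (file name and keys are assumptions, see above)
jq '{iptm, ptm}' job_dir/job_name_summary_confidences.json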

AlphaPulldown Overview

AlphaPulldown consists of two core stages:

  1. create_individual_features.py:
    • Computes MSAs and finds templates.
    • Stores monomer features as .pkl files.
  2. run_multimer_jobs.py:
    • Predicts structures from the generated features.
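
In pulldown mode the two stages are tied together by plain-text protein lists, one identifier per line; AlphaPulldown forms every bait × target combination automatically. A minimal illustration (IDs taken from elsewhere in this log):

bait.txt:
  Q01329
targets.txt:
  P40449
  P23369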

Environment Setup

# Create environment
conda create -n AlphaPulldown -c omnia -c bioconda -c conda-forge \
  python==3.11 openmm==8.0 pdbfixer==1.9 kalign2 hhsuite hmmer modelcif

# Activate & install JAX for GPU acceleration
conda activate AlphaPulldown
pip install -U "jax[cuda12]"
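
The AlphaPulldown package itself goes in via pip (the unpinned install below is a sketch; pin the release you validated), and it is worth confirming JAX actually sees the GPUs before queueing work:

# Install AlphaPulldown into the activated environment
pip install alphapulldown

# Should list GPU devices, not CpuDevice
python -c "import jax; print(jax.devices())"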

Feature Generation

MMseqs2 (Sequential Mode)

# Paths (adjust to your layout)
FASTA_PATH=protein_sequences.fasta
DATA_DIR=/data7/Conny/data/AF_GeneticDB
OUTPUT_DIR=/data7/Conny/result_MMseqs2

NUM_SEQ=$(grep -c "^>" "$FASTA_PATH")
for ((i=0; i<NUM_SEQ; i++)); do
  echo "Processing index $i"
  create_individual_features.py \
    --fasta_paths=$FASTA_PATH \
    --data_dir=$DATA_DIR \
    --output_dir=$OUTPUT_DIR \
    --skip_existing=True \
    --use_mmseqs2=True \
    --max_template_date="2050-01-01" \
    --seq_index=$i || echo $i >> failed_indexes.txt   # log failures for a retry pass
done
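
Failed indexes accumulate in failed_indexes.txt; a straightforward retry pass reuses the same flags:

# Retry every index recorded as failed
while read -r i; do
  create_individual_features.py \
    --fasta_paths=$FASTA_PATH --data_dir=$DATA_DIR --output_dir=$OUTPUT_DIR \
    --skip_existing=True --use_mmseqs2=True --max_template_date="2050-01-01" \
    --seq_index=$i
done < failed_indexes.txt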

JackHMMer (Parallelized)

# Paths (adjust to your layout)
FASTA_PATH=protein_sequences.fasta
DATA_DIR=/data7/Conny/data/AF_GeneticDB
OUTPUT_DIR=/data7/Conny/result_JackHMMer
FAILED_LOG=failed_indexes_jackhmmer.txt

NUM_SEQ=$(grep -c "^>" "$FASTA_PATH")
MAX_JOBS=16
rm -f "$FAILED_LOG"

for ((i=0; i<NUM_SEQ; i++)); do
  (
    create_individual_features.py \
      --fasta_paths=$FASTA_PATH \
      --data_dir=$DATA_DIR \
      --output_dir=$OUTPUT_DIR \
      --skip_existing=True \
      --use_mmseqs2=False \
      --max_template_date="2050-01-01" \
      --seq_index=$i || echo $i >> "$FAILED_LOG"
  ) &

  # Throttle: keep at most MAX_JOBS feature jobs running at once
  while (( $(jobs -r | wc -l) >= MAX_JOBS )); do sleep 1; done
done
wait
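
After the sweep, a quick completeness check against the input count catches silent gaps:

# Compare produced pickles against the number of input sequences
echo "inputs:  $NUM_SEQ"
echo "pickles: $(ls "$OUTPUT_DIR"/*.pkl 2>/dev/null | wc -l)"
echo "failed:  $(wc -l < "$FAILED_LOG" 2>/dev/null || echo 0)"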

Understanding create_individual_features.py

  • CPU-only script
  • Output format: Pickle files (.pkl); see the inspection sketch after this list
  • Requires pre-downloaded Genetic Database
  • Feature sources:
    1. JackHMMer/HHblits (default)
    2. MMseqs2
    3. Precomputed Feature Pickles
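
A quick way to sanity-check one of the generated pickles, run inside the activated AlphaPulldown environment. This assumes AlphaPulldown's MonomericObject layout with a feature_dict attribute and falls back to a raw dict; adjust if your version differs.

# Inspect a monomer feature pickle
python - <<'EOF'
import pickle

with open("Q01329.pkl", "rb") as fh:        # example ID from this log
    obj = pickle.load(fh)
feats = getattr(obj, "feature_dict", obj)   # raw feature dicts pass through
print(type(obj).__name__, sorted(feats)[:5])
EOF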

Accessing Databases

Genetic Database

# download_all_data.sh ships with the AlphaFold repository (scripts/ directory)
bash download_all_data.sh /data7/Conny/data/AF_GeneticDB
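
The full database set is large (roughly 2.6 TB uncompressed per the AlphaFold docs; treat the figure as approximate), so check free space before starting:

# Confirm the target volume has room for the full genetic database
df -h /data7/Conny/data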

Feature Pickle Database

# Install the MinIO client (mc)
curl -O https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc && mkdir -p $HOME/bin && mv mc $HOME/bin/
echo 'export PATH=$HOME/bin:$PATH' >> ~/.bashrc && source ~/.bashrc

# An 'embl' alias pointing at the EMBL S3 endpoint must be configured once
# beforehand (mc alias set embl <endpoint-URL>; see the AlphaPulldown docs)

# Example download
mc cp embl/alphapulldown/input_features/Saccharomyces_cerevisiae/Q01329.pkl.xz /data7/Conny/data/JackFeaturePickleDB

# Batch download
bash download_found.sh | tee download_output.log
xz -dk *.xz  # decompress while keeping the originals
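
To browse what is hosted for a species before scripting downloads (assuming the same 'embl' alias noted above):

# List available feature pickles for S. cerevisiae
mc ls embl/alphapulldown/input_features/Saccharomyces_cerevisiae/ | head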

Multimer Job Execution

Parallel execution across 8 GPUs:

# Paths (adjust to your layout)
MONOMER_OBJECTS_DIR=/data7/Conny/data/JackFeaturePickleDB
BAIT_FILE=/data7/Conny/data/bait.txt
BASE_OUTPUT=/data7/Conny/result_multimer
DATA_DIR=/data7/Conny/data/AF_GeneticDB

TARGET_FILES=(
  /data7/Conny/data/target_chunk_aa
  /data7/Conny/data/target_chunk_ab
  ...
  /data7/Conny/data/target_chunk_ah
)

for i in "${!TARGET_FILES[@]}"; do
  mkdir -p "$BASE_OUTPUT/gpu_$i"   # the log redirection below needs this to exist
  CUDA_VISIBLE_DEVICES=$i run_multimer_jobs.py \
    --mode=pulldown \
    --monomer_objects_dir=$MONOMER_OBJECTS_DIR \
    --protein_lists=$(realpath $BAIT_FILE),${TARGET_FILES[$i]} \
    --output_path=$BASE_OUTPUT/gpu_$i \
    --data_dir=$DATA_DIR \
    --num_cycle=3 \
    --num_predictions_per_model=1 \
    > $BASE_OUTPUT/gpu_$i/run.log 2>&1 &
done

wait
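
While the eight jobs run, GPU utilization plus the per-directory count one-liner from File Transfer & Utilities are the easiest progress signals:

# Watch all eight GPUs refresh every 30 seconds
watch -n 30 nvidia-smi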

Result Analysis

  1. Create Jupyter notebooks:
create_notebook.py --cutoff=5.0 --output_dir=/path/to/results
  2. Visualize outputs (e.g., PAE plots, ipTM scores, ranked PDBs).
  3. For full table generation:
    • Build an Apptainer image with the CCP4 libraries.
    • Execute:
apptainer exec --no-home --bind /results:/mnt fold_analysis_final.sif /app/run_get_good_pae.sh --output_dir=/mnt --cutoff=10

AlphaFold3 Notes

  • Web GUI: MSAs are computed automatically.
  • Local mode: accepts MSA input via a JSON job file (a sketch follows below).
  • Additional work is needed to automate the upload process with Selenium (an autoclicker test is in progress).
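
A minimal local-mode job file, following the input format published in the google-deepmind/alphafold3 repository (field names and the dialect/version pair should be verified against the installed release; the sequences are truncated placeholders):

# Write a minimal AlphaFold3 job description
cat > job_example.json <<'EOF'
{
  "name": "example_dimer",
  "modelSeeds": [1],
  "sequences": [
    {"protein": {"id": "A", "sequence": "MVLS..."}},
    {"protein": {"id": "B", "sequence": "MKTA..."}}
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF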

Troubleshooting & Exceptions

  1. UniRef30 issue → fix per https://github.com/google-deepmind/alphafold/pull/860/files
  2. MMseqs2: 5 failures → P40449, P23369, Q12754, Q22354, Q12019
  3. JackHMMer: 7 known failures → P40328, P48570, P32432, P38691, P48510, P53920, Q00582 (listed in /data7/Conny/result_JackHMMer/missing_pkls.txt)
  4. FeatureDB: 20 missing pickles → listed in /data7/Conny/data/JackFeaturePickleDB/missing_pickle_proteins.txt
  5. AF3: 1 missing prediction due to large input size → Job_999
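
The failure lists from items 3 and 4 can be merged into one de-duplicated follow-up set:

# Union of the JackHMMer and FeatureDB missing-protein lists
sort -u /data7/Conny/result_JackHMMer/missing_pkls.txt \
        /data7/Conny/data/JackFeaturePickleDB/missing_pickle_proteins.txt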

This document is actively maintained as experimental runs progress.
