This project provides a web-based interface for running protein structure prediction jobs using AlphaFold 2 and AlphaFold 3 on the ICDS Roar cluster via Open OnDemand v3. The app simplifies the process of submitting and monitoring AlphaFold jobs by providing a user-friendly interface and automated job management.
- Leveraging AlphaFold in Graduate Research
- OSC News: Inaugural GOOD Conference Draws Strong Attendance from 10 Countries
Presented as a talk at the Global Open On Demand Conference 2025, Harvard University
Date: March 19, 2025, 4:00 PM – 4:25 PM (25 min)
Title: AlphaFold accessibility: an optimized open-source OOD app for Protein Structure Prediction
Speakers: Vinay Saji Mathew [Pennsylvania State University], William Lai [Cornell], Matt Hansen [Pennsylvania State University]
Track: Application Track [featuring AI OnDemand]
Location: Tsai Auditorium (CGIS S010)
- AlphaFold 2:
  - Supports AlphaFold v2.3.2 for protein structure prediction
  - Handles both monomer and multimer predictions
  - Uses full database configuration for maximum accuracy
  - Automated MSA generation and template search
- AlphaFold 3 (New!):
  - Latest version of AlphaFold with improved accuracy
  - Supports protein-protein, protein-DNA/RNA, and protein-ligand complexes
  - Enhanced diffusion-based structure prediction
  - Requires acceptance of Google's terms of service
- Two-phase execution:
  - CPU phase for MSA generation and template search
  - GPU phase for prediction (submitted with a job dependency on the CPU phase)
- Real-time job status monitoring
- Detailed progress tracking
- Automatic error handling and recovery
- Flexible Input Formats:
  - FASTA sequence input for AlphaFold 2
  - JSON format input for AlphaFold 3 (following official specifications)
- GPU allocation selection
- Working directory customization
- Real-time progress visualization
- Direct access to output files
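
The two-phase execution listed above (a CPU phase followed by a dependent GPU phase) could be expressed with SLURM job dependencies roughly as follows. This is a minimal sketch under stated assumptions, not the app's actual submission code: the function name `submit_two_phase` and the script names are hypothetical, while `--parsable` and `--dependency=afterok:` are standard `sbatch` options.

```python
import subprocess


def submit_two_phase(cpu_script, gpu_script, run=subprocess.run):
    """Submit the CPU phase, then a GPU phase that waits for it to succeed.

    `run` is injectable so the logic can be exercised without a SLURM cluster.
    Returns the (cpu_job_id, gpu_job_id) pair as strings.
    """
    # --parsable makes sbatch print "jobid" (or "jobid;cluster") and nothing else.
    cpu = run(
        ["sbatch", "--parsable", cpu_script],
        capture_output=True, text=True, check=True,
    )
    cpu_job_id = cpu.stdout.strip().split(";")[0]

    # afterok: the GPU job becomes eligible only if the CPU job exits with code 0.
    gpu = run(
        ["sbatch", "--parsable", f"--dependency=afterok:{cpu_job_id}", gpu_script],
        capture_output=True, text=True, check=True,
    )
    gpu_job_id = gpu.stdout.strip().split(";")[0]
    return cpu_job_id, gpu_job_id
```

On a cluster, a call such as `submit_two_phase("cpu_phase.sh", "gpu_phase.sh")` (hypothetical script names) would queue both jobs at once. If the CPU phase fails, SLURM does not start the GPU job, so no GPU time is spent on a run without MSAs.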
- AlphaFold 2:
  - PDB structure files (ranked by confidence)
  - Multiple Sequence Alignment (MSA) files
  - Detailed prediction metrics and confidence scores
  - Comprehensive log files
- AlphaFold 3:
  - CIF structure files
  - Ranking scores for multiple predictions
  - Detailed model outputs and metrics
  - Complete execution logs
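
As an illustration of working with the AlphaFold 3 ranking scores, the sketch below picks the top-ranked prediction from a ranking table. It assumes a CSV with `seed`, `sample`, and `ranking_score` columns; confirm the layout against the files your run actually produces. The function name `best_prediction` and the numbers in the example are made up for illustration.

```python
import csv
import io


def best_prediction(csv_text):
    """Return the row (as a dict) with the highest ranking_score."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no ranking rows found")
    # Scores are read as strings, so convert before comparing.
    return max(rows, key=lambda row: float(row["ranking_score"]))


# Illustrative values only, not real prediction results.
example = "seed,sample,ranking_score\n1,0,0.71\n1,1,0.83\n2,0,0.64\n"
best = best_prediction(example)
print(f"best: seed={best['seed']} sample={best['sample']}")
```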
Both AlphaFold versions require genetic databases that must be set up before using the app:
- AlphaFold 2: Download using the script from the AlphaFold 2 repository
- AlphaFold 3: Additional databases required. Setup instructions available here
The app uses Singularity containers for execution:
- AlphaFold 2: Download from Sylabs
- AlphaFold 3: Requires official container from Google (subject to terms of use). Weights needed for running AlphaFold 3 have to be requested from Google here
- Clone this repository into your Open OnDemand apps directory
- Configure paths in template/alphafold_env.sh
- Ensure all required databases are properly set up
- Verify that the available GPUs meet the compute capability required by the AlphaFold version you plan to run
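
Before the first job, it can help to confirm that every path configured in template/alphafold_env.sh exists on the cluster. The sketch below is one way to do that, assuming the file contains simple `NAME=value` or `export NAME=value` lines. The variable names in the example are hypothetical, and values that reference other shell variables (such as `$HOME`) are not expanded.

```python
import os
import re

# Matches "NAME=value" with an optional leading "export".
ASSIGNMENT = re.compile(r"^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_env_assignments(text):
    """Collect simple shell variable assignments into a dict."""
    values = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = ASSIGNMENT.match(stripped)
        if match:
            name, raw = match.groups()
            # Drop surrounding quotes; no shell expansion is attempted.
            values[name] = raw.strip().strip('"').strip("'")
    return values


def missing_paths(values):
    """Return the subset of assignments whose path does not exist."""
    return {name: path for name, path in values.items()
            if not os.path.exists(path)}


# Hypothetical variable names, for illustration only.
sample = 'export AF2_DB_DIR="/no/such/database/dir"\nWORK_DIR=' + os.getcwd() + "\n"
print(missing_paths(parse_env_assignments(sample)))
```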
- Access the Open OnDemand dashboard
- Navigate to "Interactive Apps"
- Select "Protein Structure Prediction"
- Choose prediction engine (AlphaFold 2 or 3)
- Fill out the form:
  - For AlphaFold 2: Enter protein sequence in FASTA format
  - For AlphaFold 3: Provide input in JSON format
  - Select GPU allocation
  - Choose working directory
  - Accept terms of service (required for AlphaFold 3)
- Submit the job
The app accepts protein sequences in FASTA format.
Example:
>sequence_name
MVKVGVNGFGRIGRLVTRAAFNSGKVDIVAINDPFIDLNYMVYMFQYDSTHGKFHGTVKA
ENGKLVINGNPITIFQERDPSKIKWGDAGAEYVVESTGVFTTMEKAGAHLQGGAKRVIIS
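
A quick local check of FASTA input before submission can catch formatting problems early. The following is a minimal sketch, not the app's own validation: the helper names `parse_fasta` and `invalid_residues` are hypothetical, and the residue alphabet is restricted to the 20 standard amino acids as a simplifying assumption.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before a '>' header line")
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def invalid_residues(sequence):
    """Return the sorted characters that are not standard residues."""
    return sorted(set(sequence) - VALID_RESIDUES)


records = parse_fasta(">sequence_name\nMVKVGVNGFG\nRIGRLVTRAA\n")
for record_name, sequence in records:
    print(record_name, len(sequence), invalid_residues(sequence))
```

Sequence lines under one header are joined into a single chain, and each additional `>` header starts a new record.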
For AlphaFold 3, the app accepts input in JSON format.
Example:
{
  "name": "example_complex",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MVKVGVNG..."
      }
    }
  ],
  "modelSeeds": [1, 2, 3],
  "dialect": "alphafold3",
  "version": 1
}
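
An AlphaFold 3 input file can also be generated programmatically. The sketch below builds a protein-only input and applies a few basic checks. It reflects the official AlphaFold 3 input format as I understand it (uppercase-letter chain IDs, at least one model seed, and top-level `dialect` and `version` fields); treat those details as assumptions to verify against the AlphaFold 3 version you run. The function name `build_af3_input` is hypothetical and the sequence is a short placeholder fragment.

```python
import json
import re


def build_af3_input(name, chains, model_seeds=(1,)):
    """Build an AlphaFold 3 style input dict from {chain_id: sequence}."""
    if not chains:
        raise ValueError("at least one chain is required")
    if not model_seeds:
        raise ValueError("at least one model seed is required")
    sequences = []
    for chain_id, sequence in chains.items():
        # Assumption: chain IDs consist of uppercase letters only.
        if not re.fullmatch(r"[A-Z]+", chain_id):
            raise ValueError(f"chain ID {chain_id!r} should be uppercase letters")
        sequences.append({"protein": {"id": chain_id, "sequence": sequence}})
    return {
        "name": name,
        "sequences": sequences,
        "modelSeeds": list(model_seeds),
        "dialect": "alphafold3",
        "version": 1,
    }


# Placeholder sequence fragment, for illustration only.
payload = build_af3_input("example_complex", {"A": "MVKVGVNG"}, model_seeds=[1, 2, 3])
print(json.dumps(payload, indent=2))
```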
The app generates the following output structure:
working_directory/
└── run_YYYYMMDD_HHMMSS/
    ├── input/
    │   ├── [structure files]    # Predicted structures
    │   ├── [prediction data]    # Detailed predictions
    │   └── msas/                # Multiple sequence alignments
    ├── logs/                    # Job logs
    ├── CPU-SLURM/               # CPU phase files
    └── GPU-SLURM/               # GPU phase files
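
To make the naming scheme concrete, the sketch below creates a timestamped run directory with the subdirectories shown in the tree above. It only illustrates the layout and is not the app's own code; the function name `create_run_directory` is hypothetical.

```python
from datetime import datetime
from pathlib import Path


def create_run_directory(working_directory, now=None):
    """Create run_YYYYMMDD_HHMMSS/ with its subdirectories and return its path."""
    now = now or datetime.now()
    run_dir = Path(working_directory) / f"run_{now:%Y%m%d_%H%M%S}"
    # Subdirectory names mirror the tree in this README.
    for sub in ("input/msas", "logs", "CPU-SLURM", "GPU-SLURM"):
        (run_dir / sub).mkdir(parents=True, exist_ok=True)
    return run_dir
```

For example, `create_run_directory("/path/to/work", now=datetime(2025, 3, 19, 16, 0, 0))` would create `run_20250319_160000/` under the given (placeholder) working directory. A timestamp in the name keeps repeated runs in the same working directory from overwriting each other.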
The app provides real-time monitoring of:
- MSA generation progress
- Template search status
- Structure prediction progress
- Model relaxation status
Common issues and solutions:
- Job fails in CPU phase:
  - Check available disk space
  - Verify database paths
  - Examine CPU phase logs
- GPU phase errors:
  - Verify GPU allocation
  - Check memory requirements
  - Review GPU phase logs
  - For AlphaFold 3: Ensure the allocated GPU meets the required compute capability
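
Two of the checks above, free disk space and GPU compute capability, can be scripted. The sketch below is illustrative only and all function names are hypothetical. It assumes the compute capability text comes from `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`, a query that recent NVIDIA drivers support. The minimum of 8.0 in the example is my reading of the AlphaFold 3 hardware guidance and should be checked against the official documentation.

```python
import shutil


def free_space_gib(path):
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


def parse_compute_capabilities(nvidia_smi_output):
    """Turn lines such as '8.0' into (major, minor) integer tuples."""
    capabilities = []
    for line in nvidia_smi_output.splitlines():
        line = line.strip()
        if line:
            major, minor = line.split(".")
            capabilities.append((int(major), int(minor)))
    return capabilities


def meets_minimum(capabilities, minimum):
    """True if at least one GPU is listed and every GPU is at or above `minimum`."""
    return bool(capabilities) and all(cap >= minimum for cap in capabilities)


print(f"free space here: {free_space_gib('.'):.1f} GiB")
# Example text as nvidia-smi might print it for two GPUs; (8, 0) is an assumed minimum.
caps = parse_compute_capabilities("8.0\n7.5\n")
print(caps, meets_minimum(caps, (8, 0)))
```

Comparing tuples of integers instead of floats keeps the ordering correct if a minor version ever has two digits.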
This project is licensed under the MIT License.
- AlphaFold by DeepMind Technologies Limited
- Singularity container by prehensilecode
- This research project is generously funded by the Cornell University BRC Epigenomics Core Facility (RRID:SCR_021287), the Penn State Institute for Computational and Data Sciences (RRID:SCR_025154), and the Penn State University Center for Applications of Artificial Intelligence and Machine Learning to Industry Core Facility (AIMI) (RRID:SCR_022867), and is supported by a gift to AIMI research from Dell Technologies.
- Computational support was provided by NSF ACCESS to William KM Lai and Gretta Kellogg through BIO230041.
For questions or issues, please contact:
- Technical support: vinaysmathew@psu.edu
- ICDS support: icds@psu.edu