This repository contains scripts to build Singularity containers for popular protein structure prediction models: AlphaFold 3, Boltz, and Chai-1. The containers target ARM64 and x86_64 systems with NVIDIA GPUs and have been tested on both architectures.
Key Features:
- Cross-Architecture Support: Provides definition files for both ARM64 (e.g., NVIDIA Grace Hopper) and x86_64 architectures.
- AlphaFold 3:
  - ARM64 support introduced.
  - x86_64 support.
  - Model weights cannot be pre-downloaded due to Google's Terms of Service; users must download them separately.
  - Requires local databases for full functionality (see the "Database and Model Weights Download" section).
- Boltz:
  - ARM64 support introduced.
  - x86_64 support (pip-installable, but the container provides a pre-configured environment).
  - Container includes pre-downloaded model weights.
  - Supports MSA generation via the ColabFold API.
- Chai-1:
  - ARM64 support introduced.
  - x86_64 support.
  - Container includes pre-downloaded model weights.
  - Supports MSA generation via the ColabFold API.
Important Note on MSA Generation and Databases:
- AlphaFold 3 natively requires local sequence and structure databases for its data pipeline. Users must download these databases separately (details below under "Database and Model Weights Download").
- Boltz and Chai-1 containers in this repository are configured to support MSA (Multiple Sequence Alignment) generation via the ColabFold API, offering an alternative to local database dependencies for this step.
Prerequisites:
- Access to a system with Singularity (or Apptainer) installed.
- Access to the internet to download dependencies and clone repositories.
- Access to a SLURM-managed cluster (if using the provided `build_container.slurm` script).
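If you are unsure which container runtime your system provides, a quick check (assuming at least one of the two commands is on your PATH):

```bash
# Print the available container runtime and its version
singularity --version || apptainer --version
```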
Each model has specific requirements for databases and model weights.
AlphaFold 3 requires large sequence and structure databases to function.
Requirement (Databases): You must download these databases separately.
Recommendation (Databases): Use the official script provided by Google DeepMind:
- Clone the official AlphaFold 3 repository:

  ```bash
  git clone https://github.com/google-deepmind/alphafold3.git
  cd alphafold3
  ```

- Run the download script (this requires `wget` and `zstd`):

  ```bash
  ./fetch_databases.sh /path/to/your/database/storage
  ```

  Replace `/path/to/your/database/storage` with the desired location.

Note: The databases require significant disk space (~252 GB download, ~630 GB uncompressed). An SSD is recommended for better performance.
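Before starting the download, it can help to confirm the required tools are installed and that the target path has enough free space. A minimal sketch (substitute your actual storage path):

```bash
# Confirm the tools fetch_databases.sh relies on are available
command -v wget zstd

# Check free space at the target location (~630 GB needed once uncompressed)
df -h /path/to/your/database/storage
```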
Requirement (Model Weights): Access to the official AlphaFold 3 model parameters requires registration for non-commercial use via the AlphaFold 3 Model Parameters Request Form. Ensure you meet the terms and download the weights to an accessible directory path.
The Boltz container comes with pre-downloaded model weights (`ccd.pkl` and `boltz1_conf.ckpt`) stored in `/opt/boltz_cache` within the container. No separate download is required for the weights when using the container.

The Chai-1 container also includes pre-downloaded model weights, stored in `/opt/chai_lab_downloads` within the container. No separate download is required for these weights when using the container.
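If you want to confirm that the bundled weights are present in an image you built or pulled, you can list the cache directories directly. A quick check, assuming the ARM64 image names used elsewhere in this README:

```bash
# Boltz: weights live in /opt/boltz_cache inside the container
singularity exec boltz_arm.sif ls -lh /opt/boltz_cache

# Chai-1: weights live in /opt/chai_lab_downloads inside the container
singularity exec chai_lab_arm.sif ls -lh /opt/chai_lab_downloads
```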
This repository is structured as follows:
- `build_container.slurm`: SLURM batch script located in the root directory to build any of the Singularity containers. Requires user modification for cluster settings.
- `alphafold3/`: Directory containing files specific to AlphaFold 3.
  - `alphafold3_arm.def`: Singularity definition file for ARM64 systems.
  - `alphafold3_x86.def`: Singularity definition file for x86 systems.
  - `run_alphafold3_launcher.py`: Python script for convenient execution of the AlphaFold 3 container.
- `boltz/`: Directory containing files specific to Boltz.
  - `boltz_arm.def`: Singularity definition file for ARM64 systems.
  - `boltz_x86.def`: Singularity definition file for x86 systems.
  - `run_boltz_launcher.py`: Python script for convenient execution of the Boltz container.
- `chai_1/`: Directory containing files specific to Chai-1.
  - `chai_lab_arm.def`: Singularity definition file for ARM64 systems (note the `chai_lab` prefix).
  - `chai_lab_x86.def`: Singularity definition file for x86 systems (note the `chai_lab` prefix).
  - `run_chailab_launcher.py`: Python script for convenient execution of the Chai-1 container.
Pre-built Singularity image files (`.sif`) based on these definitions are available on Sylabs Cloud. You can pull them directly using the commands below.

Note: To verify the authenticity of the containers, first import the repository's public key:

```bash
singularity key import keys/mypublic.pem
```
AlphaFold 3:

```bash
# ARM64 architecture
singularity pull --arch arm64 library://vinaymatt/repo/alphafold3:arm64

# x86_64 architecture
singularity pull --arch amd64 library://vinaymatt/repo/alphafold3:amd64
```

Boltz:

```bash
# ARM64 architecture
singularity pull --arch arm64 library://boltzarm/repo/boltzarm:arm64

# x86_64 architecture
singularity pull --arch amd64 library://boltzx86/repo/boltzx86:latest
```

Chai-1:

```bash
# ARM64 architecture
singularity pull --arch arm64 library://chai_labarm/repo/chai_lab:arm64

# x86_64 architecture
singularity pull --arch amd64 library://chaix86/repo/chai_lab:amd64
```
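After pulling, you can check an image's signature against the imported key. The filename below assumes `singularity pull`'s default naming (`<name>_<tag>.sif`); adjust it if you saved the image under a different name:

```bash
# Verify the signature of a pulled image against the imported public key
singularity verify alphafold3_arm64.sif
```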
The `build_container.slurm` script (located in the repository root) can be used to build any of the model containers on a SLURM-managed cluster.

- Navigate to the Repository Root: Ensure your terminal is in the root directory of this repository, where `build_container.slurm` is located.

- Configure SLURM Script: Before submitting, open `build_container.slurm` and ensure the following placeholder values are correctly set for your cluster environment:
  - `#SBATCH --partition=YOUR_PARTITION`: Set this to the appropriate SLURM partition/queue.
  - `#SBATCH --account=YOUR_ACCOUNT`: Set this to your SLURM allocation/account name.
  - (Optional) Review and adjust other SBATCH directives such as job name, output/error file paths, time, memory, CPUs, and GPUs as needed. The script uses variables like `${MODEL_DIR_NAME}` and `${ARCH}` in commented SBATCH lines for the job name/output, but SLURM typically does not expand script variables in these directives directly. You might need to set them manually or use generic names.

- Submit the SLURM Job: Use `sbatch` to submit the build job. You need to specify the model's directory name, the target architecture, and the desired build directory:

  ```bash
  sbatch build_container.slurm <model_directory_name> <architecture> /path/to/your/build/directory
  ```

  Arguments:
  - `<model_directory_name>`: The name of the model's directory (e.g., `alphafold3`, `boltz`, `chai_1`). This tells the script where to find the definition file.
  - `<architecture>`: The target architecture (`arm` or `x86`).
  - `/path/to/your/build/directory`: The directory where the SIF file will be saved. This path can be absolute or relative to the repository root.

  Example: To build AlphaFold 3 for x86, outputting to `/scratch/user/af3_build`:

  ```bash
  sbatch build_container.slurm alphafold3 x86 /scratch/user/af3_build
  ```
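Once submitted, the build can be tracked like any other SLURM job. This is a generic sketch; the log file name depends on the `#SBATCH --output` setting in `build_container.slurm`, and the default `slurm-<jobid>.out` shown here is an assumption:

```bash
# Check whether the build job is pending or running
squeue -u $USER

# Follow the build log (adjust the name to match your --output directive)
tail -f slurm-<jobid>.out
```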
⚠️ Important: The definition files may have CUDA versions hardcoded (e.g., AlphaFold 3 is currently set to CUDA 12.6.0; the Boltz ARM definition uses a base Ubuntu 22.04 image, which is flexible, but its PyTorch is built with CUDA 12.8). This can cause compatibility issues if your host system uses a different CUDA version. You have two options:
- Modify the Definition File: Edit the `From:` line in the definition file to match your system's CUDA version:

  ```
  Bootstrap: docker
  From: nvidia/cuda:XX.X.X-runtime-ubuntu22.04
  ```

  Replace `XX.X.X` with your system's CUDA version.
- Use Environment Variables: Override default settings when running the container to prevent CUDA errors (see the sketch after the troubleshooting notes below).
Troubleshooting tips:
- If you encounter CUDA errors, try updating your NVIDIA drivers to the latest version.
- For memory errors, decrease the `XLA_CLIENT_MEM_FRACTION` value (e.g., 0.75 or 0.5).
- Some systems may need to use the `apptainer` command instead of `singularity`.
- If you continue experiencing CUDA version mismatches, rebuilding the container with the matching CUDA version is recommended.
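As a rough illustration of the environment-variable option and the notes above, you can check which CUDA version the host driver supports and pass overrides at launch time. The memory fraction applies to the JAX-based AlphaFold 3; the image path and `<command>` placeholders stand in for your own:

```bash
# Check the CUDA version supported by the host driver (reported in the nvidia-smi header)
nvidia-smi

# Override environment variables when launching the container
singularity exec --nv --env XLA_CLIENT_MEM_FRACTION=0.75 \
    /path/to/alphafold3_x86.sif <command_specific_to_model> <arguments>

# On clusters that ship Apptainer instead of Singularity, the same call becomes:
apptainer exec --nv --env XLA_CLIENT_MEM_FRACTION=0.75 \
    /path/to/alphafold3_x86.sif <command_specific_to_model> <arguments>
```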
Refer to the `%help` section within each definition file for architecture-specific instructions on running the built containers.
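For example, to print the embedded help text of a built image (substituting your own image path):

```bash
# Display the %help section baked into the image
singularity run-help /path/to/your/<model_name>_<arch>.sif
```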
General Execution (Example):

```bash
singularity exec --nv /path/to/your/<model_name>_<arch>.sif <command_specific_to_model> <arguments>
```
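A quick way to confirm that `--nv` is exposing the GPU inside the container before launching a full prediction:

```bash
# The GPU(s) visible to the host should also be listed from inside the container
singularity exec --nv /path/to/your/<model_name>_<arch>.sif nvidia-smi
```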
The `alphafold3/run_alphafold3_launcher.py` script provides a convenient way to run predictions using the AlphaFold 3 Singularity container.
- Prerequisites:
  - Python 3.x
  - The `spython` and `absl-py` Python libraries:

    ```bash
    pip install spython absl-py
    ```

  - A built Singularity container for your architecture (e.g., `alphafold3_arm.sif` or `alphafold3_x86.sif`), typically located in your specified build directory.
  - Downloaded AlphaFold 3 model parameters.
  - Downloaded databases (see the Database Download section).
- Configuration:
  - Update the `_ALPHAFOLD3_SIF_PATH` variable inside `alphafold3/run_alphafold3_launcher.py` to point to your architecture-specific SIF file, OR set the `ALPHAFOLD3_SIF` environment variable.
- Execution:

  ```bash
  python alphafold3/run_alphafold3_launcher.py \
    --json_path=/path/to/input.json \
    --model_dir=/path/to/model_params \
    --db_dir=/path/to/databases \
    --output_dir=/path/to/output \
    [--other-flags...]
  ```

  Replace the example paths with your actual paths.

  Key Flags:
  - `--json_path`: Path to a single input JSON file.
  - `--input_dir`: Path to a directory of input JSON files (alternative to `--json_path`).
  - `--model_dir`: Path to the downloaded AlphaFold 3 model parameters.
  - `--db_dir`: Path(s) to the downloaded databases (can be specified multiple times).
  - `--output_dir`: Directory where results will be saved.
  - `--use_gpu`: Set to `false` to run without a GPU (data pipeline only).
  - `--run_data_pipeline=false`: Skip the data pipeline step.
  - `--run_inference=false`: Skip the inference step.
  - Run `python alphafold3/run_alphafold3_launcher.py --help` to see all available options.
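For example, to process a whole directory of input JSON files while supplying the image through the `ALPHAFOLD3_SIF` environment variable instead of editing the script (all paths are placeholders):

```bash
# Point the launcher at the SIF without modifying _ALPHAFOLD3_SIF_PATH
export ALPHAFOLD3_SIF=/path/to/alphafold3_x86.sif

# Batch-process every JSON input in a directory
python alphafold3/run_alphafold3_launcher.py \
  --input_dir=/path/to/json_inputs \
  --model_dir=/path/to/model_params \
  --db_dir=/path/to/databases \
  --output_dir=/path/to/output
```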
The `boltz/run_boltz_launcher.py` script provides a convenient way to run predictions using the Boltz Singularity container.
- Prerequisites:
  - Python 3.x
  - The `spython` and `absl-py` Python libraries:

    ```bash
    pip install spython absl-py
    ```

  - A built Singularity container for your architecture (e.g., `boltz_arm.sif` or `boltz_x86.sif`), typically located in your specified build directory.
- Configuration:
  - Update the `_BOLTZ_SIF_PATH` variable inside `boltz/run_boltz_launcher.py` to point to your architecture-specific SIF file, OR set the `BOLTZ_SIF` environment variable.
- Execution:

  ```bash
  python boltz/run_boltz_launcher.py \
    --input_data=/path/to/your/input.fasta \
    --out_dir=/path/to/output \
    [--other-flags...]
  ```

  Replace the example paths with your actual paths.

  Key Flags:
  - `--sif_path`: Path to the Boltz Singularity image SIF file (overrides the environment variable and hardcoded path).
  - `--input_data`: (Required) Input data file or directory (FASTA/YAML).
  - `--out_dir`: Output directory for predictions.
  - `--boltz_cache_dir`: Directory for Boltz to download data/models (defaults to `$BOLTZ_CACHE` or `~/.boltz`).
  - `--checkpoint`: Optional path to a model checkpoint file.
  - `--use_gpu`: Enable the NVIDIA runtime for GPU usage (default: True).
  - `--gpu_devices`: Comma-separated list of GPU devices for `NVIDIA_VISIBLE_DEVICES`.
  - Refer to the script's help for more specific Boltz arguments like `--recycling_steps`, `--sampling_steps`, `--use_msa_server`, etc.
  - Run `python boltz/run_boltz_launcher.py --help` to see all available options.
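For example, a run that lets Boltz generate MSAs via the ColabFold API rather than local databases, with the image supplied through the `BOLTZ_SIF` environment variable (paths and the GPU index are placeholders):

```bash
# Point the launcher at the SIF without modifying _BOLTZ_SIF_PATH
export BOLTZ_SIF=/path/to/boltz_arm.sif

# Request MSA generation through the ColabFold server and pin the job to GPU 0
python boltz/run_boltz_launcher.py \
  --input_data=/path/to/your/input.fasta \
  --out_dir=/path/to/output \
  --use_msa_server=true \
  --gpu_devices=0
```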
The `chai_1/run_chailab_launcher.py` script offers a user-friendly way to execute predictions with the Chai-1 Singularity container.
- Prerequisites:
  - Python 3.x
  - The `spython` and `absl-py` Python libraries:

    ```bash
    pip install spython absl-py
    ```

  - A built Singularity container for your architecture (e.g., `chai_lab_arm.sif` or `chai_lab_x86.sif`), typically located in your specified build directory.
- Configuration:
  - Update the `_CHAI_SIF_PATH` variable inside `chai_1/run_chailab_launcher.py` to point to your SIF file, OR set the `CHAI_SIF` environment variable.
- Execution:

  ```bash
  python chai_1/run_chailab_launcher.py \
    --fasta_file=/path/to/your/sequence.fasta \
    --output_dir=/path/to/chai_output \
    [--other-flags...]
  ```

  Replace the example paths with your actual paths.

  Key Flags:
  - `--sif_path`: Path to the Chai Lab Singularity image SIF file (overrides the environment variable and hardcoded path).
  - `--fasta_file`: (Required) Path to the input FASTA file.
  - `--output_dir`: Path to a directory for storing results.
  - `--force_output_dir`: If True, use the exact output directory even if it is non-empty.
  - `--use_gpu`: Enable the NVIDIA runtime for GPU usage (default: True).
  - `--gpu_devices`: Comma-separated list of GPU devices for `NVIDIA_VISIBLE_DEVICES`.
  - Refer to the script's help for more specific Chai Lab arguments like `--msa_directory`, `--constraint_path`, `--num_diffn_samples`, etc.
  - Run `python chai_1/run_chailab_launcher.py --help` to see all available options.
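For example, a run that pins the job to one GPU and requests a specific number of diffusion samples, with the image supplied through the `CHAI_SIF` environment variable (paths and values are placeholders; check `--help` for the defaults):

```bash
# Point the launcher at the SIF without modifying _CHAI_SIF_PATH
export CHAI_SIF=/path/to/chai_lab_arm.sif

# Run Chai-1 on a single FASTA input, using GPU 0 and 5 diffusion samples
python chai_1/run_chailab_launcher.py \
  --fasta_file=/path/to/your/sequence.fasta \
  --output_dir=/path/to/chai_output \
  --num_diffn_samples=5 \
  --gpu_devices=0
```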
- AlphaFold by DeepMind Technologies Limited
- Boltz-1 by Wohlwend, Jeremy, et al. "Boltz-1: Democratizing Biomolecular Interaction Modeling." bioRxiv (2024): 2024-11.
- Chai-1 by Chai Discovery, Inc.
- The research project is generously funded by the Cornell University BRC Epigenomics Core Facility (RRID:SCR_021287), the Penn State Institute for Computational and Data Sciences (RRID:SCR_025154), and the Penn State University Center for Applications of Artificial Intelligence and Machine Learning to Industry Core Facility (AIMI) (RRID:SCR_022867), and supported by a gift to AIMI research from Dell Technologies.
- Computational support was provided by NSF ACCESS to William KM Lai and Gretta Kellogg through BIO230041