🧬 BAPred

Protein-Ligand Binding Affinity Prediction using Graph Neural Networks

A high-performance protein-ligand binding affinity prediction model that placed 2nd in the CASP16 ligand affinity challenge.

🌟 Features

  • 🏆 CASP16: 2nd place in the CASP16 ligand affinity prediction challenge
  • 🎯 High Accuracy: Graph Neural Network-based architecture for binding affinity prediction
  • 🔬 Research Ready: Pre-trained models ready for immediate use
  • 🛠️ Easy Integration: Simple Python API and command-line interface
  • 📈 Scalable: Batch processing for high-throughput screening

🚀 Quick Start

Installation

Choose your preferred installation method:

📦 Option 1: Install from PyPI (Simplest)
pip install bapred
🐍 Option 2: Using Conda (Recommended for Development)
git clone https://github.com/eightmm/BAPred.git
cd BAPred
conda env create -f env.yaml
conda activate BAPred
pip install -e .
🔧 Option 3: From Source
git clone https://github.com/eightmm/BAPred.git
cd BAPred
pip install -r requirements.txt
pip install -e .

🏃‍♂️ Run Your First Prediction

python run_inference.py -r example/1KLT.pdb -l example/ligands.sdf -o results.csv

That's it! 🎉 Your binding affinity predictions will be saved in results.csv.

📋 Usage Examples

Basic Usage

# Predict binding affinities
python run_inference.py -r example/1KLT.pdb -l example/ligands.sdf -o results.csv

Advanced Options

# Use CPU instead of GPU
python run_inference.py -r protein.pdb -l ligands.sdf -o results.csv --device cpu

# Custom batch size for memory optimization
python run_inference.py -r protein.pdb -l ligands.sdf -o results.csv --batch_size 64

# Limit CPU workers for data loading
python run_inference.py -r protein.pdb -l ligands.sdf -o results.csv --ncpu 8

# Specify custom model path
python run_inference.py -r protein.pdb -l ligands.sdf -o results.csv --model_path /path/to/model
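
When screening several receptors or ligand libraries, it can be convenient to assemble these command lines programmatically. The following is a minimal sketch that only uses the flags shown above; the helper name `build_command` is hypothetical, and the code builds the argument list without running anything.

```python
import sys


def build_command(receptor, ligands, output, device=None, batch_size=None,
                  ncpu=None, model_path=None):
    """Assemble an argument list for run_inference.py (hypothetical helper)."""
    cmd = [sys.executable, "run_inference.py",
           "-r", str(receptor), "-l", str(ligands), "-o", str(output)]
    # Optional flags are appended only when a value was supplied.
    optional = {"--device": device, "--batch_size": batch_size,
                "--ncpu": ncpu, "--model_path": model_path}
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_command("protein.pdb", "ligands.sdf", "results.csv",
                    device="cpu", batch_size=64)
print(" ".join(cmd[1:]))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root to execute the prediction.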

Python API

from bapred.inference import inference

# Run prediction programmatically
inference(
    protein_pdb="example/1KLT.pdb",
    ligand_file="example/ligands.sdf",
    output="results.csv",
    batch_size=128,
    ncpu=4,
    model_path="bapred/weight",
    device="cuda"
)
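
The call above hard-codes `device="cuda"`, which may not be available on every machine. Below is a small sketch of a fallback, assuming PyTorch is the installed backend; `pick_device` is a hypothetical helper and not part of the BAPred API.

```python
def pick_device(prefer="cuda"):
    """Return "cuda" only if requested and usable, otherwise "cpu"."""
    if prefer != "cuda":
        return "cpu"
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

The result can then be passed as `device=pick_device()` in the `inference(...)` call.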

📁 Project Structure

BAPred/
├── 📦 bapred/                 # Main package
│   ├── 🧪 data/               # Data processing modules
│   │   ├── atom_feature.py    # Atomic feature extraction
│   │   ├── data.py           # Dataset handling
│   │   └── utils.py          # Utility functions
│   ├── 🧠 model/              # Neural network models
│   │   ├── GatedGCNLSPE.py   # Gated Graph Convolution
│   │   ├── GraphGPS.py       # Graph GPS architecture
│   │   ├── MHA.py            # Multi-Head Attention
│   │   └── model.py          # Main model wrapper
│   ├── ⚖️ weight/             # Pre-trained weights
│   │   └── BAPred.pth        # Model checkpoint
│   └── 🔮 inference.py       # Inference engine
├── 📁 example/               # Example files
│   ├── 1KLT.pdb             # Sample protein structure
│   └── ligands.sdf          # Sample ligand library
├── 🚀 run_inference.py      # Easy-to-use script
├── 📋 requirements.txt      # Python dependencies
├── 🐍 env.yaml             # Conda environment
└── 📖 README.md            # You are here!

🎯 Model Architecture

BAPred combines several graph neural network components:

  • 🔗 Graph Convolution: Gated GCN with Laplacian Positional Encoding
  • 🌐 Graph GPS: Global attention mechanism for long-range interactions
  • 🎭 Multi-Head Attention: Enhanced feature representation
  • 🔄 Complex Interactions: Protein-ligand interaction modeling

📊 Input/Output Formats

Input

  • Protein: PDB format (.pdb)
  • Ligands: SDF (.sdf), MOL2 (.mol2), or text file with paths (.txt)
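
For the text-file input, the list can be generated from a directory of ligand files. This sketch assumes the expected layout is one path per line, which is not spelled out here; `write_ligand_list` and the file names are hypothetical.

```python
from pathlib import Path
import tempfile


def write_ligand_list(ligand_dir, list_path, suffixes=(".sdf", ".mol2")):
    """Write the sorted paths of ligand files in ligand_dir, one per line."""
    paths = sorted(p for p in Path(ligand_dir).iterdir()
                   if p.is_file() and p.suffix.lower() in suffixes)
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("lig_b.sdf", "lig_a.mol2", "notes.md"):
        (Path(tmp) / name).touch()
    listed = write_ligand_list(tmp, Path(tmp) / "ligands.txt")
    listed_names = [p.name for p in listed]
    n_lines = len((Path(tmp) / "ligands.txt").read_text().splitlines())

print(listed_names, n_lines)
```

The resulting `ligands.txt` would then be passed with `-l ligands.txt` in place of a single SDF or MOL2 file.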

Output

  • CSV/TSV file with columns:
    • Name: Ligand identifier
    • pKd: Predicted binding affinity (pKd scale)
    • Kcal/mol: Binding energy in kcal/mol
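
The output table can be post-processed with the Python standard library. The sketch below ranks ligands by predicted pKd and shows the usual thermodynamic relation between the two numeric columns, ΔG = -RT ln(10) · pKd. The column names come from the list above; the temperature (298.15 K), the comma delimiter, and the sample rows are assumptions for illustration, and BAPred may use a different constant internally.

```python
import csv
import io
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def pkd_to_kcal(pkd, temperature=298.15):
    """Convert pKd to a binding free energy in kcal/mol: dG = -RT ln(10) * pKd."""
    return -R_KCAL * temperature * math.log(10) * pkd


def rank_by_pkd(handle, delimiter=","):
    """Read rows with 'Name' and 'pKd' columns and sort by pKd, strongest first."""
    rows = list(csv.DictReader(handle, delimiter=delimiter))
    return sorted(rows, key=lambda row: float(row["pKd"]), reverse=True)


# Made-up sample data standing in for results.csv.
sample = io.StringIO("Name,pKd\nlig1,5.2\nlig2,7.8\nlig3,6.0\n")
ranked = rank_by_pkd(sample)
for row in ranked:
    print(row["Name"], row["pKd"], round(pkd_to_kcal(float(row["pKd"])), 2))
```

For a real run, replace the sample with `open("results.csv", newline="")`, and pass `delimiter="\t"` for TSV output.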

🛠️ System Requirements

  • Python: 3.11 or higher
  • Memory: 4GB RAM minimum (8GB+ recommended)
  • GPU: CUDA-compatible GPU (optional, but recommended for speed)
  • Storage: 2GB free space

📊 Performance

| Dataset | Ligands | Processing Time | Performance |
|---------|---------|-----------------|-------------|
| CASP16 | Challenge dataset | Competition | 🥈 2nd Place |
| Example | 500 | ~3 minutes | High precision |
| Custom | Variable | Scales linearly | Research-grade |

🏆 CASP16 Achievement

BAPred placed 2nd in the ligand affinity prediction challenge of CASP16 (Critical Assessment of protein Structure Prediction), demonstrating strong performance on real-world protein-ligand binding affinity prediction tasks.

🤝 Contributing

We welcome contributions! Please feel free to:

  • 🐛 Report bugs
  • 💡 Suggest features
  • 📖 Improve documentation
  • 🔧 Submit pull requests

📄 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

📚 Citation

If you use BAPred in your research, please cite:

@software{bapred2024,
  title={BAPred: Protein-Ligand Binding Affinity Prediction using Graph Neural Networks},
  author={Jaemin Sim},
  year={2024},
  url={https://github.com/eightmm/BAPred}
}

🙋‍♀️ Support

Made with ❤️ for the scientific community

⭐ Star us on GitHub if this project helped you!
