
tsudalab/HDXRank


HDXRank

HDXRank is a deep learning pipeline that applies HDX-MS (Hydrogen-Deuterium Exchange Mass Spectrometry) restraints to rank protein-protein complex predictions.

Overview

HDXRank addresses the challenge of selecting accurate models from a pool of predicted protein complex structures by integrating experimental HDX-MS data with graph-based deep learning. It uses HDX restraints to evaluate how well each predicted complex agrees with the binding interface observed experimentally, and ranks the models accordingly so that the most consistent structures are selected.
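To make "ranking by agreement with HDX data" concrete, here is a deliberately generic sketch. It is NOT HDXRank's scoring function (this README does not spell that out): it assumes each candidate model already has a predicted RFU value per peptide, computes the root-mean-square deviation from the experimental RFU over shared peptides, and sorts models from lowest to highest deviation. The function names and the `(start, end) -> RFU` dictionary layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout: (peptide start, peptide end) -> RFU value.
PeptideRFU = Dict[Tuple[int, int], float]


def rfu_rmsd(pred: PeptideRFU, exp: PeptideRFU) -> float:
    """Root-mean-square deviation over peptides present in both dicts."""
    shared = pred.keys() & exp.keys()
    if not shared:
        raise ValueError("no overlapping peptides")
    return math.sqrt(sum((pred[k] - exp[k]) ** 2 for k in shared) / len(shared))


def rank_by_rfu_agreement(
    models: Dict[str, PeptideRFU], exp: PeptideRFU
) -> List[Tuple[str, float]]:
    """Return (model_name, rmsd) pairs, best agreement (lowest RMSD) first."""
    scored = [(name, rfu_rmsd(pred, exp)) for name, pred in models.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    experimental = {(1, 10): 0.40, (11, 20): 0.15}
    candidates = {
        "model_A": {(1, 10): 0.42, (11, 20): 0.18},
        "model_B": {(1, 10): 0.70, (11, 20): 0.50},
    }
    for name, score in rank_by_rfu_agreement(candidates, experimental):
        print(f"{name}\t{score:.3f}")
```

A plain RMSD treats every peptide equally; a learned scorer such as HDXRank can weight information very differently, so treat this only as an intuition aid.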

Key Features

  • HDX-MS data integration for experimental restraints
  • Support for multiple input sources (docking predictions, AlphaFold models)
  • Flexible and extensible framework for incorporating new experimental data

Installation

HDXRank requires Python with CUDA 11.8 support. We provide both Docker and Conda installation options.

Prerequisites

  • Docker (recommended) or Conda
  • CUDA 11.8 compatible GPU (for model training/prediction)

Quick Start with Docker (Recommended)

  1. Clone the repository:
git clone https://github.com/SuperChrisW/HDXRank.git
cd HDXRank
  2. Pull the image and start a container:
docker pull superchrisw/hdxrank:latest
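docker run -it --rm --gpus all -v "$(pwd)":/job/code superchrisw/hdxrank:latest /bin/bash
cd /job/code
python main.py --help
The --gpus all flag exposes the host GPU to the container and requires the NVIDIA Container Toolkit on the host.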

Alternative: Conda Installation

chmod +x ./install.sh
./install.sh
conda activate HDXRank
python main.py --help

Required Input Files

HDXRank requires four main types of input files:

  1. Protein Structure Files (.pdb) - Complex structure predictions to be ranked, plus the apo structures
  2. Sequence Profiles (.hhm) - HMM profiles built from multiple sequence alignments with HHblits against UniRef30
  3. HDX-MS Data (.xlsx) - Experimental HDX data in the column format described under HDX-MS Data Format
  4. Configuration File (.yaml) - Pipeline settings and parameters
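Before a run, it can help to confirm that all four input types are actually present. The sketch below only groups files by extension according to the list above; it does not know HDXRank's expected directory layout, and the helper names `group_inputs` and `missing_categories` are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Extension -> input category, following the four input types listed above.
INPUT_TYPES = {
    ".pdb": "structures",
    ".hhm": "msa_profiles",
    ".xlsx": "hdx_data",
    ".yaml": "config",
}


def group_inputs(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group file paths by input category; unknown extensions are ignored."""
    groups: Dict[str, List[str]] = {cat: [] for cat in INPUT_TYPES.values()}
    for p in paths:
        cat = INPUT_TYPES.get(Path(p).suffix.lower())
        if cat is not None:
            groups[cat].append(p)
    return groups


def missing_categories(paths: Iterable[str]) -> List[str]:
    """Return the input categories for which no file was found."""
    return [cat for cat, files in group_inputs(paths).items() if not files]
```

For example, `missing_categories(str(p) for p in Path("my_job").rglob("*"))` would scan a hypothetical `my_job/` folder and list any category that is still empty.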

Preparing MSA Files

HDXRank requires .hhm profiles (HMMs built from multiple sequence alignments) generated with HHblits:

Install HHblits

conda create -n hhblits -y
conda activate hhblits
conda install hhsuite -c conda-forge -c bioconda -y

Download UniRef30 Database

mkdir -p databases
cd databases
wget http://wwwuser.gwdg.de/~compbiol/uniclust/2020_06/UniRef30_2020_06_hhsuite.tar.gz
tar -xzvf UniRef30_2020_06_hhsuite.tar.gz
rm UniRef30_2020_06_hhsuite.tar.gz
cd ..

Generate .hhm Files

bash ./scripts/hhblits.sh

This processes all .fasta files in /HDXRank/fasta_files/ and saves the resulting .hhm files to /HDXRank/hhm_files/.
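Since HHblits searches can fail silently for individual sequences, a quick check that every FASTA input produced a profile is useful. This sketch assumes each output is named after its input, i.e. `protA.fasta` yields `protA.hhm`; verify that against `scripts/hhblits.sh` before relying on it. The function name is hypothetical.

```python
from pathlib import Path
from typing import List


def fasta_without_hhm(fasta_dir: str, hhm_dir: str) -> List[str]:
    """List FASTA stems with no matching <stem>.hhm file in hhm_dir.

    Assumes the .hhm file shares the FASTA file's stem.
    """
    hhm_stems = {p.stem for p in Path(hhm_dir).glob("*.hhm")}
    return sorted(
        p.stem for p in Path(fasta_dir).glob("*.fasta") if p.stem not in hhm_stems
    )
```

Run from the repository root, `fasta_without_hhm("fasta_files", "hhm_files")` returns an empty list when every sequence has a profile.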

HDX-MS Data Format

Your Excel file should contain the following columns:

  • protein - Protein identifier
  • state - Experimental state (apo/complex)
  • start - Peptide start position
  • end - Peptide end position
  • sequence - Peptide sequence
  • log_t - Log exchange time
  • RFU - Relative fractional uptake
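A minimal validation sketch for the table above: it checks that all seven columns are present and applies two basic per-row sanity checks. The length check assumes start and end are inclusive residue positions, which is common in HDX-MS tables but is an assumption here; drop it if your numbering differs. Function names are hypothetical.

```python
from typing import Dict, Iterable, List

REQUIRED_COLUMNS = ["protein", "state", "start", "end", "sequence", "log_t", "RFU"]


def missing_columns(columns: Iterable[str]) -> List[str]:
    """Return required HDX-MS columns absent from the given header."""
    present = set(columns)
    return [c for c in REQUIRED_COLUMNS if c not in present]


def row_problems(row: Dict[str, object]) -> List[str]:
    """Basic per-row checks, assuming inclusive start/end positions."""
    problems = []
    start, end, seq = int(row["start"]), int(row["end"]), str(row["sequence"])
    if start > end:
        problems.append(f"start {start} > end {end}")
    elif len(seq) != end - start + 1:
        problems.append(
            f"sequence length {len(seq)} != end - start + 1 = {end - start + 1}"
        )
    return problems
```

With pandas (and openpyxl for .xlsx), `df = pandas.read_excel(path)` followed by `missing_columns(df.columns)` and a loop over `df.to_dict("records")` applies these checks to a real file.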

Usage

Configuration Setup

HDXRank uses YAML configuration files to define all pipeline parameters. See configs/config.template.yaml for a complete template.

Key Configuration Sections:

  • GeneralParameters - File paths and execution mode
  • TaskParameters - Control protein embedding and graph construction
  • PredictionParameters - Model prediction settings
  • ScorerParameters - Scoring and ranking settings
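A quick structural check of a config can catch a missing section before a long run. This sketch assumes the four sections above are top-level YAML keys and that all four are expected; a training-only config may legitimately omit some, so adjust `REQUIRED_SECTIONS` as needed. The full key set lives in configs/config.template.yaml, and the function name is hypothetical.

```python
from typing import Any, Dict, List

# Assumed top-level sections, taken from the list above.
REQUIRED_SECTIONS = [
    "GeneralParameters",
    "TaskParameters",
    "PredictionParameters",
    "ScorerParameters",
]


def missing_sections(config: Dict[str, Any]) -> List[str]:
    """Return required top-level sections absent from a parsed config dict."""
    return [s for s in REQUIRED_SECTIONS if s not in config]
```

The dict would typically come from PyYAML, e.g. `yaml.safe_load(open("path/to/config.yaml"))`.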

Running HDXRank

Basic Usage

python main.py --config path/to/config.yaml
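To rank several targets in one go, the same command can be run once per config file. The sketch below builds one `python main.py --config <file>` call per .yaml file in a directory and runs them sequentially, stopping at the first failure. It assumes it is launched from the repository root; `build_commands`, `run_all`, and any directory you pass in are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_commands(config_dir: str) -> List[List[str]]:
    """One `python main.py --config <file>` command per YAML file, sorted by name."""
    return [
        [sys.executable, "main.py", "--config", str(cfg)]
        for cfg in sorted(Path(config_dir).glob("*.yaml"))
    ]


def run_all(config_dir: str) -> None:
    """Run HDXRank sequentially for each config; raise on the first failure."""
    for cmd in build_commands(config_dir):
        subprocess.run(cmd, check=True)
```

Keep per-target configs in their own folder; pointing this at configs/ would also pick up config.template.yaml.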

Output Files

Results are saved to the specified output directory:

  • HDX_scores.csv - Ranked structures with HDXRank scores
  • predictions/ - Raw RFU predictions for each structure
  • results/scores/ - Detailed scoring analysis and plots
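To pull the top candidates out of HDX_scores.csv programmatically, a small reader like the one below is enough. The column names and whether a higher or lower score is better are not documented in this README, so both are left to the caller; inspect the header of your own file first. The function name is hypothetical.

```python
import csv
from typing import List, Tuple


def top_models(
    csv_path: str,
    name_col: str,
    score_col: str,
    higher_is_better: bool,
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k best (name, score) pairs from a scores CSV.

    Column names and score direction are caller-supplied assumptions.
    """
    with open(csv_path, newline="") as fh:
        rows = [(r[name_col], float(r[score_col])) for r in csv.DictReader(fh)]
    rows.sort(key=lambda item: item[1], reverse=higher_is_better)
    return rows[:k]
```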

Example Data

Download example datasets and configurations:

# HDX-MS dataset for training/validation
wget -O dataset.zip https://zenodo.org/records/15426072/files/dataset.zip?download=1
unzip dataset.zip

# Example structures and configurations
wget -O example.zip https://zenodo.org/records/15426072/files/example.zip?download=1
unzip example.zip

rm dataset.zip example.zip

To reproduce the rigid docking step, use the HDock program provided in prog.tar.gz.

Model Training

Preparing Training Data

  1. Add new HDX-MS files to dataset/HDX_files/
  2. Update the dataset record in dataset/250110_HDXRank_dataset.xlsx
  3. Generate embeddings and graphs:
    python main.py --config ./configs/config_retrain_HDXRank.yaml

Training the Model

python ./hdxrank/HDXRank_train.py --config ./configs/config_retrain_HDXRank.yaml

Citation

If you use HDXRank in your research, please cite:

@article{Wang2025HDXRank,
  author    = {Liyao Wang and Andrejs Tucš and Songting Ding and Koji Tsuda and Adnan Sljoka},
  title     = {HDXRank: A Deep Learning Framework for Ranking Protein Complex Predictions With Hydrogen–Deuterium Exchange Data},
  journal   = {Journal of Chemical Theory and Computation},
  year      = {2025},
  volume    = {21},
  number    = {14},
  pages     = {7173--7187},
  doi       = {10.1021/acs.jctc.5c00175}
}

Support

For questions, bug reports, or feature requests, please open an issue on GitHub.
