SeFMol: Steering Semi-flexible Molecular Diffusion Model for Structure-Based Drug Design with Reinforcement Learning

Official repository for the paper "Steering Semi-flexible Molecular Diffusion Model for Structure-Based Drug Design with Reinforcement Learning".

Key Features

  • 🧠 Two-Stage Rigid Training: Combines property-biased pretraining on the Molecule3D dataset with target-aware fine-tuning on protein-ligand pairs

  • 🤖 RL-Optimized Semi-Flexibility: Models denoising as a Markov decision process, with a KL-constrained policy network for semi-flexible conformation exploration

  • 20x Faster Sampling: Training-free fast sampling strategy that cuts the number of sampling steps to 1/20 of a conventional diffusion model (for example, 50 steps instead of 1000)

  • 📊 Sparse Reward Solution: Addresses sparse affinity signals through property-conditioned reinforcement learning

  • 💻 User-friendly Platform: Integrated visualization interface
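
As a rough illustration of the KL-constrained RL idea listed above, the sketch below scores a short denoising trajectory with a reward minus a KL penalty toward a frozen reference policy. The diagonal-Gaussian form of the policy, the penalty weight `beta`, and both function names are assumptions made for this example; they are not taken from the SeFMol code.

```python
import math

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) between two diagonal Gaussians given as per-dimension lists."""
    kl = 0.0
    for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q):
        kl += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
    return kl

def kl_penalized_return(rewards, kls, beta=0.1):
    """Sum over denoising steps of (reward - beta * KL to the reference policy)."""
    return sum(r - beta * k for r, k in zip(rewards, kls))

# Hypothetical 3-step trajectory with a sparse reward that arrives only at the end.
step_kl = gaussian_kl([1.0], [1.0], [0.0], [1.0])  # policy mean shifted by 1 from the reference
print(kl_penalized_return([0.0, 0.0, 1.0], [step_kl] * 3, beta=0.1))
```

A larger `beta` keeps the fine-tuned policy closer to the reference model; a smaller one lets the reward dominate.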

Performance Comparison

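SeFMol achieves the best Vina Score, Vina Min, Vina Dock, High Affinity, and QED among the compared methods, while trailing several baselines on SA and Diversity: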

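| Method | Vina Score (↓) Avg. | Vina Score (↓) Med. | Vina Min (↓) Avg. | Vina Min (↓) Med. | Vina Dock (↓) Avg. | Vina Dock (↓) Med. | High Affinity (↑) Avg. | High Affinity (↑) Med. | QED (↑) Avg. | QED (↑) Med. | SA (↑) Avg. | SA (↑) Med. | Lipinski (↑) Avg. | Diversity (↑) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Reference | -6.36 | -6.46 | -6.71 | -6.49 | -7.45 | -7.26 | - | - | 0.48 | 0.47 | 0.73 | 0.74 | 4.27 | - |
| AR | -5.75 | -5.64 | -6.18 | -5.88 | -6.75 | -6.62 | 37.9% | 31.0% | 0.51 | 0.50 | 0.63 | 0.63 | 4.75 | 0.690 |
| Pocket2Mol | -5.14 | -4.70 | -6.42 | -5.82 | -7.15 | -6.79 | 48.4% | 51.0% | 0.56 | 0.57 | 0.74 | 0.75 | 4.88 | 0.685 |
| ResGen | 10.50 | 2.54 | -2.94 | -4.41 | -6.59 | -6.45 | 38.0% | 25.0% | 0.58 | 0.59 | 0.78 | 0.79 | 4.90 | 0.742 |
| FLAG | 45.98 | 36.62 | 6.17 | -2.91 | -5.24 | -5.71 | 27.9% | 5.0% | 0.61 | 0.62 | 0.63 | 0.62 | 4.98 | 0.766 |
| TargetDiff | -5.47 | -6.30 | -6.64 | -6.83 | -7.80 | -7.91 | 58.1% | 59.1% | 0.48 | 0.48 | 0.58 | 0.58 | 4.51 | 0.708 |
| DecompDiff | -5.67 | -6.04 | -7.04 | -6.91 | -8.39 | -8.43 | 64.4% | 71.0% | 0.45 | 0.43 | 0.61 | 0.60 | 4.31 | 0.660 |
| MolCRAFT | -6.59 | -7.04 | -7.27 | -7.26 | -7.92 | -8.01 | 59.1% | 62.6% | 0.50 | 0.51 | 0.69 | 0.68 | 4.46 | 0.718 |
| IPDiff | -6.66 | -7.47 | -7.64 | -7.69 | -8.49 | -8.39 | 68.5% | 72.2% | 0.50 | 0.51 | 0.56 | 0.56 | 4.40 | 0.728 |
| SeFMol | -7.23 | -7.70 | -8.03 | -8.00 | -8.72 | -8.75 | 68.7% | 76.3% | 0.63 | 0.64 | 0.60 | 0.60 | 4.90 | 0.686 |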
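
To make the arrows in the table concrete, here is a small sketch that picks the best method per metric, using only the Avg. columns for Vina Score, Vina Dock, QED, and SA copied from the table above. The dictionary layout and the `best_per_metric` helper are illustrative assumptions, not part of the repository.

```python
# Avg. values copied from the table: (Vina Score, Vina Dock, QED, SA)
RESULTS = {
    "AR":         (-5.75, -6.75, 0.51, 0.63),
    "Pocket2Mol": (-5.14, -7.15, 0.56, 0.74),
    "ResGen":     (10.50, -6.59, 0.58, 0.78),
    "FLAG":       (45.98, -5.24, 0.61, 0.63),
    "TargetDiff": (-5.47, -7.80, 0.48, 0.58),
    "DecompDiff": (-5.67, -8.39, 0.45, 0.61),
    "MolCRAFT":   (-6.59, -7.92, 0.50, 0.69),
    "IPDiff":     (-6.66, -8.49, 0.50, 0.56),
    "SeFMol":     (-7.23, -8.72, 0.63, 0.60),
}

# (metric name, True if lower is better), in the same order as the tuples above
METRICS = [("Vina Score", True), ("Vina Dock", True), ("QED", False), ("SA", False)]

def best_per_metric(results, metrics):
    """Return {metric: best method}, honoring each metric's direction."""
    best = {}
    for idx, (name, lower_is_better) in enumerate(metrics):
        pick = min if lower_is_better else max
        best[name] = pick(results, key=lambda method: results[method][idx])
    return best

for metric, method in best_per_metric(RESULTS, METRICS).items():
    print(f"{metric}: {method}")
```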

Platform Preview

We're developing a comprehensive platform for molecular design and visualization. The complete platform will be released upon paper acceptance.

[Figure: SeFMol Platform Preview]

Installation

Prerequisites

  • Conda package manager
  • NVIDIA GPU (recommended)

Create Environment

conda create -n SeFMol python=3.9
conda activate SeFMol

Install Dependencies

# Install PyTorch with CUDA 11.7
conda install pytorch==1.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia

# Install molecular modeling dependencies
conda install -c conda-forge pdbfixer
conda install conda-forge::openbabel
conda install pyyaml easydict python-lmdb -c conda-forge

# Install Python packages
pip install protobuf==5.27.1
pip install networkx==3.2.1
pip install rdkit==2023.9.6
pip install biopython==1.83

# For Vina Docking
pip install meeko==0.1.dev3 scipy pdb2pqr vina==1.2.2 
python -m pip install git+https://github.com/Valdes-Tresanco-MS/AutoDockTools_py3

# For EDeN
pip install git+https://github.com/fabriziocosta/EDeN.git --user

Data Preparation

Download the required datasets from the Google Drive folder:

For training:

  • crossdocked_v1.1_rmsd1.0_pocket10_processed_final.lmdb
  • crossdocked_pocket10_pose_split.pt

For evaluation:

  • test_set.zip (unzip before use)
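
Before launching training or evaluation, it can help to confirm that the downloads are in place. The sketch below checks for the file names listed above; the `./data` directory and the `missing_files` helper are assumptions for illustration, so adjust the path to wherever the repository's configs expect the data.

```python
from pathlib import Path

# File names as listed in this README, grouped by purpose.
REQUIRED = {
    "training": [
        "crossdocked_v1.1_rmsd1.0_pocket10_processed_final.lmdb",
        "crossdocked_pocket10_pose_split.pt",
    ],
    "evaluation": ["test_set.zip"],
}

def missing_files(data_dir, purpose):
    """Return the required names for `purpose` that are absent from `data_dir`."""
    root = Path(data_dir)
    return [name for name in REQUIRED[purpose] if not (root / name).exists()]

# "./data" is a hypothetical location, not one confirmed by the repository.
for purpose in REQUIRED:
    missing = missing_files("./data", purpose)
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"{purpose}: {status}")
```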

Training

1. Rigid Pre-training

python train_rigid_pt.py

2. Rigid Fine-tuning

python train_rigid_ft.py

3. Semi-flexible Training

python train_sfrl.py
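
The three stages above are numbered, which suggests they are meant to run in order. A minimal driver sketch, assuming each script is run from the repository root with no extra arguments (as in the commands above) and that a failed stage should stop the pipeline; the `run_stages` helper is hypothetical:

```python
import subprocess
import sys

# Script names as given in this README, in stage order.
STAGES = ["train_rigid_pt.py", "train_rigid_ft.py", "train_sfrl.py"]

def run_stages(stages=STAGES, runner=subprocess.run):
    """Run each training script in order, stopping at the first failure."""
    for script in stages:
        cmd = [sys.executable, script]
        print("Running:", " ".join(cmd))
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit
        runner(cmd, check=True)
```

Calling `run_stages()` starts all three stages back to back; the `runner` parameter exists only so the sequence can be dry-run without launching training.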

Sampling

python sample.py \
  --config configs/rl.yml \
  --start_index 0 \
  --end_index 99 \
  --timesteps 50 

--timesteps Argument

| Property | Value |
| --- | --- |
| Range | 10 to 1000 (controls the number of diffusion steps) |
| Recommendation | 50 (optimal speed/quality balance) |
| Performance | 20x faster than the default (1000 steps), with no detectable quality loss |
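
To see where the 20x figure comes from, the sketch below builds a uniformly strided subsequence of 50 timesteps out of 1000. This is a generic way to shorten diffusion sampling and is only an assumption about what `--timesteps` does; the README does not describe SeFMol's actual fast sampling strategy, and `strided_timesteps` is a hypothetical helper.

```python
def strided_timesteps(total_steps=1000, num_steps=50):
    """Return `num_steps` evenly spaced timesteps in [0, total_steps), descending."""
    if not 1 <= num_steps <= total_steps:
        raise ValueError("num_steps must be between 1 and total_steps")
    # Integer arithmetic keeps the steps distinct and strictly below total_steps.
    ascending = [i * total_steps // num_steps for i in range(num_steps)]
    return ascending[::-1]

schedule = strided_timesteps(1000, 50)
print(schedule[:3], "...", schedule[-3:])
print("speedup over 1000 steps:", 1000 // len(schedule))
```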

Evaluation

Evaluate generated molecules:

python eval_split_diff.py

Coming Soon

  • Complete visualization platform
  • Pre-trained model weights
  • Tutorial notebooks
  • Docker image for easy deployment

Citation

Our paper is under review. If you find our code helpful, please cite:

@misc{SeFMol2025,
title = {Steering Semi-flexible Molecular Diffusion Model for Structure-Based Drug Design with Reinforcement Learning},
author = {Zhang, Xudong and Qu, Sanqing and Lu, Fan and Wang, Jianmin and Tian, Zhixin and Gu, Shangding and Zhang, Yanping and Knoll, Alois and Gao, Shaorong and Chen, Guang and Jiang, Changjun},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/ispc-lab/SeFMol}},
}
