SeFMol: Steering Semi-flexible Molecular Diffusion Model for Structure-Based Drug Design with Reinforcement Learning
Official repository for the paper "Steering Semi-flexible Molecular Diffusion Model for Structure-Based Drug Design with Reinforcement Learning".
- 🧠 Two-Stage Rigid Training: Combines property-biased pretraining on the Molecule3D dataset with target-aware fine-tuning on protein-ligand pairs
- 🤖 RL-Optimized Semi-Flexibility: Models denoising as a Markov decision process and uses a KL-constrained policy network to explore semi-flexible conformations
- ⏩ 20x Faster Sampling: A training-free sampling strategy that cuts the number of denoising steps to 1/20 of a conventional diffusion sampler (for example, 50 steps instead of 1000)
- 📊 Sparse Reward Solution: Addresses sparse affinity signals through property-conditioned reinforcement learning
- 💻 User-friendly Platform: Integrated visualization interface
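To make the "KL-constrained policy" idea concrete, here is a minimal sketch of one common way to regularize an RL objective over denoising steps: subtract a KL penalty between the policy's Gaussian step and a frozen reference model's step. This is a generic illustration, not the SeFMol implementation. The penalty form, the shared per-step variance, and the names `gaussian_kl_same_variance`, `kl_regularized_return`, and `beta` are all assumptions.

```python
def gaussian_kl_same_variance(mean_p, mean_q, sigma):
    """KL(N(mean_p, sigma^2 I) || N(mean_q, sigma^2 I)).

    For two isotropic Gaussians with the same variance this reduces to
    ||mean_p - mean_q||^2 / (2 * sigma^2).
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(mean_p, mean_q))
    return squared_distance / (2.0 * sigma ** 2)


def kl_regularized_return(reward, policy_means, reference_means, sigmas, beta):
    """Trajectory reward minus beta times the summed per-step KL.

    policy_means / reference_means: one predicted mean per denoising step.
    sigmas: one standard deviation per step (assumed shared by both models).
    beta: hypothetical penalty weight; larger values keep the policy closer
          to the reference model.
    """
    total_kl = sum(
        gaussian_kl_same_variance(p, r, s)
        for p, r, s in zip(policy_means, reference_means, sigmas)
    )
    return reward - beta * total_kl
```

With a single step, means `[1.0, 0.0]` and `[0.0, 0.0]`, and `sigma = 1.0`, the KL term is 0.5, so a reward of 2.0 with `beta = 0.1` is reduced by 0.05.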
Among the compared methods, SeFMol achieves the best Vina Score, Vina Min, Vina Dock, High Affinity, and QED:
Method | Vina Score Avg. (↓) | Vina Score Med. (↓) | Vina Min Avg. (↓) | Vina Min Med. (↓) | Vina Dock Avg. (↓) | Vina Dock Med. (↓) | High Affinity Avg. (↑) | High Affinity Med. (↑) | QED Avg. (↑) | QED Med. (↑) | SA Avg. (↑) | SA Med. (↑) | Lipinski Avg. (↑) | Diversity (↑) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Reference | -6.36 | -6.46 | -6.71 | -6.49 | -7.45 | -7.26 | - | - | 0.48 | 0.47 | 0.73 | 0.74 | 4.27 | - |
AR | -5.75 | -5.64 | -6.18 | -5.88 | -6.75 | -6.62 | 37.9% | 31.0% | 0.51 | 0.50 | 0.63 | 0.63 | 4.75 | 0.690 |
Pocket2Mol | -5.14 | -4.70 | -6.42 | -5.82 | -7.15 | -6.79 | 48.4% | 51.0% | 0.56 | 0.57 | 0.74 | 0.75 | 4.88 | 0.685 |
ResGen | 10.50 | 2.54 | -2.94 | -4.41 | -6.59 | -6.45 | 38.0% | 25.0% | 0.58 | 0.59 | 0.78 | 0.79 | 4.90 | 0.742 |
FLAG | 45.98 | 36.62 | 6.17 | -2.91 | -5.24 | -5.71 | 27.9% | 5.0% | 0.61 | 0.62 | 0.63 | 0.62 | 4.98 | 0.766 |
TargetDiff | -5.47 | -6.30 | -6.64 | -6.83 | -7.80 | -7.91 | 58.1% | 59.1% | 0.48 | 0.48 | 0.58 | 0.58 | 4.51 | 0.708 |
DecompDiff | -5.67 | -6.04 | -7.04 | -6.91 | -8.39 | -8.43 | 64.4% | 71.0% | 0.45 | 0.43 | 0.61 | 0.60 | 4.31 | 0.660 |
MolCRAFT | -6.59 | -7.04 | -7.27 | -7.26 | -7.92 | -8.01 | 59.1% | 62.6% | 0.50 | 0.51 | 0.69 | 0.68 | 4.46 | 0.718 |
IPDiff | -6.66 | -7.47 | -7.64 | -7.69 | -8.49 | -8.39 | 68.5% | 72.2% | 0.50 | 0.51 | 0.56 | 0.56 | 4.40 | 0.728 |
SeFMol | -7.23 | -7.70 | -8.03 | -8.00 | -8.72 | -8.75 | 68.7% | 76.3% | 0.63 | 0.64 | 0.60 | 0.60 | 4.90 | 0.686 |
We're developing a comprehensive platform for molecular design and visualization. The complete platform will be released upon paper acceptance.
- Conda package manager
- NVIDIA GPU (recommended)
conda create -n SeFMol python=3.9
conda activate SeFMol
# Install PyTorch with CUDA 11.7
conda install pytorch==1.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
# Install molecular modeling dependencies
conda install -c conda-forge pdbfixer
conda install conda-forge::openbabel
conda install pyyaml easydict python-lmdb -c conda-forge
# Install Python packages
pip install protobuf==5.27.1
pip install networkx==3.2.1
pip install rdkit==2023.9.6
pip install biopython==1.83
# For Vina Docking
pip install meeko==0.1.dev3 scipy pdb2pqr vina==1.2.2
python -m pip install git+https://github.com/Valdes-Tresanco-MS/AutoDockTools_py3
# For EDeN
pip install git+https://github.com/fabriziocosta/EDeN.git --user
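After installation, a quick check that the packages can be found may save time before training. The sketch below only looks up each module without importing it. The import names in `REQUIRED_MODULES` are assumptions (they can differ from the conda/pip package names, e.g. `biopython` installs `Bio`), so adjust the list if your environment differs.

```python
import importlib.util

# Assumed import names for the packages installed above.
REQUIRED_MODULES = [
    "torch", "pdbfixer", "openbabel", "yaml", "easydict", "lmdb",
    "google.protobuf", "networkx", "rdkit", "Bio", "meeko", "scipy",
    "pdb2pqr", "vina", "AutoDockTools", "eden",
]


def is_installed(module_name):
    """Return True if the module can be located on the current Python path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. "google") raises instead of returning None.
        return False


def missing_modules(module_names):
    """Return the names from module_names that cannot be found, in order."""
    return [name for name in module_names if not is_installed(name)]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

Run it inside the activated `SeFMol` environment. A module reported as missing usually means the corresponding install command above failed or was skipped.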
Download the required datasets from the Google Drive folder:
For training:
crossdocked_v1.1_rmsd1.0_pocket10_processed_final.lmdb
crossdocked_pocket10_pose_split.pt
For evaluation:
test_set.zip (unzip before use)
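The sketch below checks that the files listed above are present and extracts the evaluation archive. The file names come from this README; the directory that holds them is not specified here, so `data_dir` (and the `data/` path in the usage note) is a hypothetical location to replace with wherever you saved the downloads.

```python
import zipfile
from pathlib import Path

# File names as listed above; their directory is an assumption.
TRAINING_FILES = [
    "crossdocked_v1.1_rmsd1.0_pocket10_processed_final.lmdb",
    "crossdocked_pocket10_pose_split.pt",
]
EVALUATION_ARCHIVE = "test_set.zip"


def prepare_data(data_dir):
    """Extract the evaluation archive if present and return any missing files."""
    root = Path(data_dir)
    missing = [name for name in TRAINING_FILES if not (root / name).exists()]
    archive_path = root / EVALUATION_ARCHIVE
    if archive_path.exists():
        # Unzip next to the archive so the extracted files sit in data_dir.
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(root)
    else:
        missing.append(EVALUATION_ARCHIVE)
    return missing
```

For example, `prepare_data("data")` returns an empty list when everything is in place, or the names of the files that still need to be downloaded.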
python train_rigid_pt.py
python train_rigid_ft.py
python train_sfrl.py
python sample.py \
--config configs/rl.yml \
--start_index 0 \
--end_index 99 \
--timesteps 50
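If you want to spread sampling over several GPUs or jobs, one option is to split the index range into contiguous chunks and launch one `sample.py` call per chunk. The helper below is a sketch that only builds the command strings. It assumes `--start_index` and `--end_index` are both inclusive (suggested by the `0` to `99` example, but not confirmed here); `split_index_range` and `build_sample_commands` are hypothetical helper names, not part of the repository.

```python
import shlex


def split_index_range(start_index, end_index, num_chunks):
    """Split [start_index, end_index] (assumed inclusive) into contiguous chunks."""
    if num_chunks < 1 or end_index < start_index:
        raise ValueError("need num_chunks >= 1 and end_index >= start_index")
    total = end_index - start_index + 1
    num_chunks = min(num_chunks, total)  # never create empty chunks
    base, extra = divmod(total, num_chunks)
    chunks = []
    low = start_index
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more
        chunks.append((low, low + size - 1))
        low += size
    return chunks


def build_sample_commands(start_index, end_index, num_chunks,
                          timesteps=50, config="configs/rl.yml"):
    """Return one sample.py command line per chunk, using the flags shown above."""
    commands = []
    for low, high in split_index_range(start_index, end_index, num_chunks):
        args = [
            "python", "sample.py",
            "--config", config,
            "--start_index", str(low),
            "--end_index", str(high),
            "--timesteps", str(timesteps),
        ]
        commands.append(shlex.join(args))
    return commands
```

Splitting `0` to `99` into 4 chunks yields the ranges 0-24, 25-49, 50-74, and 75-99. How each command is bound to a GPU (for example through `CUDA_VISIBLE_DEVICES`) is left to your job launcher.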
Property | Value |
---|---|
Range | 10 to 1000 (controls diffusion steps) |
Recommendation | 50 (optimal speed/quality balance) |
Performance | ⚡ 20x faster than default (1000 steps); ✅ no detectable quality loss |
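To show where the 20x figure comes from, the sketch below picks an evenly spaced subset of a 1000-step schedule, which is the usual way strided (DDIM-style) samplers reduce step counts. This README does not describe how SeFMol selects its timesteps, so the even spacing and the function name `strided_timesteps` are assumptions for illustration only.

```python
def strided_timesteps(total_steps=1000, sample_steps=50):
    """Return sample_steps evenly spaced timesteps from total_steps, high to low."""
    if not 1 <= sample_steps <= total_steps:
        raise ValueError("sample_steps must be between 1 and total_steps")
    # Integer arithmetic keeps every index unique and inside [0, total_steps).
    ascending = [(i * total_steps) // sample_steps for i in range(sample_steps)]
    return ascending[::-1]  # denoising runs from the noisiest step down to 0


def step_reduction(total_steps, sample_steps):
    """Ratio of full-schedule steps to the number of steps actually visited."""
    return total_steps / len(strided_timesteps(total_steps, sample_steps))
```

With `--timesteps 50`, this schedule visits 980, 960, ..., 20, 0, and `step_reduction(1000, 50)` is 20.0. Wall-clock speedup tracks this ratio only if per-step cost dominates the run time.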
Evaluate generated molecules:
python eval_split_diff.py
- Complete visualization platform
- Pre-trained model weights
- Tutorial notebooks
- Docker image for easy deployment
Our paper is under review. If you find our code helpful, please cite:
@misc{SeFMol2025,
title = {Steering Semi-flexible Molecular Diffusion Model for Structure-Based Drug Design with Reinforcement Learning},
author = {Zhang, Xudong and Qu, Sanqing and Lu, Fan and Wang, Jianmin and Tian, Zhixin and Gu, Shangding and Zhang, Yanping and Knoll, Alois and Gao, Shaorong and Chen, Guang and Jiang, Changjun},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/ispc-lab/SeFMol}},
}