SeFMol: Steering Semi-flexible Molecular Diffusion Model for Structure-Based Drug Design with Reinforcement Learning

Official repository for the paper "Steering Semi-flexible Molecular Diffusion Model for Structure-Based Drug Design with Reinforcement Learning".

Key Features

🧠 Two-Stage Rigid Training: Combines property-biased pretraining on the Molecule3D dataset with target-aware fine-tuning on protein-ligand pairs
🤖 RL-Optimized Semi-Flexibility: Models denoising as a Markov decision process with a KL-constrained policy network for semi-flexible conformation exploration
⏩ 20x Faster Sampling: Training-free fast sampling strategy that reduces the number of denoising steps to 1/20 of that used by conventional diffusion models
📊 Sparse Reward Solution: Addresses sparse affinity signals through property-conditioned reinforcement learning
💻 User-friendly Platform: Integrated visualization interface
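
To make the "KL-constrained policy" idea above more concrete, here is a minimal sketch of a generic KL-penalized policy-gradient objective over a denoising trajectory. It is not taken from the SeFMol code: the Gaussian step policy, the shared `sigma`, the scalar `reward`, and the weight `beta` are all illustrative assumptions, and the paper's actual objective may differ.

```python
import math

def gaussian_log_prob(x, mean, sigma):
    """Log-density of x under an isotropic Gaussian N(mean, sigma^2 I)."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return (-0.5 * sq_dist / sigma**2
            - d * math.log(sigma)
            - 0.5 * d * math.log(2 * math.pi))

def gaussian_kl(mean_p, mean_q, sigma):
    """KL(N(mean_p, sigma^2 I) || N(mean_q, sigma^2 I)) with a shared sigma."""
    return sum((p - q) ** 2 for p, q in zip(mean_p, mean_q)) / (2 * sigma**2)

def kl_regularized_loss(steps, reward, beta):
    """Surrogate loss for one trajectory (hypothetical helper).

    steps:  list of (x_next, policy_mean, reference_mean, sigma), one per
            denoising step, treating each step as an action in an MDP.
    reward: scalar reward for the final molecule (e.g. a property score).
    beta:   weight of the KL penalty that keeps the policy close to the
            pretrained reference model.
    """
    log_prob = sum(gaussian_log_prob(x, mu, s) for x, mu, _, s in steps)
    kl = sum(gaussian_kl(mu, ref, s) for _, mu, ref, s in steps)
    return -reward * log_prob + beta * kl
```

The first term is a REINFORCE-style surrogate that raises the likelihood of rewarded trajectories; the second penalizes drift away from the reference denoiser.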

Performance Comparison

In the comparison below, SeFMol achieves the best average and median Vina Score, Vina Min, Vina Dock, High Affinity, and QED among the listed methods:

Method	Vina Score Avg. (↓)	Vina Score Med. (↓)	Vina Min Avg. (↓)	Vina Min Med. (↓)	Vina Dock Avg. (↓)	Vina Dock Med. (↓)	High Affinity Avg. (↑)	High Affinity Med. (↑)	QED Avg. (↑)	QED Med. (↑)	SA Avg. (↑)	SA Med. (↑)	Lipinski (↑)	Diversity (↑)
Reference	-6.36	-6.46	-6.71	-6.49	-7.45	-7.26	-	-	0.48	0.47	0.73	0.74	4.27	-
AR	-5.75	-5.64	-6.18	-5.88	-6.75	-6.62	37.9%	31.0%	0.51	0.50	0.63	0.63	4.75	0.690
Pocket2Mol	-5.14	-4.70	-6.42	-5.82	-7.15	-6.79	48.4%	51.0%	0.56	0.57	0.74	0.75	4.88	0.685
ResGen	10.50	2.54	-2.94	-4.41	-6.59	-6.45	38.0%	25.0%	0.58	0.59	0.78	0.79	4.90	0.742
FLAG	45.98	36.62	6.17	-2.91	-5.24	-5.71	27.9%	5.0%	0.61	0.62	0.63	0.62	4.98	0.766
TargetDiff	-5.47	-6.30	-6.64	-6.83	-7.80	-7.91	58.1%	59.1%	0.48	0.48	0.58	0.58	4.51	0.708
DecompDiff	-5.67	-6.04	-7.04	-6.91	-8.39	-8.43	64.4%	71.0%	0.45	0.43	0.61	0.60	4.31	0.660
MolCRAFT	-6.59	-7.04	-7.27	-7.26	-7.92	-8.01	59.1%	62.6%	0.50	0.51	0.69	0.68	4.46	0.718
IPDiff	-6.66	-7.47	-7.64	-7.69	-8.49	-8.39	68.5%	72.2%	0.50	0.51	0.56	0.56	4.40	0.728
SeFMol	-7.23	-7.70	-8.03	-8.00	-8.72	-8.75	68.7%	76.3%	0.63	0.64	0.60	0.60	4.90	0.686

Platform Preview

We're developing a comprehensive platform for molecular design and visualization. The complete platform will be released upon paper acceptance.

Installation

Prerequisites

Conda package manager
NVIDIA GPU (recommended)

Create Environment

conda create -n SeFMol python=3.9
conda activate SeFMol

Install Dependencies

# Install PyTorch with CUDA 11.7
conda install pytorch==1.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia

# Install molecular modeling dependencies
conda install -c conda-forge pdbfixer
conda install conda-forge::openbabel
conda install pyyaml easydict python-lmdb -c conda-forge

# Install Python packages
pip install protobuf==5.27.1
pip install networkx==3.2.1
pip install rdkit==2023.9.6
pip install biopython==1.83

# For Vina Docking
pip install meeko==0.1.dev3 scipy pdb2pqr vina==1.2.2 
python -m pip install git+https://github.com/Valdes-Tresanco-MS/AutoDockTools_py3

# For EDeN
pip install git+https://github.com/fabriziocosta/EDeN.git --user
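
After installation, a quick importability check can catch missing packages before a long training run. The sketch below is not part of the repository. The import names are an assumed mapping from the packages installed above (for example `pyyaml` to `yaml`, `biopython` to `Bio`, `python-lmdb` to `lmdb`), and the list deliberately omits packages whose import names are less certain.

```python
import importlib.util

# Assumed import names for the packages installed above.
REQUIRED_MODULES = (
    "torch", "pdbfixer", "openbabel", "yaml", "easydict", "lmdb",
    "google.protobuf", "networkx", "rdkit", "Bio", "meeko", "scipy", "vina",
)

def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing

if __name__ == "__main__":
    absent = missing_modules()
    print("All modules found." if not absent else f"Missing: {', '.join(absent)}")
```

Run it inside the activated `SeFMol` environment; an empty result only shows that the modules can be located, not that their versions match.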

Data Preparation

Download the required datasets from the Google Drive folder:

For training:

crossdocked_v1.1_rmsd1.0_pocket10_processed_final.lmdb
crossdocked_pocket10_pose_split.pt

For evaluation:

test_set.zip (unzip before use)
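
A small helper can confirm the downloads are in place and unpack the test set. This is a sketch only: the file names come from the list above, but the `data_dir` location is an assumption, since the directory the scripts expect is not stated here (check the files under `configs/`).

```python
from pathlib import Path
import zipfile

TRAIN_FILES = [
    "crossdocked_v1.1_rmsd1.0_pocket10_processed_final.lmdb",
    "crossdocked_pocket10_pose_split.pt",
]
EVAL_ARCHIVE = "test_set.zip"

def missing_files(data_dir, names):
    """Return the names from `names` that do not exist under `data_dir`."""
    root = Path(data_dir)
    return [name for name in names if not (root / name).exists()]

def unzip_test_set(data_dir):
    """Extract test_set.zip next to the archive (hypothetical layout)."""
    root = Path(data_dir)
    with zipfile.ZipFile(root / EVAL_ARCHIVE) as archive:
        archive.extractall(root)

if __name__ == "__main__":
    data_dir = "data"  # assumed location; adjust to your setup
    absent = missing_files(data_dir, TRAIN_FILES + [EVAL_ARCHIVE])
    if absent:
        print("Missing files:", ", ".join(absent))
    else:
        unzip_test_set(data_dir)
```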

Training

1. Rigid Pre-training

python train_rigid_pt.py

2. Rigid Fine-tuning

python train_rigid_ft.py

3. Semi-flexible Training

python train_sfrl.py
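
The three stages above are meant to run in order. A minimal driver, assuming each script runs with its default arguments as shown and is launched from the repository root, could look like the following. The `runner` parameter is a hypothetical hook added here so the loop can be exercised without starting real training.

```python
import subprocess
import sys

# Stage names and scripts as listed in the Training section.
STAGES = [
    ("Rigid pre-training", "train_rigid_pt.py"),
    ("Rigid fine-tuning", "train_rigid_ft.py"),
    ("Semi-flexible training", "train_sfrl.py"),
]

def run_stages(stages=STAGES, runner=subprocess.run):
    """Run each training script in sequence, stopping at the first failure."""
    for name, script in stages:
        print(f"[SeFMol] starting: {name} ({script})")
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a later stage never starts from a failed earlier stage.
        runner([sys.executable, script], check=True)

if __name__ == "__main__":
    run_stages()
```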

Sampling

python sample.py \
  --config configs/rl.yml \
  --start_index 0 \
  --end_index 99 \
  --timesteps 50
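
To split sampling across several jobs, the index range can be divided into chunks and one command built per chunk. The sketch below uses only the flags shown above. It assumes `--end_index` is inclusive, which the `0` to `99` example suggests but does not confirm, and `chunk_ranges` is a hypothetical helper.

```python
import shlex

def build_sample_command(start_index, end_index, timesteps=50,
                         config="configs/rl.yml"):
    """Return the argv list for one sample.py invocation."""
    return [
        "python", "sample.py",
        "--config", config,
        "--start_index", str(start_index),
        "--end_index", str(end_index),
        "--timesteps", str(timesteps),
    ]

def chunk_ranges(first, last, chunk_size):
    """Split [first, last] (inclusive, by assumption) into consecutive chunks."""
    if chunk_size < 1 or last < first:
        raise ValueError("need chunk_size >= 1 and last >= first")
    return [(start, min(start + chunk_size - 1, last))
            for start in range(first, last + 1, chunk_size)]

if __name__ == "__main__":
    for start, end in chunk_ranges(0, 99, 25):
        print(shlex.join(build_sample_command(start, end)))
```

Each printed line can be submitted as a separate job, for example one per GPU.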

`--timesteps` Argument

Property	Value
Range	`10` to `1000` (controls diffusion steps)
Recommendation	`50` (optimal speed/quality balance)
Performance	⚡ 20x faster than default (1000 steps) ✅ No detectable quality loss
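
The 20x figure follows directly from the step count: 1000 / 50 = 20. As an illustration of how a reduced schedule can be derived, the sketch below picks an evenly strided subset of the full timestep range. This is a common way to shorten diffusion sampling and is an assumption here; the strategy implemented in `sample.py` may select steps differently.

```python
def strided_timesteps(total_steps=1000, num_steps=50):
    """Return `num_steps` evenly spaced timesteps in descending order.

    Illustrative only: assumes timesteps are indexed 0..total_steps-1 and
    that sampling walks from the noisiest step down to step 0.
    """
    if not 1 <= num_steps <= total_steps:
        raise ValueError("num_steps must be between 1 and total_steps")
    ascending = [(i * total_steps) // num_steps for i in range(num_steps)]
    return ascending[::-1]

def speedup(total_steps=1000, num_steps=50):
    """Ratio of denoising steps saved relative to the full schedule."""
    return total_steps / num_steps

if __name__ == "__main__":
    schedule = strided_timesteps()
    print(len(schedule), schedule[:3], schedule[-3:], speedup())
```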

Evaluation

Evaluate generated molecules:

python eval_split_diff.py

Coming Soon

Complete visualization platform
Pre-trained model weights
Tutorial notebooks
Docker image for easy deployment

Citation

Our paper is under review. If you find our code helpful, please cite:

@misc{SeFMol2025,
title = {Steering Semi-flexible Molecular Diffusion Model for Structure-Based Drug Design with Reinforcement Learning},
author = {Zhang, Xudong and Qu, Sanqing and Lu, Fan and Wang, Jianmin and Tian, Zhixin and Gu, Shangding and Zhang, Yanping and Knoll, Alois and Gao, Shaorong and Chen, Guang and Jiang, Changjun},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/ispc-lab/SeFMol}},
}
