FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation (MICCAI 2025)

This paper has been early accepted by MICCAI 2025 (top 9%). arXiv: 2506.04956

Huihan Wang¹* Zhiwen Yang¹* Hui Zhang² Dan Zhao³ Bingzheng Wei⁴ Yan Xu¹

¹BUAA  ²THU  ³PUMC  ⁴ByteDance

* Equal contributions. Corresponding author.

Demo video: example.mp4

Figure: introduction

🛠Setup

git clone https://github.com/Yaziwel/FEAT.git
cd FEAT
conda create -n FEAT python=3.10
conda activate FEAT

pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118

pip install -r requirements.txt
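
To verify the environment before moving on, here is a quick sanity check (a minimal sketch; it only assumes the pinned packages above installed successfully):

import torch
import torchvision

# Confirm the pinned versions and that the CUDA 11.8 build can see a GPU.
print(torch.__version__)          # expected: 2.1.2+cu118
print(torchvision.__version__)    # expected: 0.16.2+cu118
print(torch.cuda.is_available())  # should be True on a CUDA machine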

📚Data Preparation

Colonoscopic: The dataset provided by the paper can be found here. You can directly use the video data processed by Endo-FM without further processing.

Kvasir-Capsule: The dataset provided by the paper can be found here. You can directly use the video data processed by Endo-FM without further processing.

Please first run process_data.py and process_list.py to extract the split frames and generate the corresponding file list.

CUDA_VISIBLE_DEVICES=gpu_id python process_data.py -s ./data/Colonoscopic -t ./data/Colonoscopic_frames

CUDA_VISIBLE_DEVICES=gpu_id python process_list.py -f ./data/Colonoscopic_frames -t ./data/Colonoscopic_frames/train_128_list.txt

The resulting file structure is as follows.

├── data
│   ├── Colonoscopic
│   │   ├── 00001.mp4
│   │   ├── ...
│   ├── Kvasir-Capsule
│   │   ├── 00001.mp4
│   │   ├── ...
│   ├── Colonoscopic_frames
│   │   ├── train_128_list.txt
│   │   ├── 00001
│   │   │   ├── 00000.jpg
│   │   │   ├── ...
│   │   ├── ...
│   ├── Kvasir-Capsule_frames
│   │   ├── train_128_list.txt
│   │   ├── 00001
│   │   │   ├── 00000.jpg
│   │   │   ├── ...
│   │   ├── ...
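
After processing, you can sanity-check the result with a short script (a minimal sketch; it assumes each line of train_128_list.txt refers to a frame folder, which may differ from the actual list format produced by process_list.py):

import os

# Hypothetical sanity check: count list entries and frames per clip.
frames_root = "./data/Colonoscopic_frames"
with open(os.path.join(frames_root, "train_128_list.txt")) as f:
    entries = [line.strip() for line in f if line.strip()]
print(f"{len(entries)} entries in the training list")

clips = [d for d in os.listdir(frames_root)
         if os.path.isdir(os.path.join(frames_root, d))]
for clip in sorted(clips)[:3]:  # peek at the first few clips
    n_frames = len(os.listdir(os.path.join(frames_root, clip)))
    print(f"clip {clip}: {n_frames} frames")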

⏳Training

You can follow the steps below to train FEAT:

bash train_scripts/col/train_col.sh
bash train_scripts/kva/train_kva.sh

🎇Sampling

You can directly sample medical videos from a model checkpoint. Here is a quick-usage example with our pre-trained models:

  1. Download the pre-trained weights from here and put them in the specific paths defined in the configs. You can also use huggingface_hub to download the weights. For example, a checkpoint can be downloaded like so (see the loading sketch after this list):
from huggingface_hub import hf_hub_download

# 4 models supported: FEAT_L_col.pt, FEAT_L_kva.pt, FEAT_S_col.pt and FEAT_S_kva.pt
filepath = hf_hub_download(repo_id="WTHH031230/FEAT", filename="FEAT_L_col.pt")
  2. Run sample.py via the following scripts, customizing arguments such as the number of sampling steps.
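
Once downloaded, you can quickly confirm the checkpoint loads (a minimal sketch continuing from the snippet above; it assumes the .pt file is a standard torch-serialized checkpoint, and the actual key layout may differ):

import torch

# `filepath` comes from the hf_hub_download snippet above.
ckpt = torch.load(filepath, map_location="cpu")
if isinstance(ckpt, dict):
    print(f"{len(ckpt)} top-level keys, e.g. {list(ckpt)[:5]}")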

You can follow the steps below to sample a video with FEAT:

bash sample/col.sh
bash sample/kva.sh

DDP sampling:

bash sample/col_ddp.sh
bash sample/kva_ddp.sh

After DDP sampling, more than 3125 videos are generated for calculating the metrics.

📏Evaluation

The metrics we calculated on the Colonoscopic dataset are listed below:

| Method | FVD↓ | CD-FVD↓ | FID↓ | IS↑ |
|--------|------|---------|------|-----|
| StyleGAN-V | 2110.7 | 1032.8 | 226.14 | 2.12 |
| LVDM | 1036.7 | 792.9 | 96.85 | 1.93 |
| MoStGAN-V | 468.5 | 592.0 | 53.17 | 3.37 |
| Endora | 460.7 | 545.3 | 13.41 | 3.90 |
| FEAT-S (Ours) | 415.4 | 444.0 | 13.34 | 3.96 |
| FEAT-L (Ours) | 351.1 | 397.0 | 12.31 | 4.01 |

Before calculating the metrics in our code, you may need the weights for several models, which can be downloaded from the following links:

You can also simply follow this part of the code in Endora to automatically download the models needed for metric calculation.

To calculate the metrics, follow the steps below.

## FVD, FID and IS
CUDA_VISIBLE_DEVICES=gpu_id python process_data.py -s /path/to/generated/video -t /path/to/video/frames
cd /path/to/stylegan-v
CUDA_VISIBLE_DEVICES=gpu_id python ./src/scripts/calc_metrics_for_dataset.py \
  --fake_data_path /path/to/video/frames \
  --real_data_path /path/to/dataset/frames 
  
## CD-FVD
CUDA_VISIBLE_DEVICES=gpu_id python calculate_cdfvd.py
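
As an optional cross-check (not part of this repo's pipeline), FID can also be computed with the pytorch-fid package; note that its scores are not directly comparable to those from the StyleGAN-V implementation used above:

# pip install pytorch-fid  (cross-check only; not the repo's own pipeline)
from pytorch_fid.fid_score import calculate_fid_given_paths

# pytorch-fid expects folders that directly contain image files
# (it does not recurse into per-clip subfolders), so flatten first.
fid = calculate_fid_given_paths(
    ["/path/to/video/frames", "/path/to/dataset/frames"],
    batch_size=50, device="cuda", dims=2048)
print(f"FID: {fid:.2f}")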

🧰Running Other Methods

Since we follow Endora, you can run other methods in the same way Endora describes.

🎪Downstream Application

Since we follow Endora, you can run the downstream task in the same way Endora describes.

| Method | Colonoscopic |
|--------|--------------|
| Supervised-only | 74.5 |
| LVDM | 76.2 |
| Endora | 87.0 |
| FEAT-S (ours) | 89.9 |
| FEAT-L (ours) | 91.3 |

🎈Acknowledgements

We greatly appreciate the tremendous effort behind the following projects!

📜Citation

If you find FEAT useful in your research, please consider citing:

@article{wang2025feat,
  author    = {Huihan Wang and Zhiwen Yang and Hui Zhang and Dan Zhao and Bingzheng Wei and Yan Xu},
  title     = {FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation},
  journal   = {arXiv preprint arXiv:2506.04956},
  year      = {2025}
}
