This repository provides the official implementation of RecSal-Net, introduced in our paper:
ChaeEun Woo, SuMin Lee, Soo Min Park, and Byung Hyung Kim, “RecSal-Net: Recursive Saliency Network for Video Saliency Prediction,” Neurocomputing, 2025. [pdf] [link]
RecSal-Net is a recursive transformer architecture designed for Video Saliency Prediction (VSP). It combines a transformer-based encoder with a recursive feature integration mechanism that iteratively refines multi-scale spatiotemporal features.
The overall architecture of RecSal-Net. (a) The RecSal-Net model, including a transformer-based encoder, recursive blocks, and a decoder. (b) The recursive block, which iteratively refines multi-scale spatiotemporal features.
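To give an intuition for the recursive refinement idea, here is a minimal PyTorch-style sketch. The layer choices, tensor shapes, and recursion depth are illustrative assumptions only, not RecSal-Net's actual design; see model.py in this repository for the real implementation.

```python
# Illustrative sketch only: a generic "recursive block" that repeatedly applies
# a shared refinement module to a spatiotemporal feature map. The layers,
# shapes, and recursion depth are assumptions for exposition, not the actual
# RecSal-Net design (see model.py for the real implementation).
import torch
import torch.nn as nn


class RecursiveBlockSketch(nn.Module):
    def __init__(self, channels: int, num_recursions: int = 3):
        super().__init__()
        self.num_recursions = num_recursions
        # A single shared refinement module is reused at every recursion step.
        self.refine = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.GroupNorm(8, channels),
            nn.GELU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width) encoder features.
        for _ in range(self.num_recursions):
            # Residual refinement: the block's output is fed back as its input.
            x = x + self.refine(x)
        return x


if __name__ == "__main__":
    feats = torch.randn(1, 64, 8, 28, 28)   # dummy multi-frame features
    out = RecursiveBlockSketch(channels=64)(feats)
    print(out.shape)  # torch.Size([1, 64, 8, 28, 28])
```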
Please create an Anaconda virtual environment:
$ conda create -n RS python=3.8 -y
Activate the virtual environment:
$ conda activate RS
Install the requirements:
$ pip3 install -r requirements.txt
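To verify that the environment is working (assuming PyTorch is among the packages listed in requirements.txt), you can run:

$ python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"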
Please download the pre-trained VST weights here and the DHF1K dataset here, and place them according to the project structure below.
Project/
│
├── saved_models/
│   └── RecSalNet.pth
│
├── data/
│   └── DHF1K/
│       ├── train/
│       └── val/
│
├── dataloader.py
├── loss.py
├── model.py
├── swin_transformer.py
├── test.py
├── train.py
├── utils.py
├── requirements.txt
└── swin_small_patch244_window877_kinetics400_1k.pth
You can start training by running:
$ python3 train.py
The results will be saved in a folder named saved_models.
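As a quick example of inspecting a saved checkpoint, the snippet below loads the file from the project structure above; how the checkpoint is structured (raw state_dict or a wrapping dict) depends on train.py, so the assumptions here should be checked against the code.

```python
# Illustrative only: inspect a checkpoint saved by train.py.
# The checkpoint layout is defined by train.py / model.py; the assumptions
# below (a dict-like state) should be verified against those files.
import torch

state = torch.load("saved_models/RecSalNet.pth", map_location="cpu")
# Print the first few keys to see whether this is a raw state_dict or a
# wrapping dict before loading it into the model defined in model.py.
print(type(state), list(state.keys())[:5] if isinstance(state, dict) else None)
```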
After training has finished, you can use test.py to generate the predicted saliency maps and compute all evaluation metrics by running:
$ python3 test.py
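For reference, two saliency metrics commonly reported on DHF1K, the linear correlation coefficient (CC) and normalized scanpath saliency (NSS), can be computed as in the generic NumPy sketch below; this is only an illustration of the underlying formulas, not the exact implementation used in test.py.

```python
# Generic NumPy sketch of two widely used saliency metrics (CC and NSS).
# test.py has its own metric implementations; this only illustrates the formulas.
import numpy as np


def cc(pred: np.ndarray, gt: np.ndarray) -> float:
    """Linear correlation coefficient between predicted and ground-truth maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())


def nss(pred: np.ndarray, fixations: np.ndarray) -> float:
    """Normalized scanpath saliency: mean normalized saliency at fixated pixels."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    return float(p[fixations > 0].mean())


if __name__ == "__main__":
    pred = np.random.rand(360, 640)          # dummy predicted saliency map
    gt = np.random.rand(360, 640)             # dummy ground-truth density map
    fix = (np.random.rand(360, 640) > 0.999)  # dummy binary fixation map
    print("CC:", cc(pred, gt), "NSS:", nss(pred, fix.astype(np.uint8)))
```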
Please cite our paper if you use our code in your own work:
@article{woo2025recsal,
title={RecSal-Net: Recursive Saliency Network for video saliency prediction},
author={Woo, ChaeEun and Lee, SuMin and Park, Soo Min and Kim, Byung Hyung},
journal={Neurocomputing},
volume={650},
pages={130822},
year={2025}
}