This repository provides the official implementation of RecSal-Net, introduced in our paper:
ChaeEun Woo, SuMin Lee, Soo Min Park, and Byung Hyung Kim, “RecSal-Net: Recursive Saliency Network for Video Saliency Prediction,” Neurocomputing, 2025. [pdf] [link]
RecSal-Net is a recursive transformer architecture for Video Saliency Prediction (VSP) that combines a transformer-based encoder with a recursive feature integration mechanism.
Figure 1. The overall architecture of RecSal-Net. (a) The RecSal-Net model, including a transformer-based encoder, recursive blocks, and a decoder. (b) The recursive block, which iteratively refines multi-scale spatiotemporal features.
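For a concrete picture of the recursive refinement idea before diving into model.py, here is a minimal PyTorch sketch. The class name `RecursiveBlock`, the channel width, and the number of iterations are illustrative assumptions, not the exact implementation used in this repository.

```python
# Minimal, illustrative sketch of a recursive refinement block.
# Assumption: the real layers and shapes in model.py may differ.
import torch
import torch.nn as nn

class RecursiveBlock(nn.Module):
    """Iteratively refines a spatiotemporal feature map by re-applying
    the same (weight-shared) refinement layers a fixed number of times."""
    def __init__(self, channels: int = 192, num_iterations: int = 3):
        super().__init__()
        self.num_iterations = num_iterations
        # Weight-shared 3D conv refinement applied at every iteration.
        self.refine = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.GroupNorm(8, channels),
            nn.GELU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width) encoder features.
        out = x
        for _ in range(self.num_iterations):
            # Residual update keeps the original features in the loop.
            out = out + self.refine(out)
        return out

if __name__ == "__main__":
    feats = torch.randn(1, 192, 4, 28, 28)  # dummy encoder output
    print(RecursiveBlock()(feats).shape)    # torch.Size([1, 192, 4, 28, 28])
```

The property this sketch highlights is weight sharing across iterations: the same refinement layers are applied repeatedly, with a residual connection carrying the original features through each pass.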
Please create an Anaconda virtual environment by:
$ conda create -n RS python=3.8 -y
Activate the virtual environment by:
$ conda activate RS
Install the requirements by:
$ pip3 install -r requirements.txt
Please download the pre-trained VST weights here and the DHF1K dataset here, then place them as shown in the project structure below.
```
Project/
│
├── saved_models/
│   └── RecSalNet.pth
│
├── data/
│   └── DHF1K/
│       ├── train/
│       └── val/
│
├── dataloader.py
├── loss.py
├── model.py
├── swin_transformer.py
├── test.py
├── train.py
├── utils.py
├── requirements.txt
└── swin_small_patch244_window877_kinetics400_1k.pth
```
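If you need to adapt the encoder, the downloaded Kinetics-400 checkpoint (swin_small_patch244_window877_kinetics400_1k.pth) is commonly loaded along the lines below. The `state_dict` nesting and `backbone.` prefix are typical of Video Swin checkpoints and are assumptions here; they may need adjusting to match swin_transformer.py in this repo.

```python
# Hedged sketch: loading the pre-trained VST checkpoint into a Swin encoder.
# Exact key names depend on how swin_transformer.py defines the model.
import torch

def load_vst_weights(model, ckpt_path="swin_small_patch244_window877_kinetics400_1k.pth"):
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    # Video Swin checkpoints usually nest weights under 'state_dict'
    # and prefix parameter names with 'backbone.'.
    state_dict = checkpoint.get("state_dict", checkpoint)
    state_dict = {k.replace("backbone.", ""): v for k, v in state_dict.items()}
    # strict=False: the recursive blocks and decoder are trained from scratch.
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
    return model
```

Here `model` would be the Swin encoder instantiated from swin_transformer.py.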
You can start training by:
$ python3 train.py
The trained model checkpoints will be saved in a folder named saved_models.
After training is complete, you can use test.py to generate the predicted saliency maps and compute all evaluation metrics by:
$ python3 test.py
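For reference, the metrics reported in the tables below follow their standard definitions. The sketch below is a generic NumPy implementation of CC, SIM, and NSS (not necessarily the exact code in utils.py or test.py), which can help when verifying numbers on your own predictions; AUC-J and s-AUC are fixation-based AUC variants and are omitted here for brevity.

```python
# Generic implementations of saliency metrics (standard definitions;
# the exact code in test.py / utils.py may differ).
import numpy as np

def cc(pred, gt):
    """Linear Correlation Coefficient between predicted and GT saliency maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

def sim(pred, gt):
    """Similarity: histogram intersection of the two maps, each normalized to sum to 1."""
    p = pred / (pred.sum() + 1e-8)
    g = gt / (gt.sum() + 1e-8)
    return float(np.minimum(p, g).sum())

def nss(pred, fixations):
    """Normalized Scanpath Saliency: mean normalized saliency at fixation points."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    return float(p[fixations > 0].mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pred = rng.random((224, 384))
    gt = rng.random((224, 384))
    fix = (rng.random((224, 384)) > 0.999).astype(np.float32)
    print(cc(pred, gt), sim(pred, gt), nss(pred, fix))
```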
Table 1. Quantitative comparison on the DHF1K dataset. The best result in each column is marked in bold.
| Method | AUC_J↑ | SIM↑ | s-AUC↑ | CC↑ | NSS↑ |
|---|---|---|---|---|---|
| DeepVS | 0.856 | 0.256 | 0.583 | 0.344 | 1.911 |
| ACLNet | 0.890 | 0.315 | 0.601 | 0.434 | 2.354 |
| SalEMA | 0.890 | **0.466** | 0.667 | 0.449 | 2.574 |
| STRA-Net | 0.895 | 0.355 | 0.663 | 0.458 | 2.558 |
| TASED-Net | 0.895 | 0.361 | 0.712 | 0.470 | 2.667 |
| Chen et al. | 0.900 | 0.353 | 0.680 | 0.476 | 2.685 |
| SalSAC | 0.896 | 0.357 | 0.697 | 0.479 | 2.673 |
| UNISAL | 0.901 | 0.390 | 0.691 | 0.490 | 2.776 |
| HD2S | 0.908 | 0.406 | 0.700 | 0.503 | 2.812 |
| ViNet | 0.908 | 0.381 | **0.729** | 0.511 | 2.872 |
| ECANet | 0.903 | 0.385 | 0.717 | 0.500 | 2.814 |
| TSFP-Net | 0.912 | 0.392 | 0.723 | 0.517 | 2.967 |
| STSANet | **0.913** | 0.383 | 0.723 | 0.529 | 3.010 |
| GFNet | **0.913** | 0.379 | 0.723 | 0.529 | 2.995 |
| Ours | **0.913** | 0.414 | 0.728 | **0.547** | **3.135** |
Table 2. Quantitative comparison on the Hollywood-2 dataset. The best result in each column is marked in bold.
| Method | AUC_J↑ | SIM↑ | CC↑ | NSS↑ |
|---|---|---|---|---|
| DeepVS | 0.887 | 0.356 | 0.446 | 2.313 |
| ACLNet | 0.890 | 0.542 | 0.623 | 3.086 |
| SalEMA | 0.919 | 0.487 | 0.613 | 3.186 |
| STRA-Net | 0.923 | 0.487 | 0.662 | 3.478 |
| TASED-Net | 0.918 | 0.507 | 0.646 | 3.302 |
| Chen et al. | 0.928 | 0.537 | 0.661 | 3.804 |
| SalSAC | 0.931 | 0.529 | 0.670 | 3.356 |
| UNISAL | 0.934 | 0.543 | 0.673 | 3.901 |
| HD2S | 0.936 | 0.551 | 0.670 | 3.352 |
| ViNet | 0.930 | 0.550 | 0.693 | 3.730 |
| ECANet | 0.929 | 0.526 | 0.673 | 3.380 |
| TSFP-Net | 0.936 | 0.571 | 0.711 | 3.910 |
| STSANet | **0.938** | 0.579 | 0.721 | 3.927 |
| GFNet | **0.938** | 0.585 | 0.719 | 3.952 |
| Ours | **0.938** | **0.606** | **0.737** | **4.061** |
Table 3. Quantitative comparison on the UCF Sports dataset. The best result in each column is marked in bold.
| Method | AUC_J↑ | SIM↑ | CC↑ | NSS↑ |
|---|---|---|---|---|
| DeepVS | 0.870 | 0.321 | 0.405 | 2.089 |
| ACLNet | 0.897 | 0.406 | 0.510 | 2.567 |
| SalEMA | 0.906 | 0.431 | 0.544 | 2.638 |
| STRA-Net | 0.910 | 0.479 | 0.593 | 3.018 |
| TASED-Net | 0.899 | 0.469 | 0.582 | 2.920 |
| Chen et al. | 0.917 | 0.494 | 0.599 | 3.406 |
| SalSAC | 0.926 | 0.534 | 0.671 | 3.523 |
| UNISAL | 0.918 | 0.523 | 0.644 | 3.381 |
| HD2S | 0.904 | 0.507 | 0.604 | 3.114 |
| ViNet | 0.924 | 0.522 | 0.673 | 3.620 |
| ECANet | 0.917 | 0.498 | 0.636 | 3.189 |
| TSFP-Net | 0.923 | **0.561** | 0.685 | 3.698 |
| STSANet | **0.936** | 0.560 | **0.705** | **3.908** |
| GFNet | 0.933 | 0.544 | 0.694 | 3.723 |
| Ours | 0.933 | 0.557 | 0.698 | 3.769 |
Please cite our paper if you use our code in your own work:
@article{woo2025recsal,
  title={RecSal-Net: Recursive Saliency Network for Video Saliency Prediction},
  author={Woo, ChaeEun and Lee, SuMin and Park, Soo Min and Kim, Byung Hyung},
  journal={Neurocomputing},
  volume={650},
  pages={130822},
  year={2025}
}