RecSal-Net

This repository provides the official implementation of RecSal-Net, introduced in our paper:

ChaeEun Woo, SuMin Lee, Soo Min Park, and Byung Hyung Kim, “RecSal-Net: Recursive Saliency Network for Video Saliency Prediction,” Neurocomputing, 2025. [pdf] [link]

RecSal-Net is a recursive transformer architecture for Video Saliency Prediction (VSP) that combines a transformer-based encoder with a recursive feature integration mechanism.

Network structure of RecSal-Net

Fig. 1. The overall architecture of RecSal-Net. (a) The RecSal-Net model, including a transformer-based encoder, recursive blocks, and a decoder. (b) The recursive block, which iteratively refines multi-scale spatiotemporal features.
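
As a rough illustration of the recursive refinement idea, the sketch below applies one shared refinement module to encoder features several times with a residual connection. The module name, channel count, and number of recursions are assumptions made for this example and do not reflect the actual implementation in model.py.

import torch
import torch.nn as nn

class RecursiveBlock(nn.Module):
    # Illustrative sketch only: a shared 3D refinement module applied
    # repeatedly to spatiotemporal features (names and shapes are assumptions).
    def __init__(self, channels, num_recursions=3):
        super().__init__()
        self.num_recursions = num_recursions
        self.refine = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # Iteratively refine the same feature map, reusing the weights
        # at every recursion step and adding a residual connection.
        for _ in range(self.num_recursions):
            x = x + self.refine(x)
        return x

# Example: refine encoder features of shape (batch, C, T, H, W).
feats = torch.randn(1, 192, 8, 28, 28)
refined = RecursiveBlock(channels=192)(feats)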

Prepare the Python virtual environment

Please create an Anaconda virtual environment by:

$ conda create -n RS python=3.8 -y

Activate the virtual environment by:

$ conda activate RS

Install the requirements by:

$ pip3 install -r requirements.txt
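
To verify the environment before running anything, you can check the PyTorch installation and GPU visibility (assuming PyTorch is among the packages in requirements.txt) by:

$ python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"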

Run the code

Please download the pre-trained VST weights here and the DHF1K dataset here, then organize the project directory as follows:

Project/
│
├── saved_models/
│   └── RecSalNet.pth
│
├── data/
│   └── DHF1K/
│       ├── train/
│       └── val/
│
├── dataloader.py
├── loss.py
├── model.py
├── swin_transformer.py
├── test.py
├── train.py
├── utils.py
├── requirements.txt
└── swin_small_patch244_window877_kinetics400_1k.pth
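
Before training, you can confirm that the required data and checkpoint from the layout above are in place with a short check; the paths below simply mirror the tree in this README:

import os

# Check that the dataset folders and the pre-trained Swin checkpoint exist.
for path in [
    "data/DHF1K/train",
    "data/DHF1K/val",
    "swin_small_patch244_window877_kinetics400_1k.pth",
]:
    status = "ok" if os.path.exists(path) else "MISSING"
    print(f"{status:8s} {path}")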

You can start training by:

$ python3 train.py

The trained model checkpoints will be saved in the saved_models folder.
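
If you want to load a saved checkpoint yourself (outside of test.py), a minimal sketch looks like the following. The RecSalNet class name and the checkpoint format are assumptions; check model.py and train.py for the actual names and saving convention.

import torch
from model import RecSalNet  # class name is an assumption; see model.py

# Minimal sketch: restore a trained checkpoint for inference.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = RecSalNet().to(device)
state = torch.load("saved_models/RecSalNet.pth", map_location=device)
net.load_state_dict(state)
net.eval()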

After training is complete, you can use test.py to generate the predicted saliency maps and compute all evaluation metrics by:

$ python3 test.py
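
For reference, the metrics commonly reported for video saliency prediction (CC, SIM, NSS) can be computed as in the illustrative NumPy sketch below; this is a standalone example, not the repository's utils.py implementation:

import numpy as np

def cc(pred, gt):
    # Pearson correlation between predicted and ground-truth saliency maps.
    pred = (pred - pred.mean()) / (pred.std() + 1e-8)
    gt = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((pred * gt).mean())

def sim(pred, gt):
    # Similarity: sum of element-wise minima after normalizing each map to sum to 1.
    pred = pred / (pred.sum() + 1e-8)
    gt = gt / (gt.sum() + 1e-8)
    return float(np.minimum(pred, gt).sum())

def nss(pred, fixations):
    # Normalized scanpath saliency: mean standardized prediction at fixated pixels.
    pred = (pred - pred.mean()) / (pred.std() + 1e-8)
    return float(pred[fixations > 0].mean())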

Cite

Please cite our paper if you use our code in your own work:

@article{woo2025recsal,
  title={RecSal-Net: Recursive Saliency Network for video saliency prediction},
  author={Woo, ChaeEun and Lee, SuMin and Park, Soo Min and Kim, Byung Hyung},
  journal={Neurocomputing},
  volume={650},
  pages={130822},
  year={2025}
}
