UAM-FV-VS: Unified Attention Modeling for Efficient Free-Viewing and Visual Search via Shared Representations

This repository provides the official implementation of Unified Attention Modeling for Efficient Free-Viewing and Visual Search via Shared Representations (ICDL 2025).

This work extends the HAT model by introducing a unified attention modeling framework with shared representations for free-viewing and target-present visual search tasks.

Installation

Follow the HAT installation guide; an optional environment check is sketched after the steps below:

  1. Create a Conda Environment

    conda create -n uam python=3.10 -y
    conda activate uam
    
  2. Install PyTorch with CUDA 11.8:

    python -m pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
  3. Install Additional Dependencies:

    python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
    python -m pip install wget timm pytz
  4. Build MSDeformableAttention:

    cd ./hat/pixel_decoder/ops
    sh make.sh
  5. Download Pretrained Weights & HAT Checkpoints (used to train the TP branch with some layers pretrained on FV):

    cd -
    python download.py
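
After completing the steps above, the snippet below is an optional sanity check that the environment imports cleanly. It is only a sketch: the name of the compiled deformable-attention extension (MultiScaleDeformableAttention) is assumed from the Mask2Former-style ops build in step 4 and may differ in your checkout.

    # Optional sanity check after installation.
    import torch
    print("PyTorch:", torch.__version__)            # expected: 2.0.1
    print("CUDA available:", torch.cuda.is_available())

    import detectron2
    print("Detectron2:", detectron2.__version__)

    try:
        # Extension name assumed from the Mask2Former-style ops build in step 4;
        # it may differ in your checkout.
        import MultiScaleDeformableAttention  # noqa: F401
        print("MSDeformableAttention op: OK")
    except ImportError as exc:
        print("MSDeformableAttention op missing; re-run make.sh:", exc)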

Dataset

  • For COCO-Search18 and COCO-FreeView:

    • Download the datasets and place them in the /Datasets/ directory.

    • Download the Dataloaders folder, which contains the .pkl files for training the target-present (TP) and free-viewing (FV) branches, and place it in the main project directory.

  • For our additionally collected data used to test the model:

    • Download the dataset and place it in the /Datasets/ directory.
  • The final folder structure should look like this:

    /YourProject/
    ├── Dataloaders/
    │   ├── data_FV_loader/
    │   └── data_TP_loader/
    └── Datasets/
        ├── COCO-Search18 and COCO-Freeview/
        │   ├── images/
        │   ├── images_with_fixs/
        │   ├── semantic_seq_full/
        │   ├── bbox_annos.npy
        │   ├── clusters.npy
        │   ├── coco_freeview_fixations_512x320.json
        │   ├── coco_search_fixations_512x320_on_target_allvalid.json
        │   ├── M2F_R50_MSDeformAttnPixelDecoder.pkl
        │   ├── M2F_R50.pkl
        │   ├── resnet50.yaml
        │   └── scene_label_dict.npy
        └── extra_dataset/
            ├── annotations/
            ├── images/
            └── README.md
    
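
As an optional convenience, the sketch below checks that the layout above is in place before training. The project root and the subset of paths it probes are assumptions based on the tree shown here; adjust them to your setup.

    # Optional check that the expected layout is in place (paths taken from the tree above).
    from pathlib import Path

    project_root = Path(".")  # assumed to be /YourProject/
    expected = [
        "Dataloaders/data_FV_loader",
        "Dataloaders/data_TP_loader",
        "Datasets/COCO-Search18 and COCO-Freeview/images",
        "Datasets/COCO-Search18 and COCO-Freeview/coco_freeview_fixations_512x320.json",
        "Datasets/COCO-Search18 and COCO-Freeview/coco_search_fixations_512x320_on_target_allvalid.json",
        "Datasets/extra_dataset/images",
    ]
    for rel in expected:
        path = project_root / rel
        print(("OK      " if path.exists() else "MISSING ") + str(path))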

Checkpoints

  • Download the pretrained weights for the different shared configurations of the unified model, and place them in the /checkpoints/ directory.
  • The final folder structure should look like this:
    /YourProject/
    └── checkpoints/
        ├── Final_checkpoints/
        │   ├── ES_1_5.pt
        │   ├── ES_2_4.pt
        │   ├── ES_3_3.pt
        │   ├── ES_4_2.pt
        │   ├── ES_5_1.pt
        │   └── LS.pt
        └── HAT_checkpoints/
            ├── HAT_FV.pt
            └── HAT_TP.pt
    
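
To confirm a downloaded checkpoint loads correctly, a minimal sketch such as the one below can be used. The internal structure of the .pt files is not documented here, so the key handling is an assumption.

    # Optional: confirm a downloaded checkpoint loads; the internal layout of the
    # .pt files is an assumption, so only the top-level keys are inspected.
    import torch

    ckpt = torch.load("./checkpoints/Final_checkpoints/ES_3_3.pt", map_location="cpu")
    if isinstance(ckpt, dict):
        print("Top-level keys:", list(ckpt.keys())[:10])
    else:
        print("Loaded object of type:", type(ckpt))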

New Configuration Variables

The following configuration variables are added to the config files in ./configs/ to support the different shared-representation training setups (a sketch for setting them programmatically follows the list below):

branch (String)
    Determines which branch to train. Options:
    - TP: target-present
    - FV: free-viewing

use_HAT_FV_weights (Boolean)
    Used to train the TP branch utilizing some layers pretrained on FV.
    - Set to true to initialize the shared layers from the HAT FV pretrained weights (set checkpoint to ./checkpoints/HAT_checkpoints/HAT_FV.pt).
    - Set to false to resume training from a saved checkpoint, and set checkpoint to the saved checkpoint path.

shared_config (String)
    Controls the shared-layer configuration. Options:
    - None: no shared representation; the whole pixel decoder is trained.
    - LS: the whole pixel decoder is fixed (shared).
    - ES_5_1: only the last layer is task-specific.
    - ES_4_2: the last two layers are task-specific.
    - ES_3_3: the last three layers are task-specific.
    - ES_2_4: the last four layers are task-specific.
    - ES_1_5: the last five layers are task-specific.
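
The sketch below illustrates how these variables could be set programmatically before launching a run. It assumes the keys sit at the top level of the JSON config, which may not match the actual nesting, and the output file name is just an example.

    # Hedged sketch: set the new variables in a copy of a config file.
    # Assumption: the keys sit at the top level of the JSON; adjust if the
    # real configs nest them differently.
    import json

    with open("./configs/coco_freeview_dense_SSL_train.json") as f:
        cfg = json.load(f)

    cfg["branch"] = "TP"               # "TP" (target-present) or "FV" (free-viewing)
    cfg["use_HAT_FV_weights"] = True   # initialize shared layers from HAT_FV.pt
    cfg["shared_config"] = "ES_3_3"    # "LS", "ES_1_5" ... "ES_5_1", or the no-sharing option

    with open("./configs/my_shared_config_train.json", "w") as f:  # hypothetical output name
        json.dump(cfg, f, indent=2)

The resulting file can then be passed to train.py via --hparams.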

Training and testing the unified attention model

Run the demo code on your test image.

Training the unified attention model on COCO-Search18

  • Run this command:
    python train.py --hparams ./configs/coco_freeview_dense_SSL_train.json --dataset-root <COCO_dataset_root>

Evaluating the unified attention model on COCO-Search18

  • Run this command:
    python train.py --hparams ./configs/coco_freeview_dense_SSL_eval.json  --dataset-root <COCO_dataset_root> --eval-only

Evaluating the unified attention model on the additionally collected dataset

  • Run this command:
    python train.py --hparams ./configs/extradata_config_eval.json --dataset-root <extra_dataset_root> --eval-only
    
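
To run both evaluations back to back, a small wrapper like the following can be used. The dataset roots are placeholders that must be replaced with your actual paths; the config file names are the ones listed above.

    # Hedged sketch: run both evaluations back to back.
    # Replace the dataset roots with your actual paths.
    import subprocess

    runs = [
        ("./configs/coco_freeview_dense_SSL_eval.json", "<COCO_dataset_root>"),
        ("./configs/extradata_config_eval.json", "<extra_dataset_root>"),
    ]
    for hparams, dataset_root in runs:
        subprocess.run(
            ["python", "train.py",
             "--hparams", hparams,
             "--dataset-root", dataset_root,
             "--eval-only"],
            check=True,
        )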

Training on Custom Data

  • For training on your own dataset, please follow the detailed instructions provided in the HAT repo.

Citation

If you use this repository in your work, please cite the following paper:

@article{mohammed2025unified,
  title={Unified Attention Modeling for Efficient Free-Viewing and Visual Search via Shared Representations},
  author={Mohammed, Fatma Youssef and Alexis, Kostas},
  journal={arXiv preprint arXiv:2506.02764},
  year={2025}
}

Contact

For questions or support, please open an issue on GitHub or contact the authors directly:
