UAM-FV-VS: Unified Attention Modeling for Efficient Free-Viewing and Visual Search via Shared Representations
This repository provides the official implementation for Unified Attention Modeling for Efficient Free-Viewing and Visual Search via Shared Representations, ICDL 2025.
This work extends the HAT model by introducing a unified attention modeling framework with shared representations for free-viewing and target-present visual search tasks.
Follow the HAT installation guide:
- Create a Conda Environment:

  ```bash
  conda create -n uam python=3.10 -y
  conda activate uam
  ```

- Install PyTorch with CUDA 11.8:

  ```bash
  python -m pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
  ```

- Install Additional Dependencies:

  ```bash
  python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
  python -m pip install wget timm pytz
  ```

- Build MSDeformableAttention:

  ```bash
  cd ./hat/pixel_decoder/ops
  sh make.sh
  ```

- Download Pretrained Weights & HAT Checkpoints (used to train the TP branch utilizing some layers pretrained on FV):

  ```bash
  cd -
  python download.py
  ```
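As an optional sanity check (an illustrative snippet, not part of the repository), you can confirm the expected PyTorch/CUDA build and that detectron2 imports cleanly:

```python
# Illustrative sanity check, not part of the repo.
import torch
import detectron2  # noqa: F401 -- import check only

print(torch.__version__)          # expected: 2.0.1+cu118
print(torch.cuda.is_available())  # True on a machine with a CUDA-capable GPU and driver
```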
- For COCO-Search18 and COCO-FreeView:
  - Download the datasets and place them in the `/Datasets/` directory.
  - Download the Dataloaders folder containing the `.pkl` files for training the target-present (TP) branch or the free-viewing (FV) branch, and place it in the main project directory.
- For our additional collected data to test the model:
  - Download the dataset and place it in the `/Datasets/` directory.
- The final folder structure should look like this:
  ```
  /YourProject/
  ├── Dataloaders/
  │   ├── data_FV_loader/
  │   └── data_TP_loader/
  └── Datasets/
      ├── COCO-Search18 and COCO-Freeview/
      │   ├── images/
      │   ├── images_with_fixs/
      │   ├── semantic_seq_full/
      │   ├── bbox_annos.npy
      │   ├── clusters.npy
      │   ├── coco_freeview_fixations_512x320.json
      │   ├── coco_search_fixations_512x320_on_target_allvalid.json
      │   ├── M2F_R50_MSDeformAttnPixelDecoder.pkl
      │   ├── M2F_R50.pkl
      │   ├── resnet50.yaml
      │   └── scene_label_dict.npy
      └── extra_dataset/
          ├── annotations/
          ├── images/
          └── README.md
  ```
- Download the pretrained weights for the different shared configurations of the unified model, and place them in the `/checkpoints/` directory.
- The final folder structure should look like this:
  ```
  /YourProject/
  └── checkpoints/
      ├── Final_checkpoints/
      │   ├── ES_1_5.pt
      │   ├── ES_2_4.pt
      │   ├── ES_3_3.pt
      │   ├── ES_4_2.pt
      │   ├── ES_5_1.pt
      │   └── LS.pt
      └── HAT_checkpoints/
          ├── HAT_FV.pt
          └── HAT_TP.pt
  ```
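Optionally, the sketch below (illustrative, not part of the repository) checks that a few of the expected paths from the folder structures above are in place and peeks inside one HAT checkpoint. It assumes the project root as the working directory and that the `.pt` files are standard PyTorch checkpoints (a `state_dict` or a dict wrapping one):

```python
from pathlib import Path
import torch

# Illustrative check: confirm a few of the expected paths from the trees above.
expected = [
    "Dataloaders/data_FV_loader",
    "Dataloaders/data_TP_loader",
    "Datasets/COCO-Search18 and COCO-Freeview/images",
    "Datasets/COCO-Search18 and COCO-Freeview/coco_freeview_fixations_512x320.json",
    "checkpoints/Final_checkpoints/LS.pt",
    "checkpoints/HAT_checkpoints/HAT_FV.pt",
]
missing = [p for p in expected if not Path(p).exists()]
print("missing:", missing or "none")

# Peek inside a checkpoint on CPU, assuming a state_dict or a dict wrapping one.
ckpt = torch.load("checkpoints/HAT_checkpoints/HAT_FV.pt", map_location="cpu")
state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
if isinstance(state, dict):
    print(list(state.keys())[:5])  # first few parameter names
```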
The following configuration variables are added to the config files in `./configs/` to support the different shared-representation training setups:

| Variable Name | Type | Description |
|---|---|---|
| `branch` | String | Determines which branch to train. Options: `TP` (target-present) or `FV` (free-viewing). |
| `use_HAT_FV_weights` | Boolean | Used to train the TP branch utilizing some layers pretrained on FV. Set to `true` to initialize the shared layers from the HAT FV pretrained weights (and set `checkpoint` to `./checkpoints/HAT_checkpoints/HAT_FV.pt`). Set to `false` when resuming training from a saved checkpoint, and set `checkpoint` to the saved checkpoint path. |
| `shared_config` | String | Controls the shared-layer configuration. Options: `None` (no shared representation; train the whole pixel decoder), `LS` (the whole pixel decoder is fixed), `ES_5_1` (only the last layer is task-specific), `ES_4_2` (last two layers task-specific), `ES_3_3` (last three layers task-specific), `ES_2_4` (last four layers task-specific), `ES_1_5` (last five layers task-specific). |
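For example, the sketch below shows how these variables could be set to train the TP branch with the `ES_3_3` sharing scheme. It is illustrative only: the filenames are hypothetical, and the variables are assumed to sit at the top level of the JSON config, so check the actual files in `./configs/` for the exact structure.

```python
import json

# Illustrative sketch; filenames are hypothetical and the flat-key layout is assumed.
with open("./configs/my_base_config.json") as f:       # hypothetical base config
    hparams = json.load(f)

hparams["branch"] = "TP"               # train the target-present branch
hparams["use_HAT_FV_weights"] = True   # initialize shared layers from HAT FV weights
hparams["checkpoint"] = "./checkpoints/HAT_checkpoints/HAT_FV.pt"
hparams["shared_config"] = "ES_3_3"    # last three pixel-decoder layers task-specific

with open("./configs/my_TP_ES_3_3.json", "w") as f:    # hypothetical output config
    json.dump(hparams, f, indent=2)
```

The edited config can then be passed to `train.py` through `--hparams`, as in the commands below.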
Run the demo code on your test image.

- To train, run this command:

  ```bash
  python train.py --hparams ./configs/coco_freeview_dense_SSL_train.json --dataset-root <COCO_dataset_root>
  ```

- To evaluate, run this command:

  ```bash
  python train.py --hparams ./configs/coco_freeview_dense_SSL_eval.json --dataset-root <COCO_dataset_root> --eval-only
  ```

- To evaluate on the additional collected data, run this command:

  ```bash
  python train.py --hparams ./configs/extradata_config_eval.json --dataset-root <extra_dataset_root> --eval-only
  ```
- For training on your own dataset, please follow the detailed instructions provided in the HAT repo.
If you use this repository in your work, please cite the following paper:
```bibtex
@article{mohammed2025unified,
  title={Unified Attention Modeling for Efficient Free-Viewing and Visual Search via Shared Representations},
  author={Mohammed, Fatma Youssef and Alexis, Kostas},
  journal={arXiv preprint arXiv:2506.02764},
  year={2025}
}
```
For questions or support, please open an issue on GitHub or contact the authors directly: