The PyTorch implementation of "Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model".
cd SamGOP
conda create --name samgop python=3.8 -y
conda activate samgop
conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia
pip install -U opencv-python
# under your working directory
cd detectron2
pip install -e .
cd ..
pip install -r requirements.txt
cd maskGOP/modeling/pixel_decoder/ops
sh make.sh
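After the steps above, a quick import check can confirm that the environment is usable before training. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, and the module list is an assumption based on the packages installed above.

```python
import importlib.util


def check_environment(required=("torch", "torchvision", "cv2", "detectron2")):
    """Return a dict mapping each module name to True if it can be imported."""
    # find_spec returns None for a top-level module that is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in required}


if __name__ == "__main__":
    status = check_environment()
    for name, ok in status.items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
    if status.get("torch"):
        import torch

        # Report the installed version and whether a CUDA device is visible.
        print("torch version:", torch.__version__)
        print("CUDA available:", torch.cuda.is_available())
```

If any module is reported as missing, rerun the corresponding install step before building the ops in `maskGOP/modeling/pixel_decoder/ops`.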
We train our model on the GOO-Synth and GOO-Real datasets separately.
You can download GOO-Synth from OneDrive:
Train: part1, part2, part3, part4, part5, part6, part7, part8, part9, part10, part11
Test: GOOsynth-test_data
Annotation file:
GOOsynth-train_data_Annotation (Code:v4nx)
GOOsynth-test_data_Annotation (Code:ayqm)
You can download GOO-Real from OneDrive:
Train: GOOreal-train_data
Test: GOOreal-test_data
You can download the GOO-Real annotation files from Baidu disk:
GOOreal-train_data_Annotation (code:2p89)
GOOreal-val_data_Annotation (code:p9f9)
If you want to train on GOO-Real or GOO-Synth dataset, please keep the data structure as follows:
├── datasets
    └── coco
        ├── annotations
        │   ├── cate.txt
        │   ├── train2017.json
        │   └── val2017.json
        ├── train2017
        │   ├── 0.png
        │   ├── 1.png
        │   └── ...
        └── val2017
            ├── 3609.png
            ├── 3610.png
            └── ...
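The layout listed above can be checked before launching a run. This is a minimal sketch under stated assumptions: the helper `missing_entries` is hypothetical, and it assumes `cate.txt`, `train2017.json`, and `val2017.json` all sit inside `annotations`, with `datasets/coco` as the dataset root.

```python
from pathlib import Path

# Paths relative to the dataset root (assumed to be datasets/coco).
EXPECTED = [
    "annotations/cate.txt",
    "annotations/train2017.json",
    "annotations/val2017.json",
    "train2017",
    "val2017",
]


def missing_entries(root):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries("datasets/coco")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

The check only tests for the presence of files and folders; it does not validate the contents of the annotation files.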
To train the model, run the following command:
python train_net.py --num-gpus 1 --config-file ./configs/coco/instance-segmentation/maskGOP_R50_bs2_75ep_3s.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0001
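The trailing `SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0001` tokens in the command above are config overrides given as alternating keys and values. Assuming `train_net.py` follows the usual detectron2 pattern of collecting these tokens and merging them into the config, the pairing can be illustrated with a small pure-Python sketch. The function `pair_overrides` is hypothetical and is not the repository's code.

```python
def pair_overrides(opts):
    """Group a flat [KEY, VALUE, KEY, VALUE, ...] list into a dict.

    Illustrative only: values are kept as strings here, whereas a real
    config system would also convert them to the expected types.
    """
    if len(opts) % 2 != 0:
        raise ValueError("overrides must be given as KEY VALUE pairs")
    # Even positions are keys, odd positions are the matching values.
    return dict(zip(opts[0::2], opts[1::2]))


if __name__ == "__main__":
    overrides = pair_overrides(
        ["SOLVER.IMS_PER_BATCH", "2", "SOLVER.BASE_LR", "0.0001"]
    )
    for key, value in overrides.items():
        print(f"{key} = {value}")
```

Because the overrides are read in pairs, a key without a value (or the reverse) makes the list odd-length, which this sketch rejects with an error.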
To evaluate the model, run the following command, replacing weights_path with the path to your model weights:
python eavl_train_net.py --eval-only --num-gpus 1 --config-file ./configs/coco/instance-segmentation/maskGOP_R50_bs2_75ep_3s.yaml MODEL.WEIGHTS weights_path
Download model weights from Baidu disk:
GOO-Synth_re-trained_model (code:2ma2)
GOO-Real_re-trained_model (code:24zt)
Our implementation is based on Detectron2 and MaskDINO.
@inproceedings{jin2024boosting,
title={Boosting Gaze Object Prediction via Pixel-Level Supervision from Vision Foundation Model},
author={Jin, Yang and Zhang, Lei and Yan, Shi and Fan, Bin and Wang, Binglu},
booktitle={European Conference on Computer Vision},
pages={369--386},
year={2024},
organization={Springer}
}