Welcome to the official repository of our paper "Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation" accepted at the CVPR 2025 PixFoundation Workshop.
Our benchmark evaluates visual and textual prompts in semantic segmentation across 7 diverse domains and 14 datasets:
Domain | Datasets |
---|---|
🏙️ Common | ADE20K, PASCAL VOC 2012 |
🚗 Urban | Cityscapes, UAVid |
♻️ Waste | Trash, ZeroWaste |
🍕 Food | Pizza, UECFood |
🔧 Tools | Toolkits, PIDray |
🏠 Parts | House-Parts, MHPv1 |
🌳 Land-Cover | LoveDA-Rural, LoveDA-Urban |
We provide Docker containers for both PyTorch and MMSegmentation models.
📦 PyTorch Environment
Our container is based on PyTorch 2.5.1 with CUDA 11.8 and Python 3.11.
Option 1: Pull from DockerHub
docker pull gabrysse/showortell:torch
Option 2: Build locally
cd docker/pytorch && docker build -t gabrysse/showortell:torch .
Running the container
Via command line:
docker run --name=showortell-torch --gpus all -it \
-v ./ShowOrTell:/workspace/ShowOrTell \
--shm-size=8G --ulimit memlock=-1 \
gabrysse/showortell:torch
Or using docker compose:
cd docker/pytorch
docker compose up -d
docker attach showortell-torch
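Once inside the container, you can optionally verify that the GPUs are visible to PyTorch (a quick sanity check, not part of the benchmark itself):
python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
This should print `True` followed by the number of GPUs if the `--gpus all` flag was picked up correctly.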
📦 MMSegmentation Environment
For MMSegmentation-based models, you'll need to set up the appropriate environment according to the model's requirements. Please refer to the installation instructions in each model's documentation.
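As a rough sketch only (exact versions and steps vary per model, so defer to each model's documentation), a typical MMSegmentation setup looks like:
# install OpenMIM, then the MMSegmentation core packages
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
pip install "mmsegmentation>=1.0.0"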
Important
The UAVid dataset requires a manual download. Follow the instructions printed by the downloader script when prompted.
Our downloader script fetches all benchmark datasets and applies the necessary preprocessing:
cd datasets && bash downloader.sh
To download only specific datasets:
cd datasets && bash downloader.sh --<DATASET_NAME>
Available datasets: `pascalvoc`, `ade20k`, `cityscapes`, `houseparts`, `pizza`, `toolkits`, `trash`, `loveda`, `zerowaste`, `mhpv1`, `pidray`, `uecfood`, `uavid`.
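For example, to fetch only the Cityscapes dataset:
cd datasets && bash downloader.sh --cityscapes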
For more options, run:
bash downloader.sh --help
See our Getting Started Guide for detailed instructions on implementing your model.
After implementing your model, run the evaluation with:
python3 benchmark.py \
--model-name GFSAM --nprompts 5 \
--benchmark pizza
- Available models: `GFSAM`, `Matcher`, `PersonalizeSAM`, `SINE`.
- Available datasets: `pascal`, `cityscapes`, `ade20k`, `lovedarural`, `lovedaurban`, `mhpv1`, `pidray`, `houseparts`, `pizza`, `toolkits`, `trash`, `uecfood`, `zerowaste`, `uavid`.
To run the evaluation on multiple GPUs, use torchrun:
torchrun --nproc_per_node=2 benchmark.py \
--model-name GFSAM --nprompts 5 \
--benchmark pizza
Set `--nproc_per_node` to the desired number of GPUs.
Additional options:
- `--datapath <DATASETS_PATH>`: Specify a custom datasets folder (default: `./datasets`).
- `--checkpointspath <CHECKPOINTS_PATH>`: Specify a custom folder for model checkpoints (default: `./models/checkpoints`).
- `--seed <SEED>`: Set a specific random seed.
- `--save-visualization`: Save visualizations of the predictions for the first 50 images; they are written to the `predictions` folder.
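For instance, a single-GPU run that combines these options might look like the following (the dataset path below is a placeholder; adjust it to your setup):
python3 benchmark.py \
--model-name Matcher --nprompts 5 \
--benchmark uecfood \
--datapath /data/ShowOrTell/datasets \
--seed 42 --save-visualization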
If you find this project helpful for your research, please consider citing our work using the following BibTeX entry.
@article{rosi2025show,
title={Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation},
author={Rosi, Gabriele and Cermelli, Fabio},
journal={arXiv preprint arXiv:2505.06280},
year={2025}
}