# [ACMMM UAVM 2025] πŸŒπŸš— VICI: VLM-Instructed Cross-view Image-localisation πŸ“‘πŸ—ΊοΈ


## πŸ““ Description

VICI is a two-stage pipeline for cross-view image localisation: Stage 1 retrieves candidate reference images for a query view with a learned feature extractor, and Stage 2 re-ranks the top retrieved candidates with a Vision-Language Model (VLM).

## 🧬 Feature Extractors

| Backbone   | Params (M) | FLOPs (G) | Dims | R@1   | R@5   | R@10  |
|------------|-----------:|----------:|-----:|------:|------:|------:|
| ConvNeXt-T | 28         | 4.5       | 768  | 1.36  | 4.34  | 7.95  |
| ConvNeXt-B | 89         | 15.4      | 1024 | 3.14  | 8.14  | 13.22 |
| ViT-B      | 86         | 17.6      | 768  | 3.30  | 8.92  | 13.96 |
| ViT-L      | 307        | 60.6      | 1024 | 9.62  | 23.42 | 32.73 |
| DINOv2-B   | 86         | 152       | 768  | 17.37 | 36.14 | 46.96 |
| DINOv2-L   | 304        | 507       | 1024 | 27.49 | 51.96 | 63.13 |
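For reference, a minimal sketch of extracting a global descriptor with the strongest backbone above (DINOv2-L) via torch.hub. The image size, normalisation, and L2-normalisation step are assumptions for illustration, not necessarily this repository's exact recipe:

```python
# Minimal sketch: a global descriptor from a DINOv2-L backbone via torch.hub.
import torch
from torchvision import transforms
from PIL import Image

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # side length must be divisible by the 14px patch size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("query.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    desc = model(img)  # (1, 1024) CLS-token descriptor for ViT-L
desc = torch.nn.functional.normalize(desc, dim=-1)  # L2-normalise for cosine retrieval
```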

## 🧰 Vision-Language Models

| VLM                   | R@1   | R@5   | R@10  |
|-----------------------|------:|------:|------:|
| Without re-ranking    | 27.49 | 51.96 | 63.13 |
| Gemini 2.5 Flash Lite | 23.54 | 48.39 | 63.13 |
| Gemini 2.5 Flash      | 30.21 | 53.04 | 63.13 |

R@10 is unchanged across rows, consistent with the VLM only re-ordering the retrieved top-10 candidates rather than introducing new ones.

## πŸ›Έ Drone Augmentation

Here $P$ is the probability of applying drone-image augmentation to a training sample.

| $P$ | R@1   | R@5   | R@10  |
|----:|------:|------:|------:|
| 0   | 24.47 | 48.16 | 60.99 |
| 0.1 | 26.98 | 51.34 | 61.92 |
| 0.3 | 27.49 | 51.96 | 63.13 |
| 0.5 | 24.89 | 52.03 | 62.66 |
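As a rough illustration of the augmentation idea, a minimal sketch, assuming paired drone imagery is available per location (function and variable names are hypothetical):

```python
# Hypothetical sketch of probabilistic drone augmentation during training:
# with probability p, substitute the satellite view with a drone view.
import random

def maybe_augment(satellite_img, drone_pool, p=0.3):
    """Return a random drone view with probability p, else the satellite view."""
    if drone_pool and random.random() < p:
        return random.choice(drone_pool)
    return satellite_img
```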

## 🎯 Ablation Study and Baseline Comparison

| Model                             | R@1   | R@5   | R@10  |
|-----------------------------------|------:|------:|------:|
| U1652 (Zheng et al., 2020)        | 1.20  | -     | -     |
| LPN w/o drone (Wang et al., 2021) | 0.74  | -     | -     |
| LPN w/ drone (Wang et al., 2021)  | 0.81  | -     | -     |
| DINOv2-L                          | 24.66 | 48.00 | 59.02 |
| + Drone Data                      | 27.49 | 51.96 | 63.13 |
| + VLM Re-rank (Ours)              | 30.21 | 53.04 | 63.13 |
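All tables above report top-K recall (R@K): the percentage of queries whose ground-truth reference appears among the K highest-scoring candidates. A minimal sketch, assuming cosine similarity over L2-normalised embeddings (all names hypothetical):

```python
import numpy as np

def recall_at_k(query_emb, ref_emb, gt_idx, k):
    """query_emb: (Q, D), ref_emb: (R, D), both L2-normalised.
    gt_idx[i] is the index of query i's true reference image."""
    sims = query_emb @ ref_emb.T             # cosine similarities, shape (Q, R)
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of the k best matches per query
    hits = (topk == np.asarray(gt_idx)[:, None]).any(axis=1)
    return 100.0 * hits.mean()               # percentage, as in the tables above
```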

## πŸ“Š Evaluation

### 🐍 Environment Setup

```bash
conda env create -n ENV -f requirements.yaml && conda activate ENV
```

### 🐍 Stage 1 - Image Retrieval

Before running Stage 1, configure your dataset paths:

  1. Navigate to the /config/ directory.
  2. Open the default.yaml file (or copy it to a new file).
  3. Replace the placeholder values (e.g., DATA_ROOT) with the actual paths to your dataset and related files, as in the sketch below.
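An illustrative sketch of the edited config; DATA_ROOT is the placeholder named above, and any other keys in your default.yaml may differ:

```yaml
# Illustrative Stage 1 config sketch -- only DATA_ROOT is named in this README;
# check default.yaml for the full set of keys.
DATA_ROOT: /path/to/your/dataset
```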

Once your configuration file is ready, you can train Stage 1 using:

```bash
python stage_1.py --config YOUR_CONFIG_FILE_NAME
```

You can also download our pre-trained weights here.

### 🐍 Stage 2 - VLM Re-ranking

To run Stage 2, you need to:

  1. Open the stage_2.py file.
  2. Replace the relevant placeholders (e.g., the path to the answer file from Stage 1 and your Gemini API key), as in the sketch below.
  3. Ensure any other required directories or options are correctly set.
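A hypothetical illustration of the kind of placeholders to fill in; the actual variable names in stage_2.py may differ:

```python
# Hypothetical placeholders near the top of stage_2.py -- the real variable
# names in the script may differ; check the file itself.
GEMINI_API_KEY = "YOUR_GEMINI_API_KEY"             # Google AI Studio API key
STAGE1_ANSWER_FILE = "path/to/stage_1_answer.txt"  # retrieval results from Stage 1
ANSWER_DIR = "path/to/answer_dir"                  # where re-ranked outputs are written
```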

Then, simply run:

```bash
python stage_2.py
```

This performs re-ranking with a Vision-Language Model (VLM) on top of the initial retrieval results. It writes LLM_re_ranked_answer.txt to the answer directory, along with a reasons.json containing the VLM's justification for each re-ranking decision.
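A minimal sketch for inspecting the Stage 2 outputs; the answer-directory path and exact file layout are assumptions based on your Stage 2 configuration:

```python
# Sketch: load the Stage 2 outputs. The directory path is an assumption;
# the line/entry format depends on the script's configuration.
import json
from pathlib import Path

answer_dir = Path("path/to/answer_dir")

reranked = (answer_dir / "LLM_re_ranked_answer.txt").read_text().splitlines()
reasons = json.loads((answer_dir / "reasons.json").read_text())

print(f"{len(reranked)} re-ranked entries, {len(reasons)} re-ranking reasons")
```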

