News | Usage | Citation | Acknowledgement
This is the official repository for the paper *TextInPlace: Indoor Visual Place Recognition in Repetitive Structures with Scene Text Spotting and Verification* (IROS 2025).
Abstract
Visual Place Recognition (VPR) is a crucial capability for long-term autonomous robots, enabling them to identify previously visited locations using visual information. However, existing methods remain limited in indoor settings due to the highly repetitive structures inherent in such environments. We observe that scene text typically appears in indoor spaces, serving to distinguish visually similar but different places. This inspires us to propose TextInPlace, a simple yet effective VPR framework that integrates Scene Text Spotting (STS) to mitigate visual perceptual ambiguity in repetitive indoor environments. Specifically, TextInPlace adopts a dual-branch architecture within a local parameter sharing network. The VPR branch employs attention-based aggregation to extract global descriptors for coarse-grained retrieval, while the STS branch utilizes a bridging text spotter to detect and recognize scene text. Finally, the discriminative text is filtered to compute text similarity and re-rank the top-K retrieved images. To bridge the gap between current text-based repetitive indoor scene datasets and the typical scenarios encountered in robot navigation, we establish an indoor VPR benchmark dataset, called Maze-with-Text. Extensive experiments on both custom and public datasets demonstrate that TextInPlace achieves superior performance over existing methods that rely solely on appearance information.
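For intuition, the coarse-to-fine matching described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: the function names, descriptor shapes, and word-overlap text similarity are assumptions made for exposition, not the actual TextInPlace implementation.

```python
# Hypothetical sketch of coarse retrieval + text-based re-ranking (not the paper's code).
import numpy as np

def text_similarity(words_a, words_b):
    # Placeholder metric: Jaccard overlap of the spotted (discriminative) words.
    a, b = set(words_a), set(words_b)
    return len(a & b) / max(len(a | b), 1)

def retrieve_and_rerank(query_desc, db_descs, query_words, db_words, top_k=10):
    # Coarse stage: rank the database by global-descriptor similarity
    # (dot product of L2-normalized descriptors, i.e. cosine similarity).
    sims = db_descs @ query_desc
    candidates = np.argsort(-sims)[:top_k]
    # Fine stage: re-rank the top-K candidates by scene-text similarity.
    return sorted(candidates,
                  key=lambda i: text_similarity(db_words[i], query_words),
                  reverse=True)
```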
- 2025-06-23: The code for TextInPlace is publicly available in this repository 📦!
- 2025-06-16: TextInPlace is accepted by IROS 2025. 🎉🎉🎉
You can create a conda environment for TextInPlace with the following commands ⚙️:
```bash
conda create -n stloc python=3.10 -y
conda activate stloc
pip install torch==2.2.0+cu121 torchvision==0.17.0+cu121 --index-url https://download.pytorch.org/whl/cu121
cd detectron2
pip install -e . && cd ..
pip install -r requirements.txt
python setup.py build develop
```
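After setup, you can sanity-check that the CUDA-enabled PyTorch build imports correctly (a generic check, not specific to TextInPlace):

```python
# Verify the CUDA-enabled PyTorch/torchvision install from the steps above.
import torch
import torchvision

print(torch.__version__, torchvision.__version__)  # expect 2.2.0+cu121 and 0.17.0+cu121
print(torch.cuda.is_available())                   # should print True on a CUDA-capable machine
```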
To bridge the gap between current text-based repetitive indoor scene datasets and the typical scenarios encountered in robot navigation, we establish an indoor VPR benchmark dataset called Maze-with-Text. The number of images in the Maze-with-Text dataset is as follows:

| Floor | 1 | 2 | 3 | 4 | 5 | All |
|---|---|---|---|---|---|---|
| Queries | 280 | 253 | 258 | 245 | 269 | 1305 |
| Database | 1368 | 2268 | 1588 | 1720 | 1596 | 8540 |
You can download the Maze-with-Text dataset from Google Drive. After downloading, please unzip the archive and organize the dataset into the following directory structure:
```
|-- Maze-with-Text
    |-- images
        |-- test
            |-- database
            |   |-- @-00.0027@038.6324@5@339@0@.jpg
            |   ......
            |-- queries
            |   |-- @-00.0012@032.7272@5@65@3@.jpg
            |   ......
```
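The image filenames pack metadata between `@` separators. The parser below is a sketch inferred from the two example filenames above; the field meanings (planar coordinates in the first two slots, floor number in the third) are assumptions, not documented semantics:

```python
# Hypothetical parser for Maze-with-Text filenames such as
# "@-00.0027@038.6324@5@339@0@.jpg". Field meanings are assumptions.
from pathlib import Path

def parse_maze_filename(path):
    fields = Path(path).stem.split("@")[1:-1]  # drop the empty ends of "@...@"
    return {
        "x": float(fields[0]),    # assumed planar coordinate
        "y": float(fields[1]),    # assumed planar coordinate
        "floor": int(fields[2]),  # matches the floor numbers 1-5
        "extra": fields[3:],      # remaining fields, meaning unknown
    }

print(parse_maze_filename("@-00.0027@038.6324@5@339@0@.jpg"))
# {'x': -0.0027, 'y': 38.6324, 'floor': 5, 'extra': ['339', '0']}
```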
Evaluation script
```bash
python -W ignore eval.py --backbone ResNet50 --aggregation boq \
    --features_dim 16384 \
    --infer_batch_size 64 \
    --config-file configs/Bridge/TotalText/R_50_poly.yaml \
    --dataset_name Maze-with-Text \
    --datasets_folder <Path with all datasets> \
    --resume <Path with the checkpoint>
```

Before running the evaluation script, please follow the steps below to validate the results of our experiments on the Maze-with-Text dataset.
1. **Download Checkpoint**: Get the checkpoint file.
2. **File Placement**: Move the downloaded checkpoint file to the designated path: `./checkpoints/`.
(Optional) If you want to use an LLM for text-based reranking, set your own API key in `utils/test.py` and add the `--use-llm` flag at the end of the command.
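For example, assuming the dataset lives under `./datasets` and the checkpoint is saved as `./checkpoints/textinplace.pth` (both paths are illustrative; substitute your own), a full invocation with LLM-based reranking enabled would look like:

```bash
python -W ignore eval.py --backbone ResNet50 --aggregation boq \
    --features_dim 16384 \
    --infer_batch_size 64 \
    --config-file configs/Bridge/TotalText/R_50_poly.yaml \
    --dataset_name Maze-with-Text \
    --datasets_folder ./datasets \
    --resume ./checkpoints/textinplace.pth \
    --use-llm
```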
Note
Training code will be released soon; please stay tuned.
If you find TextInPlace helpful for your research, please consider citing:
```bibtex
@inproceedings{tao2025textinplace,
  title={TextInPlace: Indoor Visual Place Recognition in Repetitive Structures with Scene Text Spotting and Verification},
  author={Tao, Huaqi and Liu, Bingxi and Chen, Calvin and Huang, Tingjun and Li, He and Cui, Jinqiang and Zhang, Hong},
  booktitle={2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2025},
  organization={IEEE}
}
```
- Thanks to these great repositories: Bag-of-Queries, SuperPlace, NYC-Indoor-VPR, Bridging-Text-Spotting, DPText-DETR, DiG, and many other inspiring works in the community.
- Contact: taohq2024@mail.sustech.edu.cn
