TextInPlace

🌟 If this work is useful to you, please give this repository a Star! 🌟

News | Usage | Citation | Acknowledgement

This is the official repository for the paper:

TextInPlace: Indoor Visual Place Recognition in Repetitive Structures with Scene Text Spotting and Verification

Abstract

Visual Place Recognition (VPR) is a crucial capability for long-term autonomous robots, enabling them to identify previously visited locations using visual information. However, existing methods remain limited in indoor settings due to the highly repetitive structures inherent in such environments. We observe that scene text typically appears in indoor spaces, serving to distinguish visually similar but different places. This inspires us to propose TextInPlace, a simple yet effective VPR framework that integrates Scene Text Spotting (STS) to mitigate visual perceptual ambiguity in repetitive indoor environments. Specifically, TextInPlace adopts a dual-branch architecture within a local parameter sharing network. The VPR branch employs attention-based aggregation to extract global descriptors for coarse-grained retrieval, while the STS branch utilizes a bridging text spotter to detect and recognize scene text. Finally, the discriminative text is filtered to compute text similarity and re-rank the top-K retrieved images. To bridge the gap between current text-based repetitive indoor scene datasets and the typical scenarios encountered in robot navigation, we establish an indoor VPR benchmark dataset, called Maze-with-Text. Extensive experiments on both custom and public datasets demonstrate that TextInPlace achieves superior performance over existing methods that rely solely on appearance information.
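To make the text verification step above concrete, here is a minimal, illustrative sketch of text-based re-ranking. The normalized string similarity and all function names are our assumptions for exposition, not the exact implementation used in the paper:

# Illustrative sketch of the text-based re-ranking step (assumed, simplified).
from difflib import SequenceMatcher

def text_similarity(a: str, b: str) -> float:
    # Normalized similarity in [0, 1]; the paper's exact metric may differ.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def rerank_top_k(query_texts, candidate_texts, top_k_indices):
    # query_texts: list of strings spotted in the query image.
    # candidate_texts: dict mapping database index -> spotted strings.
    # top_k_indices: indices from the coarse global-descriptor retrieval.
    def best_match(idx):
        cands = candidate_texts.get(idx, [])
        if not query_texts or not cands:
            return 0.0  # no discriminative text: keep the retrieval order
        return max(text_similarity(q, c) for q in query_texts for c in cands)
    # A stable sort preserves the coarse ranking for ties and text-less images.
    return sorted(top_k_indices, key=best_match, reverse=True)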

Overview

News

  • 2025-06-23: The code for TextInPlace is now publicly available in this repository 📦!

  • 2025-06-16: TextInPlace is accepted to IROS 2025. 🎉🎉🎉

Usage

Environment Setup

You can create a conda environment for TextInPlace with the following commands ⚙️:

conda create -n stloc python=3.10 -y
conda activate stloc
pip install torch==2.2.0+cu121 torchvision==0.17.0+cu121 --index-url https://download.pytorch.org/whl/cu121
cd detectron2
pip install -e . && cd ..
pip install -r requirements.txt
python setup.py build develop
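
After installation, you can run a quick sanity check like the one below (a minimal sketch, assuming a CUDA-capable GPU and the versions pinned above) to confirm that PyTorch and detectron2 are importable and CUDA is visible:

# Environment sanity check (illustrative).
import torch
import detectron2

print("torch:", torch.__version__)              # expected: 2.2.0+cu121
print("cuda available:", torch.cuda.is_available())
print("detectron2:", detectron2.__version__)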

Maze-with-Text Dataset

To bridge the gap between current text-based repetitive indoor scene datasets and the typical scenarios encountered in robot navigation, we establish an indoor VPR benchmark dataset called Maze-with-Text. The number of images per floor is as follows:

Floor      1     2     3     4     5     All
Queries    280   253   258   245   269   1305
Database   1368  2268  1588  1720  1596  8540

You can download the Maze-with-Text dataset from Google Drive. After downloading, please unzip the archive and organize the dataset into the following directory structure:

|-- Maze-with-Text
    |-- images
        |-- test
            |-- database
            |   |-- @-00.0027@038.6324@5@339@0@.jpg
            |   |-- ...
            |-- queries
            |   |-- @-00.0012@032.7272@5@65@3@.jpg
            |   |-- ...
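
The image filenames encode metadata between @ separators. The sketch below shows one way to split the fields; the interpretation (the first two fields as planar coordinates, then floor, frame index, and a final label) is our reading of the examples above, not an official specification:

# Illustrative filename parser; field meanings are assumed from the examples:
# @x@y@floor@index@label@.jpg
from pathlib import Path

def parse_filename(path: str) -> dict:
    fields = Path(path).stem.split("@")[1:-1]  # drop empty leading/trailing parts
    x, y, floor, index, label = fields
    return {"x": float(x), "y": float(y), "floor": int(floor),
            "index": int(index), "label": int(label)}

print(parse_filename("@-00.0027@038.6324@5@339@0@.jpg"))
# {'x': -0.0027, 'y': 38.6324, 'floor': 5, 'index': 339, 'label': 0}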

Evaluation

Evaluation script

python -W ignore eval.py --backbone ResNet50 --aggregation boq \
  --features_dim 16384 \
  --infer_batch_size 64 \
  --config-file configs/Bridge/TotalText/R_50_poly.yaml \
  --dataset_name Maze-with-Text \
  --datasets_folder <Path with all datasets> \
  --resume <Path with the checkpoint>

Before running the evaluation script, please follow the steps below to reproduce the results of our experiments on the Maze-with-Text dataset.

  • Download the checkpoint: get the checkpoint file.

  • File placement: move the downloaded checkpoint file to the designated path: ./checkpoints/.

(Optional) If you want to use an LLM for text-based re-ranking, please set your own API key in utils/test.py and add the --use-llm flag at the end of the command.
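
For reference, the standard VPR metric reported in such evaluations is Recall@K: the fraction of queries for which at least one of the top-K retrieved database images is a true positive. Below is a minimal sketch (our own helper with assumed variable names, not the repository's eval.py):

# Minimal Recall@K computation (illustrative; names are assumptions).
import numpy as np

def recall_at_k(predictions, positives_per_query, ks=(1, 5, 10)):
    # predictions: (num_queries, max_k) array of retrieved database indices,
    #              sorted by descending similarity.
    # positives_per_query: list of arrays of ground-truth positive indices.
    recalls = {}
    for k in ks:
        hits = sum(
            np.intersect1d(preds[:k], pos).size > 0
            for preds, pos in zip(predictions, positives_per_query)
        )
        recalls[k] = 100.0 * hits / len(predictions)
    return recalls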

Training

Note

The training code will be released soon; please stay tuned.

Citation

If you find TextInPlace helpful for your research, please consider citing:

@inproceedings{tao2025textinplace,
  title={TextInPlace: Indoor Visual Place Recognition in Repetitive Structures with Scene Text Spotting and Verification},
  author={Tao, Huaqi and Liu, Bingxi and Chen, Calvin and Huang, Tingjun and Li, He and Cui, Jinqiang and Zhang, Hong},
  booktitle={2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2025},
  organization={IEEE}
}

Acknowledgement
