Official repo for "CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory"
Aerial vision-and-language navigation (VLN), which requires drones to interpret natural language instructions and navigate complex urban environments, is emerging as a critical embodied AI challenge bridging human-robot interaction, 3D spatial reasoning, and real-world deployment. Although existing ground VLN agents have achieved notable results in indoor and outdoor settings, they struggle in aerial VLN due to the absence of predefined navigation graphs and the exponentially expanding action space in long-horizon exploration. In this work, we propose **CityNavAgent**, a large language model (LLM)-empowered agent that significantly reduces the navigation complexity of urban aerial VLN. Specifically, we design a hierarchical semantic planning module (HSPM) that decomposes the long-horizon task into sub-goals at different semantic levels; the agent reaches the target progressively by achieving these sub-goals with different capabilities of the LLM. Additionally, a global memory module that stores historical trajectories in a topological graph is developed to simplify navigation to previously visited targets. Extensive benchmark experiments show that our method achieves state-of-the-art performance with significant improvements. Further experiments demonstrate the effectiveness of the individual modules of CityNavAgent for aerial VLN in continuous city environments.
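As a loose illustration of the global memory idea, a topological memory can be modeled as a graph of visited waypoints. The sketch below is hypothetical (all class and field names are ours, not the paper's) and assumes networkx is available:

```python
# Hypothetical sketch of a topological global memory; not the paper's implementation.
import networkx as nx

class GlobalMemory:
    def __init__(self):
        # Nodes are visited waypoints; edges are transitions the drone has flown.
        self.graph = nx.Graph()

    def add_waypoint(self, node_id, position, landmarks):
        # Store the 3D pose and the landmarks observed at this waypoint.
        self.graph.add_node(node_id, position=position, landmarks=set(landmarks))

    def add_transition(self, src, dst, cost):
        self.graph.add_edge(src, dst, weight=cost)

    def plan_to_landmark(self, start, landmark):
        # If a landmark was seen before, reuse stored trajectories: pick the
        # shortest path to any node that observed it. Returns None when the
        # landmark is unseen (the agent must fall back to exploration).
        targets = [n for n, d in self.graph.nodes(data=True)
                   if landmark in d["landmarks"]]
        if not targets:
            return None
        return min(
            (nx.shortest_path(self.graph, start, t, weight="weight")
             for t in targets),
            key=len,
        )
```

Reusing stored paths this way sidesteps re-exploring the continuous action space for targets the agent has already visited, which is the role the abstract assigns to the global memory module.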
The annotations of the enriched AirVLN-E dataset can be downloaded here. The simulator can be downloaded from AirVLN.
- Python 3.8
- Conda Environment
```bash
# clone the repo
git clone https://github.com/WeichenZh/CityNavAgent.git
cd CityNavAgent

# create a virtual environment
conda create -n citynavagent python=3.8
conda activate citynavagent

# install dependencies with pip
pip install -r requirements.txt
```
The project directory structure is similar to AirVLN and should look like this:
```
Project_dir/
├── CityNavAgent/
├── DATA/
│   └── data/
│       ├── aerialvln-s/
│       └── aerialvln-e/
└── ENVs/
    ├── env_1/
    └── ...
```
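If it helps, the empty layout can be created with a few lines of Python (a convenience sketch, not part of the repo; the folder names simply mirror the tree above):

```python
# Convenience sketch: create the expected project layout next to the repo.
from pathlib import Path

root = Path("Project_dir")
for sub in ["CityNavAgent", "DATA/data/aerialvln-s", "DATA/data/aerialvln-e", "ENVs"]:
    (root / sub).mkdir(parents=True, exist_ok=True)
```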
Install GroundingSAM following the official instructions at Grounded-Segment-Anything, then place the GroundingSAM project under the `./external` directory. Download the SAM and GroundingDINO weights from sam_vit_h and swint_ogc.
The `./external` directory structure should look like this:
```
external/
├── Grounded_Sam_Lite/
│   ├── grondingdino/
│   ├── segment_anything/
│   └── weights/
```
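For reference, open-vocabulary detection plus segmentation with these weights typically looks like the sketch below, based on the public GroundingDINO and SAM APIs. The image path, text prompt, and thresholds are assumptions, and CityNavAgent's own wrapper (`grounded_sam_api.py`) may differ:

```python
# Sketch of GroundingDINO + SAM inference with the downloaded weights.
# Paths, prompt, and thresholds are assumptions; grounded_sam_api.py may differ.
import torch
from groundingdino.util import box_ops
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor

# 1) Detect boxes for open-vocabulary text prompts with GroundingDINO.
dino = load_model("groundingdino/config/GroundingDINO_SwinT_OGC.py",
                  "weights/groundingdino_swint_ogc.pth")
image_source, image = load_image("observation.png")  # hypothetical input frame
boxes, logits, phrases = predict(model=dino, image=image,
                                 caption="building . road . car",
                                 box_threshold=0.35, text_threshold=0.25)

# 2) Prompt SAM with the detected boxes to get segmentation masks.
sam = sam_model_registry["vit_h"](checkpoint="weights/sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(image_source)
h, w = image_source.shape[:2]
# GroundingDINO returns normalized cxcywh boxes; SAM expects pixel xyxy boxes.
boxes_xyxy = box_ops.box_cxcywh_to_xyxy(boxes) * torch.tensor([w, h, w, h])
masks = [predictor.predict(box=b.numpy(), multimask_output=False)[0]
         for b in boxes_xyxy]
```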
Install the LM-Nav requirements following the official instructions at lm_nav.
The `./external` directory structure should then look like this:
```
external/
├── Grounded_Sam_Lite/
│   ├── grondingdino/
│   ├── segment_anything/
│   ├── weights/
│   └── grounded_sam_api.py
└── lm_nav/
    ├── landmark_extraction.py
    ├── navigation_graph.py
    ├── optimal_route.py
    ├── pipeline.py
    └── utils_lm.py
```
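LM-Nav-style landmark extraction prompts an LLM to pull the ordered landmarks out of an instruction. A hedged sketch of that idea follows; the actual prompt, model, and parsing in `lm_nav/landmark_extraction.py` may differ, and this assumes the legacy `openai<1.0` interface with `OPENAI_API_KEY` set in the environment:

```python
# Hedged sketch of LLM-based landmark extraction in the spirit of
# lm_nav/landmark_extraction.py; prompt, model, and parsing are assumptions.
from typing import List
import openai  # legacy openai<1.0 interface; reads OPENAI_API_KEY from the env

def extract_landmarks(instruction: str) -> List[str]:
    prompt = ("List, in visiting order, the landmarks mentioned in this aerial "
              "navigation instruction, one per line:\n" + instruction)
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    text = resp["choices"][0]["message"]["content"]
    # Keep non-empty lines, stripping simple list bullets.
    return [line.strip("-• ").strip() for line in text.splitlines() if line.strip()]
```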
Set your OpenAI API key in `SimRun.py`.
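Rather than hardcoding the key, you can export it and read it at startup; a minimal sketch, again assuming the legacy `openai<1.0` interface (the actual mechanism inside `SimRun.py` may differ):

```python
# Minimal sketch: read the key from the environment instead of hardcoding it.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]
```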
Download the memory graphs and AirVLN data here, and put them under the `./data` directory. The directory should look like this:
```
CityNavAgent/
├── data/
│   ├── gt_by_env/
│   ├── mem_graphs/
│   └── mem_graphs_pruned/
```
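To sanity-check the download, a memory graph can be inspected before running the agent. This is a hedged sketch that assumes pickle serialization; the real file names and format under `data/mem_graphs/` should be checked against the actual files:

```python
# Hypothetical sketch for inspecting a downloaded memory graph; the real
# file names and serialization format in data/mem_graphs/ may differ.
import pickle

with open("data/mem_graphs/env_1.pkl", "rb") as f:  # hypothetical filename
    mem_graph = pickle.load(f)
print(type(mem_graph))  # e.g. a graph of stored waypoints
```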
Run `SimRun.py`:
```bash
python SimRun.py --Image_Width_RGB 512 --Image_Height_RGB 512 --Image_Width_DEPTH 512 --Image_Height_DEPTH 512
```
We have used code snippets from different repositories, especially from: AirVLN, LM_NAV, GroundingDino, and SAM. We would like to acknowledge and thank the authors of these repositories for their excellent work.