
Embodied City

News 🎉

[2025.05.22] The simulator can be downloaded from: link.

[2024.10.12] We release the full paper on arXiv: link.

1 Introduction 🌟

Embodied intelligence is considered one of the most promising directions for artificial general intelligence, endowing agents with human-like abilities to interact with the world. However, most existing work focuses on bounded indoor environments, and literature on open-world scenarios remains limited. To address this, we release a new benchmark platform, named Embodied City, for embodied intelligence in urban environments. The platform includes a simulator and datasets for representative embodied-intelligence tasks in an urban environment. You can either request an API key to access the online deployed environment, or download the simulator and deploy it on your own server.

2 Simulator 🌆

We construct an environment where agents🤖 can perceive, reason, and take actions. The basic environment of the simulator covers a large business district in Beijing, one of the largest cities in China, in which we build 3D models of buildings, streets, and other elements, hosted by Unreal Engine.

2.1 Buildings

We first manually create the 3D assets of the buildings in Blender 4, using the street-view services of Baidu Map and Amap as references. The city-level detail includes a variety of building types such as office towers🏢, shopping malls🏬, residential complexes🏠, and public facilities🏫. These models are textured and detailed to closely resemble their real-world counterparts, enhancing the realism of the simulation.

2.2 Streets

The streets are modeled to include all necessary components such as lanes🛣️, intersections❌, traffic signals🚦, and road markings⬆️. We also incorporate pedestrian pathways, cycling lanes, and parking areas. Data from traffic monitoring systems and mapping services help ensure that the street layout and traffic flow patterns are accurate and realistic.

2.3 Other Elements

Other elements include street furniture🚸 (benches, streetlights, signs), vegetation🌳 (trees, shrubs, lawns), and urban amenities🚉 (bus stops, metro entrances, public restrooms). These are also created in Blender, based on real-world references from the street-view services mentioned above. Additionally, dynamic elements such as vehicles🚗 and pedestrians🚶 are simulated to move realistically within the environment, contributing to the liveliness and accuracy of the urban simulation. The simulation algorithms for vehicles and pedestrians are based on the Mirage Simulation System.

3 Embodied Task 📋

3.1 Environment

Download and extract the offline version of the full EmbodiedCity simulator. Users can download the offline simulation environment for local deployment to train and test agents. The platform provides a Windows-compatible build, enabling quick deployment and testing.

For simulator download links, please refer to: link

Then the Python environment can be installed as follows:

conda env create -n EmbodiedCity -f environment.yml
conda activate EmbodiedCity

or

conda create -n EmbodiedCity python=3.10
conda activate EmbodiedCity
pip install -r requirements.txt

To verify the environment setup for AirSim, please follow the steps below:

Set AirSim to 'Multirotor' Mode: Open the AirSim settings file (settings.json) and ensure that the SimMode is set to 'Multirotor'. For example:

{
    "SeeDocsAt": "https://github.com/Microsoft/AirSim/blob/master/docs/settings.md",
    "SettingsVersion": 1.2,
    "SimMode": "Multirotor"
}

Run the Test Script: Execute the airsim_test.py script provided in this folder. You can run the script using the following command:

python airsim_test.py

Verify the Output: If the environment is configured correctly:

  • You should observe the drone taking off and flying upward in the AirSim simulation.
  • The script will capture RGB observations from the drone's front-facing camera.
  • The captured images will be saved in the current folder.

If all the above steps work as expected, your AirSim environment is successfully configured.
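For reference, the following is a minimal sketch of the kind of check airsim_test.py performs, written against the standard AirSim Python API; the bundled script may differ in details such as camera name, flight altitude, and output filename:

# Minimal sketch of an AirSim connectivity check; the bundled airsim_test.py may differ.
import airsim

client = airsim.MultirotorClient()       # connect to the simulator (default 127.0.0.1:41451)
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)

client.takeoffAsync().join()             # take off
client.moveToZAsync(-10, 2).join()       # climb to ~10 m (NED frame: negative z is up)

# Capture one compressed RGB frame from camera "0" (front-facing by default)
responses = client.simGetImages([
    airsim.ImageRequest("0", airsim.ImageType.Scene, pixels_as_float=False, compress=True)
])
airsim.write_file("observation.png", responses[0].image_data_uint8)

client.landAsync().join()
client.armDisarm(False)
client.enableApiControl(False)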

3.2 Running

We provide an example of a Vision-Language Navigation (VLN) task implementation.

Files and Directories
Usage

In embodied_vln.py, the VLN_evaluator class is defined. You need to provide the dataset path, the model to be evaluated, and the corresponding API key.

Set up the model and API key:

model = "xxxxx"  # LM models, e.g., "claude-3-haiku-20240307", "gpt-4o"
api_key = "xxxxxxxxx"  # Fill in your API key

Initialize the VLN evaluator:

vln_eval = VLN_evaluator("Datasets/vln", model, api_key)

Run the evaluation:

vln_eval.evaluation()

We support multimodal models from OpenAI and Anthropic (Claude). If you wish to use a custom model, you can modify the LM_VLN class in utils.py.
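As a rough illustration of what such a modification could look like, the sketch below routes the prompt to a user-supplied backend. The class name, method name, and signature are hypothetical; check the actual LM_VLN interface in utils.py before adapting it:

# Hypothetical sketch only -- the real LM_VLN interface in utils.py may differ.
class CustomVLM:
    def __init__(self, model, api_key):
        self.model = model
        self.api_key = api_key

    def query(self, image_bytes, instruction):
        # Send the RGB observation and the language instruction to your own
        # multimodal model and return the predicted action as a string,
        # e.g. "move forward", "turn left", "stop".
        raise NotImplementedError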

The evaluation process will activate the simulator's drone and run through each VLN task sample. The performance of the model will be quantified using the following three metrics (a computation sketch follows the list):

  • Success Rate (SR) measures the proportion of navigation episodes in which the agent successfully reaches the target location within a specified margin of error.
  • SPL (Success weighted by Path Length) is a metric that considers both the success rate and the efficiency of the path taken by the agent. It accounts for how closely the agent's path length matches the optimal path length.
  • Navigation Error (NE) measures the average distance from the agent's final location to the target destination.
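The sketch below shows how these three metrics are typically computed over a set of evaluation episodes; the field names are illustrative, not the exact structures used by embodied_vln.py:

# Hedged sketch of SR, SPL, and NE over a list of episode records.
# Field names below are illustrative, not the exact structures used by embodied_vln.py.
import numpy as np

def compute_metrics(episodes):
    success = np.array([e["success"] for e in episodes], dtype=float)      # 1.0 if the goal was reached within the margin
    path_len = np.array([e["path_length"] for e in episodes])              # length of the path the agent actually flew
    optimal_len = np.array([e["shortest_path_length"] for e in episodes])  # shortest path from start to goal
    final_dist = np.array([e["final_distance"] for e in episodes])         # distance from the final pose to the goal

    sr = success.mean()                                                       # Success Rate
    spl = np.mean(success * optimal_len / np.maximum(path_len, optimal_len))  # Success weighted by Path Length
    ne = final_dist.mean()                                                    # Navigation Error
    return {"SR": sr, "SPL": spl, "NE": ne}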

3.3 Task Definition

Embodied Action, often referred to as Vision-and-Language Navigation (VLN), is a research area in artificial intelligence that focuses on enabling an agent to navigate an environment based on natural language instructions. The input combines visual perception and natural language instructions to guide the agent through complex environments; the output is the sequence of actions that follows those instructions.

4 Citation 📝

Please cite our paper if you find EmbodiedCity helpful in your research.

@article{gao2024embodied,
  title={EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment},
  author={Gao, Chen and Zhao, Baining and Zhang, Weichen and Zhang, Jun and Mao, Jinzhu and Zheng, Zhiheng and Man, Fanhang and Fang, Jianjie and Zhou, Zile and Cui, Jinqiang and Chen, Xinlei and Li, Yong},
  journal={arXiv preprint},
  year={2024}
}
