
Embodied City

News 🎉

[2025.05.22] The simulator can be downloaded from: link.

[2024.10.12] We release the full paper on arXiv: link.

1 Introduction 🌟

Embodied intelligence is considered one of the most promising directions for artificial general intelligence, endowing agents with human-like abilities to interact with the world. However, most existing work focuses on bounded indoor environments, and literature on open-world scenarios remains limited. To address this, we release a new benchmark platform, named Embodied City, for embodied intelligence in urban environments. The platform includes a simulator and datasets for representative embodied-intelligence tasks in an urban environment. You can either request an API key to access the online deployed environment, or download the simulator and deploy it on your own server.

2 Simulator 🌆

We construct an environment where agents🤖 can perceive, reason, and take actions. The basic environment of the simulator covers a large business district in Beijing, one of the largest cities in China, in which we build 3D models of buildings, streets, and other elements, hosted by Unreal Engine.

2.1 Buildings

We first manually create the 3D assets of the buildings in Blender 4, using the street-view services of Baidu Map and Amap as references. The city-level detail includes a variety of building types such as office towers🏢, shopping malls🏬, residential complexes🏠, and public facilities🏫. These models are textured and detailed to closely resemble their real-world counterparts, enhancing the realism of the simulation.

2.2 Streets

The streets are modeled to include all necessary components such as lanes🛣️, intersections❌, traffic signals🚦, and road markings⬆️. We also incorporate pedestrian pathways, cycling lanes, and parking areas. Data from traffic monitoring systems and mapping services help ensure that the street layout and traffic flow patterns are accurate and realistic.

2.3 Other Elements

Other elements include street furniture🚸 (benches, streetlights, signs), vegetation🌳 (trees, shrubs, lawns), and urban amenities🚉 (bus stops, metro entrances, public restrooms). These are also created in Blender, based on real-world references from the street-view services mentioned above. Additionally, dynamic elements such as vehicles🚗 and pedestrians🚶 are simulated to move realistically within the environment, contributing to the liveliness and accuracy of the urban simulation. The simulation algorithms for vehicles and pedestrians are based on the Mirage Simulation System.

3 Embodied Task 📋

3.1 Environment

Download and extract the offline version of the full EmbodiedCity simulator. Users can download the offline simulation environment for local deployment to train and test agents. The platform provides a Windows-compatible build, enabling quick deployment and testing.

For simulator download links, please refer to: link

Then the Python environment can be installed as follows:

conda env create -n EmbodiedCity -f environment.yml
conda activate EmbodiedCity

or

conda create -n EmbodiedCity python=3.10
conda activate EmbodiedCity
pip install -r requirements.txt

To verify the environment setup for AirSim, please follow the steps below:

Set AirSim to 'Multirotor' Mode: Open the AirSim settings file (settings.json) and ensure that the SimMode is set to 'Multirotor'. For example:

{
    "SeeDocsAt": "https://github.com/Microsoft/AirSim/blob/master/docs/settings.md",
    "SettingsVersion": 1.2,
    "SimMode": "Multirotor"
}

Run the Test Script: Execute the airsim_test.py script provided in this folder. You can run the script using the following command:

python airsim_test.py

Verify the Output: If the environment is configured correctly:

  • You should observe the drone taking off and flying upward in the AirSim simulation.
  • The script will capture RGB observations from the drone's front-facing camera.
  • The captured images will be saved in the current folder.

If all the above steps work as expected, your AirSim environment is successfully configured.
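For reference, the following is a minimal sketch of the kind of check airsim_test.py performs, written against the standard AirSim Python API; the bundled script may differ in details such as camera name, flight altitude, and output filename:

# Minimal sketch of an AirSim connectivity check; the bundled airsim_test.py may differ.
import airsim

client = airsim.MultirotorClient()       # connect to the simulator (default 127.0.0.1:41451)
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)

client.takeoffAsync().join()             # take off
client.moveToZAsync(-10, 2).join()       # climb to ~10 m (NED frame: negative z is up)

# Capture one compressed RGB frame from camera "0" (front-facing by default)
responses = client.simGetImages([
    airsim.ImageRequest("0", airsim.ImageType.Scene, pixels_as_float=False, compress=True)
])
airsim.write_file("observation.png", responses[0].image_data_uint8)

client.landAsync().join()
client.armDisarm(False)
client.enableApiControl(False)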

3.2 Running

We provide an example of a Vision-Language Navigation (VLN) task implementation.

Files and Directories
Usage

In embodied_vln.py, the VLN_evaluator class is defined. You need to provide the dataset path, the model to be evaluated, and the corresponding API key.

Set up the model and API key:

model = "xxxxx"  # LM models, e.g., "claude-3-haiku-20240307", "gpt-4o"
api_key = "xxxxxxxxx"  # Fill in your API key

Initialize the VLN evaluator:

vln_eval = VLN_evaluator("Datasets/vln", model, api_key)

Run the evaluation:

vln_eval.evaluation()

We support multimodal models from OpenAI and Anthropic (Claude). If you wish to use a custom model, you can modify the LM_VLN class in utils.py.
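As a rough illustration of what such a modification could look like, the sketch below routes the prompt to a user-supplied backend. The class name, method name, and signature are hypothetical; check the actual LM_VLN interface in utils.py before adapting it:

# Hypothetical sketch only -- the real LM_VLN interface in utils.py may differ.
class CustomVLM:
    def __init__(self, model, api_key):
        self.model = model
        self.api_key = api_key

    def query(self, image_bytes, instruction):
        # Send the RGB observation and the language instruction to your own
        # multimodal model and return the predicted action as a string,
        # e.g. "move forward", "turn left", "stop".
        raise NotImplementedError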

The evaluation process will activate the simulator's drone and run through each VLN task sample. The performance of the model will be quantified using the following three metrics (a computation sketch follows the list):

  • Success Rate (SR) measures the proportion of navigation episodes in which the agent successfully reaches the target location within a specified margin of error.
  • SPL (Success weighted by Path Length) is a metric that considers both the success rate and the efficiency of the path taken by the agent. It accounts for how closely the agent's path length matches the optimal path length.
  • Navigation Error (NE) measures the average distance from the agent's final location to the target destination.
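The sketch below shows how these three metrics are typically computed over a set of evaluation episodes; the field names are illustrative, not the exact structures used by embodied_vln.py:

# Hedged sketch of SR, SPL, and NE over a list of episode records.
# Field names below are illustrative, not the exact structures used by embodied_vln.py.
import numpy as np

def compute_metrics(episodes):
    success = np.array([e["success"] for e in episodes], dtype=float)      # 1.0 if the goal was reached within the margin
    path_len = np.array([e["path_length"] for e in episodes])              # length of the path the agent actually flew
    optimal_len = np.array([e["shortest_path_length"] for e in episodes])  # shortest path from start to goal
    final_dist = np.array([e["final_distance"] for e in episodes])         # distance from the final pose to the goal

    sr = success.mean()                                                       # Success Rate
    spl = np.mean(success * optimal_len / np.maximum(path_len, optimal_len))  # Success weighted by Path Length
    ne = final_dist.mean()                                                    # Navigation Error
    return {"SR": sr, "SPL": spl, "NE": ne}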

3.3 Task Definition

Embodied Action, often referred to as Vision-and-Language Navigation (VLN), is a research area in artificial intelligence that focuses on enabling an agent to navigate an environment based on natural language instructions. The input combines visual perception and natural language instructions to guide the agent through complex environments; the output is the sequence of actions that follows those instructions.

4 Citation 📝

Please cite our paper if you find EmbodiedCity helpful in your research.

@article{gao2024embodied,
  title={EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment},
  author={Gao, Chen and Zhao, Baining and Zhang, Weichen and Zhang, Jun and Mao, Jinzhu and Zheng, Zhiheng and Man, Fanhang and Fang, Jianjie and Zhou, Zile and Cui, Jinqiang and Chen, Xinlei and Li, Yong},
  journal={arXiv preprint},
  year={2024}
}
