Zhaoyang Lv, Maurizio Monge, Ka Chen, Yufeng Zhu, Michael Goesele, Jakob Engel, Zhao Dong, Richard Newcombe
Reality Labs Research, Meta
ACM SIGGRAPH Conference 2025
This repository focuses on reconstructing photorealistic 3D scenes captured from an egocentric device. In contrast to off-the-shelf Gaussian Splatting reconstruction pipelines that take video frames posed by structure-from-motion as input, we highlight two major innovations that are crucial for improving reconstruction quality:
- Visual-inertial bundle adjustment (VIBA): Unlike the mainstream approach of treating an RGB camera as a frame-rate camera, VIBA allows us to calibrate the precise timestamps and motion of the RGB camera in a high-frequency trajectory format. This lets the system precisely model the online RGB camera calibrations and the pixel motion of a rolling-shutter camera.
- Gaussian Splatting model: We incorporate a physical image formation model based on the Gaussian Splatting algorithm, which effectively addresses sensor characteristics, including the rolling-shutter effect of the RGB camera and the dynamic range measured by the sensors. The formulation generalizes to other rasterization-based techniques (a per-row rolling-shutter pose lookup is sketched below).
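As a concrete illustration of why the high-frequency trajectory matters for a rolling-shutter camera, the following minimal sketch (illustrative only, with hypothetical values; not the code used in this repository) looks up a separate device pose for each image row by interpolating the trajectory at that row's readout time:

```python
# Illustrative sketch (not the repository's implementation): look up a per-row
# camera pose from a high-frequency trajectory to model a rolling-shutter readout.
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

# Hypothetical trajectory: timestamps in seconds, rotations and translations of the
# world-from-device pose sampled at high frequency (e.g. 1 kHz, as provided by VIBA).
traj_t = np.linspace(0.0, 1.0, 1001)                         # seconds
traj_R = Rotation.from_euler("z", np.linspace(0, 10, 1001), degrees=True)
traj_p = np.stack([np.linspace(0, 0.5, 1001),
                   np.zeros(1001), np.zeros(1001)], axis=1)   # meters

slerp = Slerp(traj_t, traj_R)  # rotation interpolator over the trajectory

def pose_at(t: float):
    """Interpolate the world-from-device pose at time t (seconds)."""
    R = slerp([t])[0]
    p = np.array([np.interp(t, traj_t, traj_p[:, i]) for i in range(3)])
    return R, p

# Rolling shutter: each image row is exposed at a slightly different time.
t_first_row = 0.25          # exposure time of the first row (hypothetical)
line_readout_time = 10e-6   # seconds per row (hypothetical)
for row in (0, 600, 1199):
    R, p = pose_at(t_first_row + row * line_readout_time)
    print(f"row {row}: translation {p}, yaw {R.as_euler('zyx', degrees=True)[0]:.4f} deg")
```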
In this repository, we provide comprehensive guidelines for using the data recorded by the Aria Gen 1 device. We acquire the VIBA input from the machine perception services provided by the Project Aria platform. Below, we offer detailed guidance on preprocessing the recordings and reconstructing them using several major variants of the Gaussian Splatting algorithm. In addition to reconstructing scenes using the RGB sensor, we also provide examples of using the SLAM cameras or combining all cameras together.
@inproceedings{lv2025egosplats,
title={Photoreal Scene Reconstruction from an Egocentric Device},
author={Lv, Zhaoyang and Monge, Maurizio and Chen, Ka and Zhu, Yufeng and Goesele, Michael and Engel, Jakob and Dong, Zhao and Newcombe, Richard},
booktitle={ACM SIGGRAPH},
year={2025}
}
conda create -n ego_splats python=3.10
conda activate ego_splats
# Install pytorch (tested version). Choose a version that is compatible with your system
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
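After installing the requirements, you can optionally run a quick sanity check (not a repository script) to confirm that PyTorch was installed with a working CUDA build:

```python
# Optional environment sanity check (not part of the repository scripts).
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```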
Register for the Project Aria dataset and get your download file at Photoreal Reconstruction Project Aria.
# the path to the downloadable CDN file
DOWNLOAD_CDN_FILE=AriaScenes_download_urls.json
# download one exemplar sequence (this will take about 16GB of disk space)
python scripts/downloader.py --cdn_file $DOWNLOAD_CDN_FILE -o data/aria_scenes --sequence_names livingroom
# download all the sequences (this will take about 114GB of disk space)
python scripts/downloader.py --cdn_file $DOWNLOAD_CDN_FILE -o data/aria_scenes
You can browse the scenes using the Aria Scene Explorer.
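If you want to verify what was downloaded and how much disk space each sequence occupies, a generic helper like the one below works; it only assumes that each sequence was unpacked as a subdirectory of data/aria_scenes (adjust the path if you used a different output directory):

```python
# Generic helper (not a repository script): list downloaded sequences and their sizes.
import os

root = "data/aria_scenes"  # output directory used with scripts/downloader.py
for name in sorted(os.listdir(root)):
    seq_dir = os.path.join(root, name)
    if not os.path.isdir(seq_dir):
        continue
    total = 0
    for dirpath, _, filenames in os.walk(seq_dir):
        total += sum(os.path.getsize(os.path.join(dirpath, f)) for f in filenames)
    print(f"{name}: {total / 1e9:.1f} GB")
```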
We provide an exemplar script to preprocess the Aria VRS recording together with the machine perception outputs (the semi-dense point cloud, the closed-loop trajectories, and the online calibration files). Assuming you have downloaded the exemplar scene "livingroom" to the "data/aria_scenes" path as described in the previous step, you can run
bash scripts/bash_local/run_vrs_preprocessing.sh
For more details on what happens during preprocessing, check Preprocess Aria video.
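For reference, the machine perception (MPS) outputs consumed by the preprocessing step can also be inspected directly with the projectaria_tools Python API. The snippet below is a hedged sketch: the directory layout and file paths are assumptions based on the standard MPS output names, and attribute or method names may differ slightly across projectaria_tools versions.

```python
# Hedged sketch: inspect MPS outputs with projectaria_tools (paths are assumptions).
from projectaria_tools.core import mps

mps_dir = "data/aria_scenes/livingroom/mps"  # adjust to your recording's layout

# Closed-loop trajectory: high-frequency world-from-device poses.
trajectory = mps.read_closed_loop_trajectory(f"{mps_dir}/closed_loop_trajectory.csv")
print("trajectory poses:", len(trajectory))
first = trajectory[0]
print("first pose timestamp:", first.tracking_timestamp)
print("world-from-device translation:", first.transform_world_device.translation())

# Semi-dense point cloud, used to initialize the Gaussians.
points = mps.read_global_point_cloud(f"{mps_dir}/semidense_points.csv.gz")
print("semi-dense points:", len(points))

# Online calibrations (time-varying intrinsics/extrinsics).
online_calibs = mps.read_online_calibration(f"{mps_dir}/online_calibration.jsonl")
print("online calibration samples:", len(online_calibs))
```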
We provide an exemplar training script that follows the preprocessing step above. This is the standard setting we used in the paper.
# Run 3D GS reconstruction using RGB camera only
bash scripts/bash_local/run_aria_rgb_camera.sh
During training, the model will launch an online visualizer at "http://0.0.0.0:8080". Open a browser to check the reconstruction results interactively.
In addition, we provide a few settings that we did not use in the paper but that leverage the capabilities of Aria videos, including using the SLAM (monochrome) camera inputs, using multi-modal camera inputs, and using other variants of the Gaussian Splatting algorithm.
We can run the same reconstruction process on the SLAM cameras, which are global-shutter monochrome cameras. There are two of them on the Project Aria Gen 1 device, and together they offer better field-of-view coverage (though with limited overlap between them). For certain applications, e.g. geometry reconstruction, this may offer an advantage over the RGB camera.
# Run 3D GS reconstruction using (two) SLAM cameras only
# --train_model: choose 3dgs or 2dgs.
# --strategy: default or MCMC.
# example:
bash scripts/bash_local/run_aria_slam_camera.sh --train_model 3dgs --strategy default
# or using 2d-gs
bash scripts/bash_local/run_aria_slam_camera.sh --train_model 2dgs --strategy default
In addition, we can combine the RGB camera and the SLAM cameras jointly in the reconstruction. This reconstructs an RGBM (RGB plus monochrome) radiance field with a shared geometry structure.
# has not been checked-in or tested
# --train_model: choose 3dgs or 2dgs.
# --strategy: default or MCMC.
bash scripts/local/run_aria_all_cameras.sh
Note: with a fixed number of training iterations, this does not necessarily give quantitatively better view synthesis results than the RGB-only or SLAM-only models, but you may find that using all the cameras from different views yields reconstructions with fewer floaters.
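The --strategy flag above selects the densification strategy of the underlying gsplat library. As a rough, hedged sketch of how such a flag could map onto gsplat's strategy objects (this is not the repository's actual training code, and the flag-to-class mapping is an assumption):

```python
# Hedged sketch (not the repository's training code): selecting a gsplat
# densification strategy from a command-line style flag.
from gsplat.strategy import DefaultStrategy, MCMCStrategy


def make_strategy(name: str):
    """Map a flag value to a gsplat densification strategy."""
    if name == "default":
        # Heuristic split/clone/prune densification, as in the original 3DGS paper.
        return DefaultStrategy(verbose=True)
    if name == "mcmc":
        # MCMC-style relocation with a hard cap on the number of Gaussians.
        return MCMCStrategy(cap_max=1_000_000)
    raise ValueError(f"unknown strategy: {name}")


strategy = make_strategy("default")
print(type(strategy).__name__)
```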
We provide an interactive viewer to visualize the trained models. For example, after launching the training scripts above, you can visualize all the models using
python launch_viewer.py model_root=output/recording/camera-rgb-rectified-1200-h2000/
By default, it will serve the visualizer at "http://0.0.0.0:8080". Open a browser to check the results interactively.
We provide an example script to render the video from the trained model. Check the script for more details.
bash scripts/bash_local/run_aria_render.sh
Please refer to the Project Aria Docs and the Aria Research Kit for more details on capturing videos and running the machine perception services to obtain the calibration and location metadata.
To capture the videos, we used recording Profile 31, which supports the full-resolution RGB camera with the maximum exposure capped at 3 ms. Outdoor scenes should generally work with any of the profile variants. For indoor videos, the exposure cap can lead to relatively dark video input if the scene is not sufficiently illuminated, but you may still be able to recover the full dynamic range of the scene after reconstruction. If you have questions about best practices for a specific scenario, feel free to open an issue and we will be happy to provide some input.
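To illustrate why a short 3 ms exposure can still yield a usable result, recall that the reconstruction is trained against a physical image formation model, so the recovered radiance is approximately linear and can be digitally re-exposed afterwards. The snippet below is only a schematic illustration of that idea on synthetic data, not the tone mapping used in this repository:

```python
# Schematic illustration only: re-exposing a linear-radiance rendering.
import numpy as np

# Pretend this is a dark linear-radiance rendering of an under-lit indoor scene.
linear = np.random.default_rng(0).uniform(0.0, 0.05, size=(4, 4, 3))

def expose_and_gamma(linear_rgb: np.ndarray, exposure_gain: float, gamma: float = 2.2):
    """Apply a digital exposure gain in linear space, then a display gamma."""
    exposed = np.clip(linear_rgb * exposure_gain, 0.0, 1.0)
    return exposed ** (1.0 / gamma)

dark_display = expose_and_gamma(linear, exposure_gain=1.0)     # as captured
bright_display = expose_and_gamma(linear, exposure_gain=16.0)  # digitally re-exposed
print("mean display value, gain 1x :", dark_display.mean())
print("mean display value, gain 16x:", bright_display.mean())
```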
This implementation is Creative Commons licensed, as found in the LICENSE file.
The work in this repository builds on and benefits from the following great open-source projects:
- Project Aria Tools, Apache 2.0
- EgoLifter, Apache 2.0
- gsplat, Apache 2.0
- viser, Apache 2.0
- nerfview, Apache 2.0