We introduce PixCuboid, an optimization-based approach to cuboid-shaped room layout estimation built on multi-view alignment of dense deep features.
This repository contains the official implementation of the paper PixCuboid: Room Layout Estimation from Multi-view Featuremetric Alignment, to be presented at the ICCV 2025 Workshop on Large Scale Cross Device Localization.
Project page: https://ghanning.github.io/PixCuboid/
PixCuboid is built upon the excellent PixLoc code base. The PixLoc master branch is available in this repository under the name pixloc.
Install PixCuboid in editable mode as follows:
git clone https://github.com/ghanning/PixCuboid.git
cd PixCuboid/
virtualenv venv
source venv/bin/activate
pip install -e .
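To quickly verify the install, import the package from inside the virtual environment (a minimal sanity check, not part of the official setup):

```python
# Check that the editable install of the pixloc package is picked up.
import pixloc
print(pixloc.__file__)  # should point into the cloned PixCuboid/ directory
```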
Running the demo notebooks requires some extra dependencies that can be installed with:
pip install -e .[extra]
Download the ScanNet++ and 2D-3D-Semantics datasets from their respective websites and unpack them into a subdirectory named datasets. The expected directory structure is shown below.
.
└── datasets
    ├── 2d3ds
    │   ├── area_1
    │   ├── area_2
    │   ├── area_3
    │   ├── area_4
    │   ├── area_5a
    │   ├── area_5b
    │   └── area_6
    └── scannetpp
        ├── data
        ├── metadata
        └── splits
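As a quick check that everything landed in the right place, you can verify the layout from Python (a minimal sketch based on the tree above):

```python
# Verify the expected dataset layout before any preprocessing.
from pathlib import Path

for sub in ['scannetpp/data', 'scannetpp/metadata', 'scannetpp/splits',
            '2d3ds/area_1', '2d3ds/area_5a', '2d3ds/area_6']:
    path = Path('datasets') / sub
    print(f'{path}: {"OK" if path.is_dir() else "MISSING"}')
```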
Note: We only use ScanNet++ to train PixCuboid, but also provide code to run room layout estimation on 2D-3D-Semantics.
Use the ScanNet++ Toolbox to undistort the DSLR fisheye images by following the instructions here.
Note: As of April 30, 2025, undistorted DSLR images are included in the ScanNet++ dataset and this step can thus be skipped.
Render depth maps for the undistorted DSLR images using the render-undistorted branch in my fork of the ScanNet++ Toolbox as described here, but set render_undistorted to True.
Run our preprocessing script to find the 2D-3D point correspondences used in training:
python -m pixloc.pixlib.preprocess_scannetpp
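For context, each 2D-3D correspondence pairs a 3D scene point with its pixel projection in a posed image. The sketch below illustrates the standard pinhole projection; it is purely illustrative and not the preprocessing code itself:

```python
# Illustrative pinhole projection of a 3D point into a posed image.
import numpy as np

def project(X, R, t, K):
    """Project world point X with extrinsics (R, t) and intrinsics K."""
    Xc = R @ X + t         # world -> camera coordinates
    uv = K @ (Xc / Xc[2])  # perspective division, then intrinsics
    return uv[:2]          # 2D pixel coordinates

K = np.array([[600., 0., 320.],
              [0., 600., 240.],
              [0., 0., 1.]])
print(project(np.array([1.0, 0.5, 3.0]), np.eye(3), np.zeros(3), K))
```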
Split the panorama images into perspective views as detailed here.
While line segments are not required to train PixCuboid, they improve its performance at inference time. To extract line segments with DeepLSD, first install it with
pip install -e .[deeplsd]
then download the pre-trained weights
mkdir weights
wget https://cvg-data.inf.ethz.ch/DeepLSD/deeplsd_md.tar -O weights/deeplsd_md.tar
and run the extraction for ScanNet++ and 2D-3D-Semantics:
./scripts/line_segments_scannetpp.sh
./scripts/line_segments_2d3ds.sh
Alternatively, you can download the line segments for ScanNet++ from here (665 MiB) and unpack them with the command
unzip line_segments_scannetpp.zip -d datasets/scannetpp
Similarly, the line segments for 2D-3D-Semantics are available here (8 MiB). Unzip with
unzip line_segments_2d3ds.zip -d datasets/2d3ds
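To sanity-check the DeepLSD setup above on a single image, the snippet below follows DeepLSD's quickstart; the image path is a placeholder and the detection parameters are defaults to adapt:

```python
# Single-image line segment extraction with DeepLSD.
import cv2
import torch
from deeplsd.models.deeplsd_inference import DeepLSD

device = 'cuda' if torch.cuda.is_available() else 'cpu'
conf = {
    'detect_lines': True,
    'line_detection_params': {'merge': False, 'filtering': True,
                              'grad_thresh': 3, 'grad_nfa': True},
}
ckpt = torch.load('weights/deeplsd_md.tar', map_location='cpu')
net = DeepLSD(conf)
net.load_state_dict(ckpt['model'])
net = net.to(device).eval()

gray = cv2.imread('example.jpg', cv2.IMREAD_GRAYSCALE)  # placeholder path
inputs = {'image': torch.tensor(gray, dtype=torch.float,
                                device=device)[None, None] / 255.}
with torch.no_grad():
    lines = net(inputs)['lines'][0]  # (N, 2, 2) array of segment endpoints
print(f'Detected {len(lines)} line segments')
```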
Training is done in two stages. First, the edge detector is pre-trained by running:
python -m pixloc.pixlib.train --conf pixloc/pixlib/configs/pretrain_pixcuboid_scannetpp.yaml pixcuboid_scannetpp_pretrain
Next, the full network is trained, with weights initialized from the previous stage:
python -m pixloc.pixlib.train --conf pixloc/pixlib/configs/train_pixcuboid_scannetpp.yaml pixcuboid_scannetpp train.load_experiment=pixcuboid_scannetpp_pretrain
Tip: Pass the --wandb_project <PROJECT> argument to the training script to log the results to Weights & Biases.
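For example, to log the second training stage (replace <PROJECT> with your W&B project name):
python -m pixloc.pixlib.train --conf pixloc/pixlib/configs/train_pixcuboid_scannetpp.yaml --wandb_project <PROJECT> pixcuboid_scannetpp train.load_experiment=pixcuboid_scannetpp_pretrain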
We supply a script to run PixCuboid on each image tuple (ScanNet++) or space (2D-3D-Semantics) and output the room layout predictions to a JSON file.
python -m pixloc.run_PixCuboid --experiment pixcuboid_scannetpp --conf pixloc/pixlib/configs/eval_pixcuboid_scannetpp.yaml --split {train,val,test} --output OUTPUT
python -m pixloc.run_PixCuboid --experiment pixcuboid_scannetpp --conf pixloc/pixlib/configs/eval_pixcuboid_2d3ds.yaml --split test --output OUTPUT
The resulting predictions can be evaluated using the code in the MultiViewCuboid repository.
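For a quick look at the predictions before evaluating them, load the JSON file in Python; the exact schema is not documented here, so just inspect the top-level structure:

```python
# Inspect the room layout predictions written by run_PixCuboid.
import json

with open('OUTPUT') as f:  # the path passed via --output
    predictions = json.load(f)
print(type(predictions).__name__, len(predictions))
```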
Pre-trained weights for a model trained on ScanNet++ as outlined above can be found here (317 MiB). Extract the checkpoint with
mkdir -p outputs/training && unzip pixcuboid_scannetpp.zip -d outputs/training
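The checkpoint can then be loaded with PyTorch for inspection. The filename below follows PixLoc's checkpoint naming convention and is an assumption, so check the unpacked directory:

```python
# Peek inside the pre-trained checkpoint (filename is an assumption).
import torch

ckpt = torch.load('outputs/training/pixcuboid_scannetpp/checkpoint_best.tar',
                  map_location='cpu')
print(list(ckpt.keys()))
```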
Try out PixCuboid on ScanNet++ and 2D-3D-Semantics with the Jupyter notebook demo_PixCuboid.ipynb.
We show how the method can be applied to your own data (e.g. a set of images from a COLMAP reconstruction) in the notebook PixCuboid_COLMAP.ipynb.
Use the BibTeX reference below to cite our work.
@inproceedings{hanning2025pixcuboid,
  title={{PixCuboid: Room Layout Estimation from Multi-view Featuremetric Alignment}},
  author={Hanning, Gustav and Åström, Kalle and Larsson, Viktor},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  year={2025},
}
In addition, please consider citing the PixLoc paper:
@inproceedings{sarlin21pixloc,
  title={{Back to the Feature: Learning Robust Camera Localization from Pixels to Pose}},
  author={Paul-Edouard Sarlin and Ajaykumar Unagar and Måns Larsson and Hugo Germain and Carl Toft and Viktor Larsson and Marc Pollefeys and Vincent Lepetit and Lars Hammarstrand and Fredrik Kahl and Torsten Sattler},
  booktitle={CVPR},
  year={2021},
}