
LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding

Hao Li*, Minghan Qin*†, Zhengyu Zou*, Diqi He, Yongjie Zhang, Dingwen Zhang†, Junwei Han
(* indicates equal contribution, † indicates co-corresponding author)
| Webpage | Full Paper | Video |
| Preprocessed Dataset | BaiduWangpan | GoogleDrive |
| Pre-trained Models | BaiduWangpan | GoogleDrive |
| Datasets |

Teaser image

This repository contains the official authors' implementation of the paper "LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding" (arXiv 2024), which can be found here. We also provide the preprocessed datasets and pre-trained models.

Environment

We recommend Python 3.10.0 and CUDA Toolkit 12.6 as the base environment.

# clone via SSH
git clone git@github.com:lifuguan/langsurf.git
cd langsurf

conda create -n langsurf python=3.10
conda activate langsurf

# compile the 3D-GS library
pip install -e submodules/diff-langsurf-rasterization
pip install -e submodules/simple-knn
pip install -e submodules/segment-anything-langsplat

# install other dependencies
pip install -r requirements.txt

Data Preparation

In the experiments section of our paper, we primarily use two datasets: LERF-OVS and ScanNet.

For the LERF-OVS dataset, we expanded its existing annotations; they are available for download via the following link: GoogleDrive.

For the ScanNet dataset, we also provide the corresponding COLMAP data. The full resources can be accessed through this link: GoogleDrive.

Data Structure

data
  |---lerf_ovs
  |      |---label
  |      |     |--- ramen
  |      |     |--- ...
  |      |---ramen
  |      |     |---images
  |      |     |     |--- ...
  |      |     |---sparse
  |      |     |     |--- ...
  |      |---teatime
  |      |     |--- ...
  |      |---waldo_kitchen
  |      |     |--- ...
  |---scannet
  |      |---scene0085_00
  |      |     |---gt_iou
  |      |     |     |--- ...
  |      |     |---gt_ply
  |      |     |     |--- ...
  |      |     |---images
  |      |     |     |--- ...
  |      |     |---sparse/0
  |      |     |     |--- ...
  |      |---scene0616_00
  |      |     |--- ...

For the ScanNet Dataset

Use the following script to convert the GT-labeled point cloud into a per-class format (for 3D evaluation).

python scripts/scannet_ply_converter.py --input_ply {path to the ply file}
# example
python scripts/scannet_ply_converter.py --input_ply data/scannet/scene0085_00/gt_ply/scene0085_00_vh_clean_2.labels.ply

Training

The bash script runs multiple steps, including image preprocessing, feature inference, and Gaussian training.

bash train_scene.sh data/lerf_ovs/waldo_kitchen
bash train_scene.sh data/lerf_ovs/ramen
bash train_scene.sh data/lerf_ovs/teatime
bash train_scene.sh data/scannet/scene0085_00
bash train_scene.sh data/scannet/scene0114_02
bash train_scene.sh data/scannet/scene0616_00
bash train_scene.sh data/scannet/scene0617_00
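The per-scene commands above can be wrapped in a loop. This is our own convenience sketch, not a repository script; `DRY_RUN=echo` prints each command, and setting `DRY_RUN=` launches training for real.

```shell
DRY_RUN=echo
for scene in data/lerf_ovs/waldo_kitchen data/lerf_ovs/ramen data/lerf_ovs/teatime \
             data/scannet/scene0085_00 data/scannet/scene0114_02 \
             data/scannet/scene0616_00 data/scannet/scene0617_00; do
  $DRY_RUN bash train_scene.sh "$scene"
done
```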

After that, the data structure should be as follows (taking scene0085_00 in ScanNet as an example):

data
  |---scannet
  |      |---scene0085_00
  |      |     |---images
  |      |     |     |--- ...
  |      |     |---sparse
  |      |     |     |--- ...
  |      |     |---hcma_features
  |      |     |     |--- ...
  |      |     |---hcma_features_dim3
  |      |     |     |--- ...
  |      |     |---output
  |      |     |     |---scene0085_00_1
  |      |     |     |         |---app_model
  |      |     |     |               |--- ...
  |      |     |     |         |---point_cloud
  |      |     |     |               |--- ...
  |      |     |     |         |---cfg_args
  |      |     |     |         |---chkpnt40000.pth
  |      |     |     |---scene0085_00_2
  |      |     |     |         |--- ...
  |      |     |     |---scene0085_00_3
  |      |     |     |         |--- ...

Evaluation

If you already have a trained model (or use our pre-trained models), you can skip training and render features directly with the following commands.

bash render.sh data/lerf_ovs/waldo_kitchen
bash render.sh data/scannet/scene0085_00

Compute Metrics

For the LERF-OVS dataset, use evaluate_lerf_ovs.py to evaluate 2D mIoU and 2D localization metrics.

python eval/evaluate_lerf_ovs.py \
         --dataset_name waldo_kitchen \
         --output_dir eval_result 
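For reference, mean IoU averages the per-class intersection-over-union over all classes that appear in the prediction or ground truth. The standalone sketch below mirrors the metric itself (the function name is ours), not the evaluation script's implementation:

```python
def miou(pred: list[int], gt: list[int], num_classes: int) -> float:
    """Mean intersection-over-union over per-pixel class labels."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:  # skip classes absent from both prediction and GT
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```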

For the ScanNet dataset, use evaluate_scannet.py to evaluate 2D mIoU, and evaluate_scannet_3d.py to produce 3D query point clouds and evaluate the semantic F1-score.

python eval/evaluate_scannet.py \
         --dataset_name scene0085_00 \
         --output_dir eval_result 

python eval/evaluate_scannet_3d.py \
         --dataset_name scene0085_00 \
         --output_dir eval_result 
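The semantic F1-score is the harmonic mean of precision and recall between the queried and ground-truth point sets. A minimal sketch of the metric (the helper name and set-of-indices interface are our own, not the script's API):

```python
def f1_score(pred: set[int], gt: set[int]) -> float:
    """F1 between predicted and ground-truth point-index sets for one query."""
    tp = len(pred & gt)  # true positives: points in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gt)
    return 2 * precision * recall / (precision + recall)
```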

Render 2D masks

To render 2D segmentation masks and 2D heatmaps, pass --generate_mask, as shown in the following script.

python eval/render_full.py --dataset_name teatime --output_dir eval_result --generate_mask

To render a 3D heatmap, pass --heatmap_3d, as shown in the following script.

python eval/render_full.py --dataset_name teatime --output_dir eval_result --heatmap_3d

Downstream Tasks

3D Instance Segmentation

We provide 3D instance segmentation code with the following steps:

  1. Run eval/render_full.py with the --ins_seg argument, as shown in the following script. It automatically generates all the object ply models (e.g. teatime_ins_cookie_0_0.ply, ...).
python eval/render_full.py --dataset_name teatime --output_dir eval_result --ins_seg
    python render.py -m data/lerf_ovs/teatime/output/teatime_1 \
    --include_feature --normalized \
    --ply_path eval_result/teatime/point_cloud/teatime_ins_cookie_0_0.ply
    python render.py -m data/lerf_ovs/teatime/output/teatime_1 \
    --include_feature --normalized \
    --ply_path eval_result/teatime/point_cloud/teatime_ins_cookie_0_1.ply
    python render.py -m data/lerf_ovs/teatime/output/teatime_1 \
    --include_feature --normalized \
    --ply_path eval_result/teatime/point_cloud/teatime_ins_cookie_0_2.ply
    python render.py -m data/lerf_ovs/teatime/output/teatime_1 \
    --include_feature --normalized \
    --ply_path eval_result/teatime/point_cloud/teatime_ins_cookie_0_all.ply

Object Removal

To remove an object, pass --remove_object, as shown in the following script.

    python eval/render_full.py --dataset_name teatime --output_dir eval_result --remove_object
    python render.py -m data/lerf_ovs/teatime/output/teatime_3 \
    --include_feature --normalized \
    --ply_path 'eval_result/teatime/point_cloud/teatime_remove_food bag_0.ply'

Object Editing and Finetuning

    python gs_edit.py --dataset_name scene0617_00
    python train_finetune.py -m data/scannet/scene0617_00/output/scene0617_00_3
    python render.py -m data/scannet/scene0617_00/output/scene0617_00_3 \
    --include_feature --normalized \
    --ply_path data/scannet/scene0617_00/output/scene0617_00_3/finetune/point_cloud/iteration_43000/finetune.ply
    

Object Addition

    python add.py --dataset_name waldo_kitchen --input_object_ply 'eval_result/teatime/point_cloud/teatime_food bag_0.ply'
    python render.py -m data/lerf_ovs/waldo_kitchen/output/waldo_kitchen_3 \
    --include_feature --normalized \
    --ply_path 'data/lerf_ovs/waldo_kitchen/output/waldo_kitchen_3/point_cloud/iteration_40000/add_teatime_food bag_0.ply'
