RoMaP: Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling
[ICCV 2025 Paper] Hayeon Kim*, Ji Ha Jang*, Se Young Chun†
Seoul National University
*Equal contribution, †Corresponding author
📄 Paper (arXiv) | 📽️ Project Page | 🔁 BibTeX
RoMaP is a novel framework for fine-grained part-level editing of 3D Gaussian Splatting (3DGS), enabling edit instructions like:
"Turn his left eye blue and his right eye green" "Replace the nose with a croissant" "Give the hair a flame-texture style"
Unlike existing baselines, which struggle with local edits due to inconsistent 2D segmentations and weak SDS guidance, RoMaP combines:
- ✅ Geometry-aware segmentation (3D-GALP)
- ✅ Gaussian prior removal and local masking
- ✅ Regularized SDS with SLaMP (Scheduled Latent Mixing and Part Editing) image supervision
- 3D-GALP: Robust 3D segmentation based on spherical harmonics-aware label prediction
- SLaMP editing: Generates realistic part-edited 2D views to direct SDS in target-driven directions (Coming soon)
- Regularized SDS Loss: Anchored L1 + mask support + strong target control (Coming soon)
Ensure that CUDA ≥ 11.8 and PyTorch ≥ 2.1.0 are installed.
```bash
git clone https://github.com/janeyeon/RoMaP.git
cd RoMaP

conda create -n romap python=3.9
conda activate romap

pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install ninja -U
pip install -r requirements.txt
```
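Before building the CUDA submodules below, it can help to confirm that the installed PyTorch build actually sees your GPU. A minimal, optional check:

```bash
# Optional sanity check: the printed CUDA version should be 11.8 and the last value should be True.
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```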
```bash
cd gaussiansplatting/submodules/diff_gaussian_rasterization
pip install -e .
cd ../
git clone https://github.com/camenduru/simple-knn.git
cd simple-knn
pip install -e .
cd ../../../lgm/diff_gaussian_rasterization_lgm
pip install -e .
cd ../../

# pytorch3d
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
# tiny-cuda-nn (Torch bindings)
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
# nvdiffrast
pip install git+https://github.com/NVlabs/nvdiffrast
# nerfacc (by nerfstudio)
pip install git+https://github.com/nerfstudio-project/nerfacc
# PyTorch Lightning
conda install pytorch-lightning -c conda-forge
# libigl Python bindings
conda install -c conda-forge pyigl

pip install rembg onnxruntime einops trimesh wandb segmentation_refinement tyro roma xformers==0.0.23 imageio[ffmpeg] imageio[pyav] plyfile lightning sentencepiece
```
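If the installation succeeded, the compiled and custom packages should import cleanly. A quick, optional smoke test (the import names below are the usual ones for these packages; adjust if your local builds differ):

```bash
# Optional: verify that the main compiled dependencies are importable.
python -c "import diff_gaussian_rasterization, pytorch3d, tinycudann, nvdiffrast, nerfacc, pytorch_lightning; print('core dependencies OK')"
```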
You can download the datasets from this link. This project uses the NeRF-Art dataset, the 3D-OVS dataset, and a custom 3D Gaussian Splatting dataset that we created.
```bash
sh run_recon_nerfart_yanan_seg.sh
```

- `prompt`: The main prompt for segmentation or editing, describing the target object.
- `seg_list`: A list of words specifying the parts you want to segment; it is recommended to list smaller (more specific) regions first, then larger ones. For compound words, you can include multiple terms in parentheses (e.g., `['sharp','eyes']`).
- `if_segment`: Set to `True` if you want segmentation to be performed on the scene.
- `ply_path`: A path or list of paths to pretrained PLY files you wish to use for initialization or further processing.
- `seg_softmax_list`: For fine-grained control, adjust the softmax values here. In most cases, values between 0.1 and 0.2 yield good results. For segmenting larger regions, consider increasing this value.
- `if_recon`: Set to `True` if you are working with a reconstruction scene.
- `rot_name`: If you wish to apply a custom camera matrix transformation outside of the default `transformation.json`, add a new entry to `rotation_dict` in `threestudio/data/multiview.py` and specify its name here. If not specified, the default is used.
- `fov`: Use this to explicitly set the camera's field of view if you want to override the default setting.
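For reference, here is an illustrative sketch of how these options might be filled in for a face scene. The values below are made up, and the exact variable syntax lives in `run_recon_nerfart_yanan_seg.sh`, so check the script itself before editing:

```bash
# Illustrative values only -- see run_recon_nerfart_yanan_seg.sh for the exact syntax.
prompt="a photo of a man's face"            # main prompt describing the target object
seg_list="['eyes','nose','mouth','face']"   # smaller parts first, larger regions last
if_segment=True                             # perform segmentation on the scene
if_recon=True                               # this is a reconstruction scene
ply_path="path/to/pretrained_scene.ply"     # pretrained 3DGS point cloud to start from
seg_softmax_list="[0.1,0.1,0.15,0.2]"       # 0.1-0.2 usually works; raise for larger regions
```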
```bash
sh run_gen_woman_seg.sh
```

- `if_gen`: Set to `True` if the scene is a generation scene.
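An illustrative sketch of the generation-specific setting (values are made up; check `run_gen_woman_seg.sh` for the exact syntax):

```bash
# Illustrative values only -- see run_gen_woman_seg.sh for the exact syntax.
if_gen=True                        # this is a generation scene
prompt="a portrait of a woman"     # hypothetical prompt for the generated scene
seg_list="['hair','eyes','face']"  # parts to segment, smaller regions first
```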
```bash
sh run_recon_3d_ovs_bench_seg.sh
```

- `dataroot`: The folder path containing the desired point cloud and the corresponding `transforms.json`.
- `eval_interpolation`: For custom camera matrix control, specify a list where the first n-1 numbers indicate the camera matrix views you want to interpolate, and the last number defines into how many intervals each view pair should be divided.
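To make the `eval_interpolation` format concrete, a made-up example: the list `[5, 12, 20, 30]` interpolates between camera views 5, 12, and 20, splitting each adjacent pair of views into 30 steps.

```bash
# Illustrative only: interpolate between views 5, 12, and 20, with 30 steps per view pair.
eval_interpolation="[5, 12, 20, 30]"
dataroot="path/to/3d_ovs_scene"   # folder containing the point cloud and transforms.json
```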
| Method | CLIP ↑ | CLIP_dir ↑ | B-VQA ↑ | TIFA ↑ |
|---|---|---|---|---|
| GaussCtrl | 0.182 | 0.044 | 0.190 | 0.432 |
| GaussianEditor | 0.179 | 0.087 | 0.370 | 0.571 |
| DGE | 0.201 | 0.095 | 0.497 | 0.565 |
| RoMaP (Ours) | 0.277 | 0.205 | 0.723 | 0.674 |
RoMaP consistently outperforms previous methods across all editing metrics, especially in:
- Part-level segmentation accuracy
- Drastic edit capacity (e.g., 'croissant nose', 'jellyfish hair')
- Identity-preserving edits with complex structures
```bibtex
@inproceedings{kim2025romap,
  title={RoMaP: Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling},
  author={Hayeon Kim and Ji Ha Jang and Se Young Chun},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2025}
}
```
We would like to express our gratitude to the developers of threestudio and Rectified flow prior, as our code is primarily based on these repositories.