Zhigang Wang*, Yifei Su*, Chenhui Li*, Dong Wang, Yan Huang, Bin Zhao, Xuelong Li
Given an RGB-D scan sequence and a reconstructed point cloud, we employ a chronological group-wise merging method to integrate 2D masks into complete 3D instances, achieving strong performance in zero-shot 3D semantic and instance segmentation. Additionally, we propose an efficient adaptive octree-graph to represent instance occupancy details, supporting various downstream tasks, e.g., instance retrieval and path planning.
conda create -n ovgraph python=3.10
conda activate ovgraph
# PyTorch installation
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
# pytorch3d installation
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
cd 3rdparty
git clone https://github.com/facebookresearch/pytorch3d.git
cd pytorch3d && pip install -e .
cd ..
# faiss installation
conda install -c pytorch faiss-cpu=1.7.4 mkl=2021 blas
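Before moving on, an optional sanity check confirms the environment is healthy:
# Should print 2.1.2 and True on a machine with a working CUDA 11.8 driver.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"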
Install torch-scatter and flash-attn:
pip install "https://data.pyg.org/whl/torch-2.1.0%2Bcu118/torch_scatter-2.1.2%2Bpt21cu118-cp310-cp310-linux_x86_64.whl"
pip install ninja
pip install flash-attn==2.6.3
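Both wheels are built against torch 2.1 + cu118, so an optional import check catches version mismatches early:
# A mismatched wheel typically fails here with an undefined-symbol error.
python -c "import torch_scatter, flash_attn; print('torch-scatter and flash-attn OK')"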
Install other dependencies:
pip install -r requirements.txt
Before installing the third-party libraries, first navigate to the 3rdparty directory.
cd 3rdparty
Install CLIP:
git clone https://github.com/openai/CLIP.git
cd CLIP && pip install -e . && cd ..
Install OVSeg-CLIP:
cd ovseg/third_party/CLIP
pip install -e .
cd ../../..
Install Detectron2:
git clone https://github.com/facebookresearch/detectron2.git
cd detectron2 && pip install -e . && cd ..
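Optionally confirm that the editable installs resolve (the clip module is provided by the CLIP/OVSeg-CLIP steps above):
python -c "import clip, detectron2; print(detectron2.__version__)"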
Install CropFormer:
git clone https://github.com/qqlu/Entity.git
cd Entity/Entityv2/CropFormer/entity_api/PythonAPI
make
cd ../../mask2former/modeling/pixel_decoder/ops
sh make.sh
cd ../../../../../../..
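make.sh compiles the multi-scale deformable attention CUDA op used by CropFormer's pixel decoder. To check the build, you can probe the compiled module directly; the module name below follows the Deformable-DETR ops that Mask2Former vendors, so treat it as an assumption and adjust if your checkout differs:
# Import torch first so the extension can find the torch libraries it links against.
python -c "import torch, MultiScaleDeformableAttention; print('CropFormer ops built')"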
# openmim and mmcv
pip install -U openmim
mim install mmcv
Install TAP:
git clone https://github.com/baaivision/tokenize-anything.git
cd tokenize-anything && pip install -e . && cd ..
Install MinkowskiEngine:
conda install openblas-devel -c anaconda
git clone https://github.com/NVIDIA/MinkowskiEngine.git
cd MinkowskiEngine
python setup.py install --blas_include_dirs=${CONDA_PREFIX}/include --blas=openblas
cd ..
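An optional import check confirms the extension compiled against your CUDA toolkit:
python -c "import MinkowskiEngine as ME; print(ME.__version__)"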
Follow Concept-Graph to install chamferdist and gradslam:
# chamferdist
git clone https://github.com/krrish94/chamferdist.git
cd chamferdist
pip install -e .
cd ..
# gradslam
git clone https://github.com/gradslam/gradslam.git
cd gradslam
git checkout conceptfusion
pip install -e .
cd ..
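Optionally verify that both packages import cleanly:
python -c "import chamferdist, gradslam; print('chamferdist and gradslam OK')"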
Finally, return to the repository root and install the code itself:
cd ..
pip install -e .
Create the pretrained_weights directory and download the following weights:
mkdir pretrained_weights && cd pretrained_weights
# cropformer
wget -O Mask2Former_hornet_3x_576d0b.pth "https://huggingface.co/datasets/qqlu1992/Adobe_EntitySeg/resolve/main/CropFormer_model/Entity_Segmentation/Mask2Former_hornet_3x/Mask2Former_hornet_3x_576d0b.pth?download=true"
# ovseg-clip
download manually from https://drive.google.com/file/d/1cn-ohxgXDrDfkzC1QdO-fi8IjbjXmgKy/view and place the checkpoint in this directory (see the gdown sketch below)
# tap
wget "https://huggingface.co/BAAI/tokenize-anything/resolve/main/models/tap_vit_h_v1_1.pkl"
wget "https://huggingface.co/BAAI/tokenize-anything/resolve/main/concepts/merged_2560.pkl"
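The OVSeg-CLIP checkpoint lives on Google Drive, so wget will not work; if you prefer staying in the terminal, gdown can fetch it by file id (the output filename below is a placeholder; use whatever name your configs expect):
pip install gdown
# File id taken from the Drive link above; -O sets the local filename (placeholder).
gdown 1cn-ohxgXDrDfkzC1QdO-fi8IjbjXmgKy -O ovseg_clip.pth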
cd ..
mkdir data && cd data
First, use the download script from the official Replica repository to download the complete Replica dataset. Then, use the script provided by NICE-SLAM to download the rendered RGB-D scans of the Replica scenes.
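Roughly, the two downloads look like this (script names and paths follow the upstream repositories at the time of writing; double-check their READMEs before running):
# Official Replica assets (semantic meshes etc.); the target directory is an example.
git clone https://github.com/facebookresearch/Replica-Dataset.git
./Replica-Dataset/download.sh replica_raw
# Rendered RGB-D trajectories from the NICE-SLAM authors.
git clone https://github.com/cvg/nice-slam.git
bash nice-slam/scripts/download_replica.sh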
After downloading the two parts mentioned above, please organize the Replica dataset in the following format.
data/replica
├── room0
│   ├── results
│   │   ├── depth000000.png
│   │   ├── depth000001.png
│   │   ├── frame000000.png
│   │   ├── frame000001.png
│   │   └── ...
│   ├── habitat
│   │   ├── mesh_semantic.ply
│   │   ├── info_semantic
│   │   └── ...
│   ├── room0_mesh.ply
│   └── traj.txt
├── room1
│   └── ...
├── room2
│   └── ...
├── ...
└── cam_params.json
We adopt the 8 scenes utilized by Concept-Graph: room0, room1, room2, office0, office1, office2, office3, office4.
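A quick optional check that all 8 scenes are in place:
# Prints the name of any missing scene directory.
for s in room0 room1 room2 office0 office1 office2 office3 office4; do
  [ -d "data/replica/$s" ] || echo "missing: $s"
done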
Please follow the official ScanNet guidelines to sign the agreement and send it to scannet@googlegroups.com. Once you receive the download script, you only need to download the following files: '.aggregation.json', '.sens', '.txt', '_vh_clean_2.0.010000.segs.json', '_vh_clean_2.ply', '_vh_clean_2.labels.ply', and 'scannetv2-labels.combined.tsv'.
Afterward, you can use the official script to decompress the .sens files.
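For reference, a typical SensReader invocation looks like this (flags from the ScanNet repository; the paths are examples):
# Export depth, color, poses, and intrinsics from a .sens container.
python ScanNet/SensReader/python/reader.py \
  --filename scans/scene0011_00/scene0011_00.sens \
  --output_path scans/scene0011_00/exported \
  --export_depth_images --export_color_images \
  --export_poses --export_intrinsics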
Then, use our script to process the ScanNet dataset.
python tools/preprocess_scannet.py
Finally, please ensure that the ScanNet dataset has the following structure:
data/scannet
└── aligned_scans
    ├── scene0011_00
    │   ├── poses
    │   │   ├── 0000-pose.txt
    │   │   ├── 0010-pose.txt
    │   │   └── ...
    │   ├── color
    │   │   ├── 0000-color.jpg
    │   │   ├── 0010-color.jpg
    │   │   └── ...
    │   ├── depth
    │   │   ├── 0000-depth.png
    │   │   ├── 0010-depth.png
    │   │   └── ...
    │   ├── intrinsic
    │   │   ├── intrinsic_depth.txt
    │   │   └── ...
    │   ├── config.json
    │   ├── scene0011_00.txt
    │   ├── scene0011_00_vh_clean_2.0.010000.segs.json
    │   ├── scene0011_00.aggregation.json
    │   └── scene0011_00_vh_clean_2.ply
    ├── scene0011_01
    ├── scene0015_00
    └── ...
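Optionally spot-check one scan (scene name is just an example); the three counts should match, since every pose has a corresponding color and depth frame:
ls data/scannet/aligned_scans/scene0011_00/poses | wc -l
ls data/scannet/aligned_scans/scene0011_00/color | wc -l
ls data/scannet/aligned_scans/scene0011_00/depth | wc -l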
After you have completed all the above steps, you can test a single scene by using the following command.
bash run/test_single_room.sh
Then, you can use the following script to visualize the corresponding scene.
python ovgraph/utils/visualize/instance_query_scannet.py
Evaluate on Replica:
# instance map construction
bash run/replica_build.sh
# evaluate semantic segmentation
python scripts/replica_eval_semantic_segment.py
Evaluate on ScanNet:
# instance map construction
bash run/scannet_build.sh
# evaluate semantic segmentation
python scripts/scannet_eval_semantic_segment.py
# evaluate class-agnostic instance segmentation
python scripts/scannet_eval_instance_segment_class_agnostic.py