
Opti-Acoustic Sensor Fusion

This repository is based on the Z-splat repository, with the following contributions:

  • Scripts to create simulation dataset with Mitsuba 3
  • Scripts to preprocess real RGB and FLS images before training
  • Sweep training scripts
  • More code documentation

Please refer to the original 3DGS repository to get your computer set up for training with 3DGS. Setting up CUDA and the related toolchain is quite an involved process.

Cloning the Repository

git clone --recursive https://github.com/yufanana/opti-acoustic-sensor-fusion.git

The diff-gaussian-rasterization submodule was modified from QuintonQu's diff-gs-with-depth. No major functional changes were made; only a few quality-of-life improvements were added so that cuda_rasterizer/config.h contains the key parameters for the FLS.

Python Environments

Follow the 3DGS setup instructions.

  1. For Z-splat
SET DISTUTILS_USE_SDK=1  # Windows only
conda env create -f envs/env_zsplat.yml
conda activate zsplat

Then, install the submodules manually.

pip install submodules/simple-knn submodules/diff-gaussian-rasterization
  2. For Mitsuba 3 (only needed to simulate datasets)

Install a separate environment to use Mitsuba 3, as it requires Python >= 3.8.

conda env create -f envs/env_mitsuba.yml
conda activate mitsuba

Docker containers

p.s. I could not get this to work, but I hope it helps as a starting point for others.

# To build,
docker compose --file docker-compose.yaml --env-file .env build

# To run,
docker compose up

# In subsequent terminals,
docker exec -it opti-acoustic-container bash
pip install submodules/simple-knn submodules/diff-gaussian-rasterization

Dataset

Mitsuba

Mitsuba 3 is a rendering system for forward and inverse light-transport simulation, easily accessible as a Python package. It supports different rendering techniques, including producing a depth map and an RGB image. The light source, BSDF, models, and camera sensor properties are specified in a scene.xml file. Please refer to the LICENSE files to obtain the original model files.
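
As a minimal sketch (not the project's own pipeline, which lives in mitsuba/generate_data.py), a scene defined in scene.xml can be loaded and rendered from Python roughly as follows:

import mitsuba as mi

mi.set_variant("scalar_rgb")  # a CUDA variant such as "cuda_ad_rgb" can be used if available

# Load the scene description (sensor, integrator, emitters, BSDFs) and render it
scene = mi.load_file("mitsuba/lego/scene.xml")
image = mi.render(scene, spp=128)

# Write the rendered RGB image to disk
mi.util.write_bitmap("color.png", image)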

The folder for each scene should be as follows

<path>
├── meshes
│   └── *.ply
├── models
│   └── *.obj
├── config.yml  # required
└── scene.xml   # required

To generate the simulated dataset, run:

conda activate mitsuba
python mitsuba/generate_data.py -s <source_dir> -o <output_dir>
python mitsuba/generate_data.py -s mitsuba/lego -o v4
Command line arguments for generate_data.py

-s / --source_directory

Path to the directory containing the scene.xml file.

-o / --output_directory

Optional. Name of the output subfolder in data/scene_name/output_name. Default: data/scene_name/timestamp.

-c / --config

Optional. Path to the config file. Default: mitsuba_config.yml

--plot

Optional. Flag to view the generated images as matplotlib plots.

-y / --overwrite

Optional. Flag to overwrite the existing output directory.

Configurations for generate_data.py
# mitsuba_config.yml

# General
n_cameras: Number of camera viewpoints.
movement: Choose "orbit", "polar", or "translate"
coverage: Degrees if orbit, meters if translate.
seed: Seed to initialize numpy.

# Offsets in polar coordinates, meters
rad_offset: Range of random radial offsets to apply to viewpoints.
z_offset: Range of random z offsets to apply to viewpoints.
center_offset: x-offset to the middle camera.

# FLS
fls:
  n_rays: Number of azimuth bins to simulate.
  n_range_bins: Number of range bins to simulate.
  min_range: Minimum range of the scan in meters.
  max_range: Maximum range of the scan in meters.

camera:
  spp: Samples per pixel when rendering.
  height: Height of image.
  width: Width of image.
  theta: Camera pitch angle.
  theta_vec: Pitch axis.
  fov: Field of view of camera in degrees.
  translate_vec: Direction of translation.
  up_vec: Vector to orbit around.
  target: The point that the camera looks at.
  origin: Central/middle camera position.
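
For illustration, the orbit movement could be implemented roughly as below. This is a hypothetical sketch using the parameter names above (n_cameras, coverage, origin, target, up_vec), not the actual code in mitsuba/generate_data.py, and the up axis is assumed to be +z.

import numpy as np
import mitsuba as mi

mi.set_variant("scalar_rgb")

def orbit_to_worlds(origin, target, up_vec, n_cameras, coverage):
    """Spread n_cameras viewpoints over `coverage` degrees of an orbit around the target."""
    origin, target = np.asarray(origin, float), np.asarray(target, float)
    angles = np.deg2rad(np.linspace(-coverage / 2, coverage / 2, n_cameras))
    to_worlds = []
    for a in angles:
        c, s = np.cos(a), np.sin(a)
        rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # rotation about +z
        position = rot @ (origin - target) + target
        # Mitsuba builds the camera-to-world transform from a look-at specification
        to_worlds.append(mi.ScalarTransform4f.look_at(origin=position, target=target, up=up_vec))
    return to_worlds
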
Output folder structure
<location>
├── camera
│   ├── fov.npy
│   └── to_worlds.npy
├── color
│   ├── 0000.png
│   ├── 0001.png
│   └── ...
├── depth_hist_h
│   ├── 0000.png
│   ├── 0001.png
│   └── ...
├── depth_hist_w
│   ├── 0000.png
│   ├── 0001.png
│   └── ...
├── depth_map
│   ├── 0000.npy
│   ├── 0000.png
│   ├── 0001.npy
│   ├── 0001.png
│   └── ...
├── gif
│   ├── color.gif
│   ├── depth_hist_h.gif
│   ├── depth_hist_w.gif
│   └── depth_map.gif
├── scene.xml   (copied)
└── config.yml  (copied)

COLMAP

Construct the sparse model manually with the COLMAP GUI application. Be sure to set the intrinsics model to PINHOLE. A sketch for inspecting the exported text model follows the steps below.

  1. File >> New project
     • Database >> New >> Select the root directory of the dataset >> Save as database.db
     • Images >> Select >> Select the color folder
     • Click on Save
  2. Processing >> Feature extraction
     • Select PINHOLE as the camera model
     • Check Shared for all images
     • Click on Extract
  3. Processing >> Feature matching >> Run
  4. Reconstruction >> Start reconstruction
  5. In the root directory of the dataset, create a folder sparse/0/
  6. File >> Export model as text >> Select the folder created in the previous step.
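
To sanity-check the export, the text model can be inspected directly. Below is a small sketch of parsing sparse/0/images.txt (standard COLMAP text format, not a script from this repository):

def read_colmap_images_txt(path):
    """Parse image id, rotation quaternion, translation, camera id and name from images.txt."""
    poses = {}
    with open(path) as f:
        lines = [l for l in (ln.strip() for ln in f) if not l.startswith("#")]
    # Each image occupies two lines; the second holds the 2D keypoints and is skipped here
    for line in lines[::2]:
        elems = line.split()
        image_id = int(elems[0])
        qvec = [float(x) for x in elems[1:5]]   # qw, qx, qy, qz (world-to-camera rotation)
        tvec = [float(x) for x in elems[5:8]]   # world-to-camera translation
        camera_id, name = int(elems[8]), elems[9]
        poses[image_id] = {"qvec": qvec, "tvec": tvec, "camera_id": camera_id, "name": name}
    return poses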

FLS COLMAP (WIP)

p.s. I think this folder structure was used by the authors of Zsplat.

<path>
├── color
│   ├── 0000.png
│   ├── 0001.png
│   └── ...
├── Sonar_raw
│   ├── Flight-000000.npy
│   ├── Flight-000006.npy
│   └── ...
├── sparse
│     └── 0
│         ├── cameras.bin
│         ├── images.bin
│         ├── points3D.bin
│         ├── points3D.ply
│         └── project.ini
└── cameras_sphere.npz

FLS Underwater (WIP)

p.s. I think this folder structure was used by the authors of Zsplat.

<path>
├── image
│   ├── 000.png
│   ├── 001.png
│   └── ...
├── Sonar_raw
│   ├── Flight-000000.npy
│   ├── Flight-000006.npy
│   └── ...
├── underwater.keepme.txt
├── real_14deg.mat      # for sonar cam
└── cameras_sphere.npz  # for RGB cam

cameras_sphere.npz contains six 4x4 matrices for each camera/datapoint:

  • camera_mat_*: looks like a padded 3x3 camera matrix
  • camera_mat_inv_*
  • world_mat_*
  • world_mat_inv_*
  • scale_mat_*
  • scale_mat_inv_*

Projection matrix, P = world_mat @ scale_mat
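
A short sketch of loading the file and assembling the projection matrices, assuming the keys are indexed world_mat_0, world_mat_1, and so on:

import numpy as np

data = np.load("cameras_sphere.npz")
n_cameras = sum(k.startswith("world_mat_") and "inv" not in k for k in data.files)

projections = []
for i in range(n_cameras):
    world_mat = data[f"world_mat_{i}"]     # 4x4
    scale_mat = data[f"scale_mat_{i}"]     # 4x4 scene-normalisation matrix
    P = (world_mat @ scale_mat)[:3, :4]    # 3x4 projection matrix
    projections.append(P)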

real_14deg.mat contains the following entries for the sonar (lists are of length n_cameras); a loading sketch follows the list:

  • images: raw RGB images
  • sensor_rels: sensor poses?
  • platform_poses: platform poses?
  • hfov: horizontal field of view
  • vfov: vertical field of view
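
To inspect real_14deg.mat, something like the following may work (assuming it is a pre-v7.3 MAT-file readable by scipy.io.loadmat):

from scipy.io import loadmat

mat = loadmat("real_14deg.mat")
print(mat.keys())                        # expect images, sensor_rels, platform_poses, hfov, vfov
images = mat["images"]                   # one entry per sonar frame
hfov, vfov = mat["hfov"], mat["vfov"]    # horizontal / vertical field of view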

For reference, the CameraInfo structure used when loading cameras:

from typing import NamedTuple
import numpy as np

# Per-camera record holding the pose (R, T), intrinsics (FovX, FovY) and image data
class CameraInfo(NamedTuple):
    uid: int
    R: np.array
    T: np.array
    FovY: np.array
    FovX: np.array
    image: np.array
    image_path: str
    image_name: str
    width: int
    height: int
    depth: list = [None, None]

Training

Run the training script with a source path pointing to a directory with the COLMAP dataset structure.

Usage:

SET DISTUTILS_USE_SDK=1  # Windows only
cd zsplat
conda activate zsplat
python train.py -c <config file>
python train.py -c config/lego.yml

To train on a COLMAP dataset, prepare the folder as follows:

<path>
├── color
│   ├── 0000.png
│   ├── 0001.png
│   └── ...
├── depth
│   ├── 0000.npy
│   ├── 0001.npy
│   └── ...
├── sparse
│     └── 0
│         ├── cameras.bin
│         ├── images.bin
│         ├── points3D.bin
│         ├── points3D.ply
│         └── project.ini
└── database.db

Configuration file

In the original 3DGS implementation, the configuration parameters were set using Python classes in the arguments module and the command-line interface. In this codebase, all parameters have been moved to a train.yml configuration file (a loading sketch is shown after the parameter list below).

Parameters are explained below, mostly taken from 3DGS.

ProgramParams:
  ip: IP to start the GUI server on for the SIBR network viewer during training, localhost by default
  port: port to use for SIBR network viewer during training
  quiet: flag to omit any text written to standard out pipe
  test_iterations: iterations at which the training script computes L1 and PSNR over test set
  save_iterations: iteration number to save Gaussian model
  checkpoint_iterations: iterations to store a checkpoint for continuing later, saved in the model directory
  start_checkpoint: path to a saved checkpoint to continue training from


ModelParams:
  source_path: Path to the root directory of the dataset
  model_path: Path to where trained model should be stored, output/random by default
  images: Name of the subfolder in the source path to read RGB images
  depth: Name of the subfolder in the source path to read sonar images
  skip: List of integers of the COLMAP extrinsic camera IDs to skip during training
  use_cam: Flag to use RGB images for training
  use_sonar: Flag to use sonar images for training
  pcd_min: Minimum [x,y,z] values to generate initial 3D points
  pcd_max: Maximum [x,y,z] values to generate initial 3D points
  resolution: Specifies resolution of the loaded images before training. If provided 1, 2, 4 or 8, uses original, 1/2, 1/4 or 1/8 resolution, respectively. For all other values, rescales the width to the given number while maintaining image aspect. If not set and input image width exceeds 1.6K pixels, inputs are automatically rescaled to this target.
  sh_degree: Order of spherical harmonics to be used (no larger than 3).

PipelineParams:
  convert_SHs_python: Flag to make the pipeline compute the forward and backward of the SHs with PyTorch instead of the CUDA implementation.
  convert_cov3D_python: Flag to make the pipeline compute the forward and backward of the 3D covariance with PyTorch instead of the CUDA implementation.

OptimizationParams:
  iterations : Total number of training iterations
  position_lr_init : Initial 3D position learning rate, 0.00016 by default.
  position_lr_final : Final 3D position learning rate, 0.0000016 by default.
  position_lr_delay_mult : Position learning rate multiplier (cf. Plenoxels), 0.01 by default.
  position_lr_max_steps : Number of steps (from 0) where position learning rate goes from initial to final. 30_000 by default.
  feature_lr : Spherical harmonics features learning rate, 0.05 by default
  opacity_lr : Opacity learning rate, 0.05 by default 
  scaling_lr : Scaling learning rate, 0.005 by default
  rotation_lr : Rotation learning rate, 0.001 by default.
  percent_dense : Percentage of scene extent (0--1) a point must exceed to be forcibly densified, 0.01 by default
  lambda_dssim : Influence of SSIM on total loss from 0 to 1, 0.2 by default, suggested 0.1 <= w <= 3
  densification_interval : How frequently to densify, 100 (every 100 iterations) by default
  opacity_reset_interval : How frequently to reset opacity, 3000 by default
  densify_from_iter : Iteration where densification starts, 500 by default.
  densify_until_iter : Iteration where densification stops, 15_000 by default.
  densify_grad_threshold : Limit that decides if points should be densified based on 2D position gradient, 0.0002 by default.
  random_background : 
  depth_loss : 

The following key parameters were relevant for my work:

  • use_cam, use_sonar
  • pcd_min, pcd_max
  • sonar_from_epoch, sonar_until_epoch, weights_sonar, weights_rgb
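
A minimal sketch of how such a YAML file could be read into parameter groups; this is hypothetical and may differ from the actual loading code in train.py:

from types import SimpleNamespace
import yaml

def load_config(path):
    """Read a train.yml-style file and expose each parameter group via attribute access."""
    with open(path) as f:
        cfg = yaml.safe_load(f)
    return {name: SimpleNamespace(**params) for name, params in cfg.items()}

cfg = load_config("config/lego.yml")
print(cfg["OptimizationParams"].iterations, cfg["ModelParams"].use_sonar)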

Sweeps

The training sweeps work on simulated Mitsuba datasets. During each sweep, the script generates the data, updates the diff-gaussian-rasterization submodule, trains the Zsplat model, and evaluates it.

The sweep configurations are found in the individual Python scripts inside experiments/.

Usage:

python experiments/exp_coverage.py
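
A sweep follows roughly the pattern below. This is a hedged sketch; the concrete sweep variables, config paths, and submodule-update step in experiments/exp_coverage.py differ.

import subprocess
import yaml

# Hypothetical sweep over the 'coverage' setting of the Mitsuba dataset generator
with open("mitsuba_config.yml") as f:          # assumed location of the default config
    base_cfg = yaml.safe_load(f)

for coverage in [30, 60, 90]:
    base_cfg["coverage"] = coverage
    cfg_path = f"config_coverage_{coverage}.yml"
    with open(cfg_path, "w") as f:
        yaml.safe_dump(base_cfg, f)
    subprocess.run(["python", "mitsuba/generate_data.py", "-s", "mitsuba/lego",
                    "-o", f"coverage_{coverage}", "-c", cfg_path, "-y"], check=True)
    subprocess.run(["python", "train.py", "-c", "config/lego.yml"], check=True)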

Evaluation

Unfortunately, I did not manage to get the original scripts provided by 3DGS/Zsplat to work. The scripts render.py and filter_by_bb.py were moved to tools/ to reduce visual clutter in the root directory.

SET DISTUTILS_USE_SDK=1  # Windows only
python train.py -s <path to COLMAP or NeRF Synthetic dataset> --eval # Train with train/test split
python render.py -m <path to trained model> # Generate renderings
python metrics.py -m <path to trained model> # Compute error metrics on renderings

MeshLab was used extensively to visualize and tinker with the point clouds created.

Evaluation scripts are found in evaluation/

Usage:

python evaluation/eval_geometric.py --gt_ply_path mitsuba/lego/lego.ply --gt_scale 1.0 --res_ply_path output/lego/exp_coverage/point_cloud/iteration_30000/point_cloud.ply --do_icp
python evaluation/eval_photometric.py --model-path output/lego/exp_coverage
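
For reference, the kind of geometric comparison involved can be sketched with Open3D; this illustrates ICP alignment followed by a nearest-neighbour distance, not the actual eval_geometric.py implementation:

import open3d as o3d

gt = o3d.io.read_point_cloud("mitsuba/lego/lego.ply")
res = o3d.io.read_point_cloud("output/lego/exp_coverage/point_cloud/iteration_30000/point_cloud.ply")

# Align the reconstructed cloud to the ground truth with point-to-point ICP
icp = o3d.pipelines.registration.registration_icp(
    res, gt, max_correspondence_distance=0.05,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
res.transform(icp.transformation)

# Mean nearest-neighbour distance from the reconstruction to the ground truth
dists = res.compute_point_cloud_distance(gt)
print(sum(dists) / len(dists))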

Viewer

Follow the instructions on the original 3DGS repo to install the SIBR viewer.

To run the viewer,

./<SIBR install dir>/bin/SIBR_gaussianViewer_app -m <path>
Primary Command Line Arguments for Real-Time Viewer

--model-path / -m

Path to trained model.

--iteration

Specifies which training state to load if multiple are available. Defaults to the latest available iteration.

--path / -s

Argument to override model's path to source dataset.

--rendering-size

Takes two space separated numbers to define the resolution at which real-time rendering occurs, 1200 width by default. Note that to enforce an aspect that differs from the input images, you need --force-aspect-ratio too.

--load_images

Flag to load source dataset images to be displayed in the top view for each camera.

--device

Index of CUDA device to use for rasterization if multiple are available, 0 by default.

--no_interop

Disables CUDA/GL interop forcibly. Use on systems that may not behave according to spec (e.g., WSL2 with MESA GL 4.5 software rendering).

Folder structure of output

<path>
├── point_cloud
│   ├── iteration_7000
│   └── iteration_30000
├── cameras.json
├── cfg_args
└── input.ply

For more complete documentation, see the SIBR Documentation.

In the Camera Point View menu,

  • Select navigation mode (FPS by default)
    • W, A, S, D, Q, E for camera translation
    • I, K, J, L, U, O for camera rotation
  • Snap to: snap to a camera from the dataset
  • Snap to closest: find the closest camera
  • Change the navigation speed with Speed and Rot. Speed.

In the 3D Gaussians menu,

  • Adjust Scaling Modifier to control the size of the displayed Gaussians, or show the initial point cloud.
  • Render Mode: choose from Splats, Initial Points (reads from a points3D.bin file) or Ellipsoids.
  • Check Crop box to directly crop the scene along X, Y and Z axis and save the cropped ply file.

To save a screenshot,

  1. Set Capture >> Set export directory...
  2. Capture >> Point view

To save a video,

  1. Set Capture >> Set export directory...
  2. Click Record
  3. Start moving
  4. Click Stop
  5. Click Play to check the path. Re-record if desired.
  6. Once satisfied, check Save video (from playing)
  7. Click Play
  8. When it's over, click Capture >> Export video

To create viewpoints for comparison across Gaussian models,

  1. Set Capture >> Set export directory...
  2. Navigate to a view of the scene
  3. Click Save camera (bin) in the Camera Point View menu
  4. Click on Load camera to load saved camera views.

Development

python train.py -s data\tandt_db\tandt\truck

Dealing with Git submodules

git submodule add https://gitlab.inria.fr/sibr/sibr_core.git SIBR_viewers
git submodule add https://gitlab.inria.fr/bkerbl/simple-knn.git submodules/simple-knn      
git submodule add -b diff-gs-main https://github.com/QuintonQu/diff-gs-with-depth submodules/diff-gaussian-rasterization

To get the third-party glm package,

cd submodules/diff-gaussian-rasterization
git submodule update --init --recursive

Conda environments

conda env update -n aoneus -f environment.yml --prune
conda env update -n zsplat -f zsplat_env.yml --prune

Errors

RuntimeError: Function _RasterizeGaussiansBackward returned an invalid gradient at index 2 - got [0, 0, 3] but expected shape compatible with [0, 16, 3]

See GitHub issue #482. This occurs when all the points have been pruned, possibly due to bad initialisation.

rosbags

Install MCAP CLI.

mcap convert path/to/demo.bag demo.mcap

Install the mcap-ros1-support helper library to read the converted files from Python.
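
The converted file can then be read from Python; a small sketch using mcap-ros1-support (verify the reader API against the version you install):

from mcap_ros1.reader import read_ros1_messages

# Iterate over decoded ROS 1 messages in the converted MCAP file
for msg in read_ros1_messages("demo.mcap"):
    print(msg.topic, type(msg.ros_msg).__name__)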
