
Problem reproducing results #4

@ErikMamet

Description


Dear fellow researchers,

Congratulations on your great work and state-of-the-art method. I am currently trying to reproduce the results of HAC++. When running run-shell-mip360.py (on the 'garden' scene only), the training step works; however, when the pipeline enters the rendering part

train.py, line 647:

    # rendering
    logger.info(f'\nStarting Rendering~')
    visible_count = render_sets(args, lp.extract(args), -1, pp.extract(args), wandb=wandb, logger=logger, x_bound_min=x_bound_min, x_bound_max=x_bound_max)
    logger.info("\nRendering complete.")

I get this error:

File "train.py", line 454, in render_sets
t_test_list, visible_count = render_set(dataset.model_path, "test", scene.loaded_iter, scene.getTestCameras(), gaussians, pipeline, background)
File "train.py", line 382, in render_set
render_pkg = render(view, gaussians, pipeline, background, visible_mask=voxel_visible_mask)
File "repo_root/Documents/HAC-plus/gaussian_renderer/init.py", line 275, in render
cov3D_precomp = None)
File "repo_root/miniconda3/envs/HAC_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "repo_root/miniconda3/envs/HAC_env/lib/python3.7/site-packages/diff_gaussian_rasterization/init.py", line 222, in forward
raster_settings,
File "repo_root/miniconda3/envs/HAC_env/lib/python3.7/site-packages/diff_gaussian_rasterization/init.py", line 41, in rasterize_gaussians
raster_settings,
File "repo_root/miniconda3/envs/HAC_env/lib/python3.7/site-packages/diff_gaussian_rasterization/init.py", line 92, in forward
num_rendered, color, radii, geomBuffer, binningBuffer, imgBuffer = _C.rasterize_gaussians(*args)
RuntimeError: CUDA out of memory. Tried to allocate 68.80 GiB (GPU 0; 22.18 GiB total capacity; 3.89 GiB already allocated; 17.04 GiB free; 4.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Is anyone else running into a similar problem? I do not understand why the renderer would try to allocate 68 GiB. Does anyone have an idea where that might come from?
Since the libraries are pre-compiled, I have not taken a deep dive into what is going on; might the problem be in the rendering library?
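
In case it helps anyone reproduce or narrow this down, here is a small diagnostic sketch I would drop right before the failing render() call in render_set(). Note that the max_split_size_mb hint in the error message only mitigates allocator fragmentation and is unlikely to satisfy a single 68 GiB request, but it is cheap to try. The names gaussians, voxel_visible_mask, and logger are taken from the snippet/traceback above; log_render_inputs itself is hypothetical and not part of the repo:

    import os
    # Follow the hint from the error message (fragmentation only). Must be set
    # before the first CUDA allocation, e.g. at the very top of train.py.
    os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

    import torch

    def log_render_inputs(gaussians, voxel_visible_mask, logger):
        """Hypothetical diagnostic: call right before render() in render_set()
        to record how many primitives survive the visibility mask and what the
        allocator is actually holding when the 68 GiB request is made."""
        n_visible = int(voxel_visible_mask.sum().item()) if voxel_visible_mask is not None else -1
        logger.info(f"visible primitives: {n_visible}")
        logger.info(
            f"allocated: {torch.cuda.memory_allocated() / 1024 ** 3:.2f} GiB, "
            f"reserved: {torch.cuda.memory_reserved() / 1024 ** 3:.2f} GiB"
        )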

Best,

[UPDATE]

This error only occurs when training for a small number of iterations (in my case, iteration == 3_000). However, with iteration == 30_000 the code runs and I am able to reproduce the results. If anyone knows why that happens, please let me know (I will update this thread again if I find an answer).
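
My current guess (unverified) is that at 3_000 iterations the model is still under-trained, so a few splats project to enormous screen-space radii and the rasterizer's geometry/binning buffers balloon, which would explain the 68 GiB request. If that turns out to be the cause, a crude guard like the sketch below, applied to whatever tensor the renderer passes as scales to the rasterizer, should keep the allocation bounded. clamp_splat_scales and the 0.5 ceiling are hypothetical examples, not part of the HAC++ code:

    import torch

    def clamp_splat_scales(scales: torch.Tensor, max_scale: float = 0.5) -> torch.Tensor:
        """Hypothetical guard for under-trained checkpoints: cap the per-Gaussian
        scale (in scene units) before it is handed to the rasterizer, so that a
        few degenerate splats cannot inflate the tile/binning buffers. The 0.5
        ceiling is an arbitrary example and would need tuning per scene."""
        return torch.clamp(scales, max=max_scale)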
