Dear fellow researchers,
Congratulations on your great work and state-of-the-art method. I am currently trying to reproduce the results from HAC++. When running run-shell-mip360.py (on only the 'garden' scene), the training step works; however, when entering the rendering part of the pipeline,
train.py line 647
# rendering
logger.info(f'\nStarting Rendering~')
visible_count = render_sets(args, lp.extract(args), -1, pp.extract(args), wandb=wandb, logger=logger, x_bound_min=x_bound_min, x_bound_max=x_bound_max)
logger.info("\nRendering complete.")
I get the following error:
File "train.py", line 454, in render_sets
t_test_list, visible_count = render_set(dataset.model_path, "test", scene.loaded_iter, scene.getTestCameras(), gaussians, pipeline, background)
File "train.py", line 382, in render_set
render_pkg = render(view, gaussians, pipeline, background, visible_mask=voxel_visible_mask)
File "repo_root/Documents/HAC-plus/gaussian_renderer/init.py", line 275, in render
cov3D_precomp = None)
File "repo_root/miniconda3/envs/HAC_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "repo_root/miniconda3/envs/HAC_env/lib/python3.7/site-packages/diff_gaussian_rasterization/init.py", line 222, in forward
raster_settings,
File "repo_root/miniconda3/envs/HAC_env/lib/python3.7/site-packages/diff_gaussian_rasterization/init.py", line 41, in rasterize_gaussians
raster_settings,
File "repo_root/miniconda3/envs/HAC_env/lib/python3.7/site-packages/diff_gaussian_rasterization/init.py", line 92, in forward
num_rendered, color, radii, geomBuffer, binningBuffer, imgBuffer = _C.rasterize_gaussians(*args)
RuntimeError: CUDA out of memory. Tried to allocate 68.80 GiB (GPU 0; 22.18 GiB total capacity; 3.89 GiB already allocated; 17.04 GiB free; 4.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Is anyone else running into a similar problem? I do not understand why the renderer would try to allocate 68.80 GiB; does anyone have an idea where that might come from?
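For completeness, here is a minimal sketch of the allocator hint the error message suggests (hypothetical placement at the very top of train.py, before torch is imported). It only mitigates fragmentation-related OOMs and would not help if the rasterizer genuinely requests a single ~68 GiB buffer:

```python
import os

# Allocator hint from the error message (hypothetical workaround, not a fix):
# limit the size of cached blocks the CUDA caching allocator may split, which
# reduces fragmentation-related OOMs. Must be set before CUDA is initialised.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # imported after the env var so the caching allocator picks it up
```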
Since the libraries are pre-compiled, I did not take a deep dive into what is going on; could the problem be in the rendering library?
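In case it helps anyone debug this, a small check could be dropped into render_set (train.py, around line 382) right before the render call, to see whether an implausible number of voxels survives the visibility prefilter for the failing view. This is a diagnostic sketch only; voxel_visible_mask is the local variable visible in the traceback above:

```python
# Diagnostic sketch: count the voxels that pass the visibility prefilter before
# calling the rasterizer; a huge count would point at the inputs rather than at
# diff_gaussian_rasterization itself.
n_visible = int(voxel_visible_mask.sum().item())
print(f"[debug] visible voxels: {n_visible} / {voxel_visible_mask.numel()}")
```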
Best,
[UPDATE]
This error only occurs when using a small number of iterations (in my case iterations == 3_000). With iterations == 30_000 the code runs and I am able to reproduce the results. If anyone knows why that happens, please let me know (I will update this thread again if I find an answer).
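One guess, purely an assumption on my side: at such an early checkpoint some splat scales may still be extremely large, which would make each Gaussian touch many tiles and inflate the rasterizer's internal binning buffers, possibly explaining an allocation of this size. A quick sanity check on the loaded model before rendering, assuming it exposes the 3DGS-style get_scaling property, might look like this:

```python
# Hypothetical sanity check on the loaded model before rendering: extremely
# large splat scales at an early checkpoint would blow up the rasterizer's
# internal buffers.
import torch

with torch.no_grad():
    scales = gaussians.get_scaling  # assumes the 3DGS-style property name
    print("max scale:", scales.max().item(), "| mean scale:", scales.mean().item())
```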