
CUDA out of memory during reconstruction #26


Description

@chenchen772

Hi,
Recently, while quantizing my model with your code, the reconstruction step always fails with a CUDA out of memory error at "fp_inp, fp_oup = save_inp_oup_data(fp_model, fp_module, cali_data, store_inp=True, store_oup=True, bs=1, keep_gpu=True)". I reduced my calibration set from 321 samples to 100, but it still cannot run. I would appreciate your opinion on the following points.
1. During reconstruction, is it really necessary to save all of fp_input and fp_output? My inputs are [1, 3, 1224, 1632] and [1, 6, 1224, 1632] (a dual-branch input), and the network is EFNet. The CUDA out of memory error appears every time, right after activation/weight calibration finishes (see the sketch after this list for the lower-memory caching variant I have in mind).
2. My hardware: one A800-80GB-PCIe GPU, a 12-core CPU, 90 GB of RAM, and 4096 MB of shared memory.
3. During calibration, your MSEObserver by default performs 100 perform_1/2d searches. Would reducing the number of searches affect the final calibration result? As far as I can tell, the search only determines the optimal initial values of scale and zero_point.
4. Does the number of samples used for activation calibration also affect the results? Your default calibration set is 1024 samples, with 256 used for activation calibration, but my model is so large that using that many samples for activation calibration would take many days. Would a noticeably smaller calibration set have a significant impact on the results?
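To make point 1 concrete, here is a minimal sketch of the caching scheme I have in mind, with the captured activations moved to host RAM instead of staying on the GPU. This is only my own illustration, not your save_inp_oup_data implementation; the hook class, the collect_inp_oup name, and the (image, event) calibration tuple are assumptions on my part, and only the store_inp/store_oup/bs/keep_gpu parameter names come from your call signature.

```python
import torch

class _SaveIO:
    """Forward hook that records one module's input/output per batch."""
    def __init__(self, store_inp=True, store_oup=True, keep_gpu=True):
        self.store_inp, self.store_oup, self.keep_gpu = store_inp, store_oup, keep_gpu
        self.inputs, self.outputs = [], []

    def __call__(self, module, inp, oup):
        # Detach so no autograd graph is kept alive across batches.
        if self.store_inp:
            x = inp[0].detach()
            self.inputs.append(x if self.keep_gpu else x.cpu())
        if self.store_oup:
            y = oup.detach()
            self.outputs.append(y if self.keep_gpu else y.cpu())


@torch.no_grad()
def collect_inp_oup(model, module, cali_data, bs=1, keep_gpu=False, device="cuda"):
    """Run cali_data through `model` batch by batch and cache `module`'s I/O."""
    hook = _SaveIO(keep_gpu=keep_gpu)
    handle = module.register_forward_hook(hook)
    imgs, events = cali_data  # my model takes an (image, event) pair
    for i in range(0, imgs.size(0), bs):
        _ = model(imgs[i:i + bs].to(device), events[i:i + bs].to(device))
        torch.cuda.empty_cache()
    handle.remove()
    # With keep_gpu=False the concatenation happens on the CPU.
    return torch.cat(hook.inputs, dim=0), torch.cat(hook.outputs, dim=0)
```

With keep_gpu=False the per-batch tensors are offloaded to the CPU as they are captured, so only one batch of activations sits on the GPU at a time; the cost is a host-to-device copy later when the cached data is fed back during reconstruction.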

Error report (calibration set size: 321):
Traceback (most recent call last):
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620.py", line 286, in
run_quant(opt)
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620.py", line 241, in run_quant
recon_module(model, fp_model)
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620.py", line 240, in recon_module
recon_module(child, getattr(fp_mod, name))
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620.py", line 240, in recon_module
recon_module(child, getattr(fp_mod, name))
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620.py", line 238, in recon_module
reconstruction(model, fp_model, child, getattr(fp_mod, name), cali_data, opt)
File "/code/QDrop-JNU620/qdrop/solver/recon.py", line 266, in reconstruction
quant_inp, _ = save_inp_oup_data(model, module, cali_data, store_inp=True, store_oup=False, bs=bs, keep_gpu=keep_gpu)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/QDrop-JNU620/qdrop/solver/recon.py", line 59, in save_inp_oup_data
_ = model(imgs[i:i+bs].to(device), events[i:i+bs].to(device))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/QDrop-JNU620/basicsr/models/archs/JNU620_EFNet_arch.py", line 106, in forward
x1, x1_up = down(x1, event_filter=ev[i], merge_before_downsample=self.fuse_before_downsample)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/QDrop-JNU620/basicsr/models/archs/JNU620_EFNet_arch.py", line 242, in forward
out = out_conv2 + self.identity(x)
^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
result = forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/QDrop-JNU620/qdrop/quantization/quantized_module.py", line 195, in forward
x = self.layer_post_act_fake_quantize(x)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/QDrop-JNU620/qdrop/quantization/fake_quant.py", line 198, in forward
X = fake_quantize_learnable_per_tensor_affine_training(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/QDrop-JNU620/qdrop/quantization/util_quant.py", line 33, in fake_quantize_learnable_per_tensor_affine_training
x_dequant = (x_quant - zero_point) * scale
~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 244.00 MiB. GPU 0 has a total capacity of 79.14 GiB of which 22.75 MiB is free. Process 72470 has 79.11 GiB memory in use. Of the allocated memory 77.95 GiB is allocated by PyTorch, and 684.66 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Error report (calibration set size: 100):
Traceback (most recent call last):
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620_copy.py", line 304, in
run_quant(opt)
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620_copy.py", line 250, in run_quant
recon_module(model, fp_model)
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620_copy.py", line 249, in recon_module
recon_module(child, getattr(fp_mod, name))
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620_copy.py", line 249, in recon_module
recon_module(child, getattr(fp_mod, name))
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620_copy.py", line 245, in recon_module
reconstruction(model, fp_model, child, getattr(fp_mod, name), cali_data, opt)
File "/code/QDrop-JNU620/qdrop/solver/recon.py", line 232, in reconstruction
fp_inp, fp_oup = save_inp_oup_data(fp_model, fp_module, cali_data, store_inp=True, store_oup=True, bs=bs, keep_gpu=keep_gpu)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/QDrop-JNU620/qdrop/solver/recon.py", line 122, in save_inp_oup_data
cached[0][j] = torch.cat(cached[0][j], dim=0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 23.81 GiB. GPU 0 has a total capacity of 79.14 GiB of which 7.06 GiB is free. Process 85091 has 72.07 GiB memory in use. Of the allocated memory 71.47 GiB is allocated by PyTorch, and 119.97 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
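For completeness, both failures occur while every cached batch is still resident on the GPU (the second one at the torch.cat over cached[0][j]). Below is a rough sketch of the two mitigations I am currently considering; the cached[0][j] structure is only my reading of the traceback, and the cat_on_cpu helper is hypothetical, not part of your code.

```python
import os
import torch

# 1) Allocator setting suggested by the error message itself; it must be set
#    before the first CUDA allocation in the process.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# 2) Concatenate the cached per-batch tensors on the CPU instead of the GPU,
#    which is roughly what I expect keep_gpu=False to achieve.
def cat_on_cpu(batches, dim=0):
    # `batches` is assumed to be a list of per-batch GPU tensors.
    return torch.cat([b.detach().cpu() for b in batches], dim=dim)
```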
