
CUDA out of memory during reconstruction #26


Description

@chenchen772

Hi,
Recently, while quantizing my model with your code, the reconstruction step always fails with a CUDA out of memory error at "fp_inp, fp_oup = save_inp_oup_data(fp_model, fp_module, cali_data, store_inp=True, store_oup=True, bs=1, keep_gpu=True)". I reduced my calibration set from 321 samples to 100, but it still cannot run. I would appreciate your opinion on the following points.
1. During reconstruction, is it really necessary to save all of fp_input and fp_output? My inputs are [1, 3, 1224, 1632] and [1, 6, 1224, 1632] (a dual-branch input), and the network is EFNet. The CUDA out of memory error appears every time, right after activation/weight calibration finishes (see the sketch after this list for the lower-memory caching variant I have in mind).
2. My hardware: one A800-80GB-PCIe GPU, a 12-core CPU, 90 GB of RAM, and 4096 MB of shared memory.
3. During calibration, your MSEObserver by default performs 100 perform_1/2d searches. Would reducing the number of searches affect the final calibration result? As far as I can tell, the search only determines the optimal initial values of scale and zero_point.
4. Does the number of samples used for activation calibration also affect the results? Your default calibration set is 1024 samples, with 256 used for activation calibration, but my model is so large that using that many samples for activation calibration would take many days. Would a noticeably smaller calibration set have a significant impact on the results?
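To make point 1 concrete, here is a minimal sketch of the caching scheme I have in mind, with the captured activations moved to host RAM instead of staying on the GPU. This is only my own illustration, not your save_inp_oup_data implementation; the hook class, the collect_inp_oup name, and the (image, event) calibration tuple are assumptions on my part, and only the store_inp/store_oup/bs/keep_gpu parameter names come from your call signature.

```python
import torch

class _SaveIO:
    """Forward hook that records one module's input/output per batch."""
    def __init__(self, store_inp=True, store_oup=True, keep_gpu=True):
        self.store_inp, self.store_oup, self.keep_gpu = store_inp, store_oup, keep_gpu
        self.inputs, self.outputs = [], []

    def __call__(self, module, inp, oup):
        # Detach so no autograd graph is kept alive across batches.
        if self.store_inp:
            x = inp[0].detach()
            self.inputs.append(x if self.keep_gpu else x.cpu())
        if self.store_oup:
            y = oup.detach()
            self.outputs.append(y if self.keep_gpu else y.cpu())


@torch.no_grad()
def collect_inp_oup(model, module, cali_data, bs=1, keep_gpu=False, device="cuda"):
    """Run cali_data through `model` batch by batch and cache `module`'s I/O."""
    hook = _SaveIO(keep_gpu=keep_gpu)
    handle = module.register_forward_hook(hook)
    imgs, events = cali_data  # my model takes an (image, event) pair
    for i in range(0, imgs.size(0), bs):
        _ = model(imgs[i:i + bs].to(device), events[i:i + bs].to(device))
        torch.cuda.empty_cache()
    handle.remove()
    # With keep_gpu=False the concatenation happens on the CPU.
    return torch.cat(hook.inputs, dim=0), torch.cat(hook.outputs, dim=0)
```

With keep_gpu=False the per-batch tensors are offloaded to the CPU as they are captured, so only one batch of activations sits on the GPU at a time; the cost is a host-to-device copy later when the cached data is fed back during reconstruction.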

Error report (calibration set size: 321):
Traceback (most recent call last):
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620.py", line 286, in
run_quant(opt)
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620.py", line 241, in run_quant
recon_module(model, fp_model)
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620.py", line 240, in recon_module
recon_module(child, getattr(fp_mod, name))
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620.py", line 240, in recon_module
recon_module(child, getattr(fp_mod, name))
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620.py", line 238, in recon_module
reconstruction(model, fp_model, child, getattr(fp_mod, name), cali_data, opt)
File "/code/QDrop-JNU620/qdrop/solver/recon.py", line 266, in reconstruction
quant_inp, _ = save_inp_oup_data(model, module, cali_data, store_inp=True, store_oup=False, bs=bs, keep_gpu=keep_gpu)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/QDrop-JNU620/qdrop/solver/recon.py", line 59, in save_inp_oup_data
_ = model(imgs[i:i+bs].to(device), events[i:i+bs].to(device))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/QDrop-JNU620/basicsr/models/archs/JNU620_EFNet_arch.py", line 106, in forward
x1, x1_up = down(x1, event_filter=ev[i], merge_before_downsample=self.fuse_before_downsample)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/QDrop-JNU620/basicsr/models/archs/JNU620_EFNet_arch.py", line 242, in forward
out = out_conv2 + self.identity(x)
^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
result = forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/QDrop-JNU620/qdrop/quantization/quantized_module.py", line 195, in forward
x = self.layer_post_act_fake_quantize(x)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/QDrop-JNU620/qdrop/quantization/fake_quant.py", line 198, in forward
X = fake_quantize_learnable_per_tensor_affine_training(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/QDrop-JNU620/qdrop/quantization/util_quant.py", line 33, in fake_quantize_learnable_per_tensor_affine_training
x_dequant = (x_quant - zero_point) * scale
~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 244.00 MiB. GPU 0 has a total capacity of 79.14 GiB of which 22.75 MiB is free. Process 72470 has 79.11 GiB memory in use. Of the allocated memory 77.95 GiB is allocated by PyTorch, and 684.66 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Error report (calibration set size: 100):
Traceback (most recent call last):
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620_copy.py", line 304, in
run_quant(opt)
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620_copy.py", line 250, in run_quant
recon_module(model, fp_model)
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620_copy.py", line 249, in recon_module
recon_module(child, getattr(fp_mod, name))
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620_copy.py", line 249, in recon_module
recon_module(child, getattr(fp_mod, name))
File "/code/QDrop-JNU620/qdrop/solver/main_jnu620_copy.py", line 245, in recon_module
reconstruction(model, fp_model, child, getattr(fp_mod, name), cali_data, opt)
File "/code/QDrop-JNU620/qdrop/solver/recon.py", line 232, in reconstruction
fp_inp, fp_oup = save_inp_oup_data(fp_model, fp_module, cali_data, store_inp=True, store_oup=True, bs=bs, keep_gpu=keep_gpu)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/QDrop-JNU620/qdrop/solver/recon.py", line 122, in save_inp_oup_data
cached[0][j] = torch.cat(cached[0][j], dim=0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 23.81 GiB. GPU 0 has a total capacity of 79.14 GiB of which 7.06 GiB is free. Process 85091 has 72.07 GiB memory in use. Of the allocated memory 71.47 GiB is allocated by PyTorch, and 119.97 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
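For completeness, both failures occur while every cached batch is still resident on the GPU (the second one at the torch.cat over cached[0][j]). Below is a rough sketch of the two mitigations I am currently considering; the cached[0][j] structure is only my reading of the traceback, and the cat_on_cpu helper is hypothetical, not part of your code.

```python
import os
import torch

# 1) Allocator setting suggested by the error message itself; it must be set
#    before the first CUDA allocation in the process.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# 2) Concatenate the cached per-batch tensors on the CPU instead of the GPU,
#    which is roughly what I expect keep_gpu=False to achieve.
def cat_on_cpu(batches, dim=0):
    # `batches` is assumed to be a list of per-batch GPU tensors.
    return torch.cat([b.detach().cpu() for b in batches], dim=dim)
```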
