characterize-psf fails on 37 GB daxi2 volume #34

@talonchandler

characterize-psf on daxi2's latest 37 GB volume (Z = 926, Y = 2368, X = 4416) fails with the following trace:

Loading data...
Detecting peaks...
Traceback (most recent call last):
  File "/home/talon.chandler/.conda/envs/biahub/bin/biahub", line 8, in <module>
    sys.exit(cli())
  File "/home/talon.chandler/.conda/envs/biahub/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/talon.chandler/.conda/envs/biahub/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/talon.chandler/.conda/envs/biahub/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/talon.chandler/.conda/envs/biahub/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/talon.chandler/.conda/envs/biahub/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/talon.chandler/biahub/biahub/cli/characterize_psf.py", line 111, in characterize_psf
    _ = _characterize_psf(
  File "/home/talon.chandler/biahub/biahub/cli/characterize_psf.py", line 39, in _characterize_psf
    peaks = detect_peaks(
  File "/home/talon.chandler/biahub/biahub/analysis/analyze_psf.py", line 525, in detect_peaks
    for p in F.max_pool3d(
  File "/home/talon.chandler/.conda/envs/biahub/lib/python3.10/site-packages/torch/_jit_internal.py", line 501, in fn
    return if_true(*args, **kwargs)
  File "/home/talon.chandler/.conda/envs/biahub/lib/python3.10/site-packages/torch/nn/functional.py", line 857, in max_pool3d_with_indices
    return torch._C._nn.max_pool3d_with_indices(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: CUDA error: an illegal memory access was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I suspected that I was saturating GPU memory, but watch -n 0.1 nvidia-smi showed that I never exceeded ~50% (75 GB) of the memory of an H200 (147 GB).

I also monitored GPU memory during a successful characterize-psf run on a 13 GB volume. Usage didn't exceed ~20 GB, which is broadly in line with the ~75 GB peak I saw for the larger volume given the difference in size.
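
Since memory doesn't look like the limit, a quick size check may be worth having on hand. The sketch below only does arithmetic on the shape quoted above: it counts voxels, compares the count to the signed 32-bit range, and prints the raw size for two candidate dtypes. Whether 32-bit indexing inside the pooling kernel is actually what fails here is an assumption and is not confirmed by the trace. The helper names (num_elements, size_gb) are illustrative and are not part of biahub, and the volume's dtype is not stated in this report.

```python
# Volume shape quoted in this report, as (Z, Y, X).
SHAPE = (926, 2368, 4416)

# Largest value a signed 32-bit index can hold.
INT32_MAX = 2**31 - 1


def num_elements(shape):
    """Return the total number of voxels for a given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n


def size_gb(shape, bytes_per_voxel):
    """Return the raw array size in GB (1 GB = 1e9 bytes)."""
    return num_elements(shape) * bytes_per_voxel / 1e9


n = num_elements(SHAPE)
print(f"voxels: {n:,}")
print(f"exceeds signed 32-bit range: {n > INT32_MAX} ({n / INT32_MAX:.1f}x)")

# The dtype is an assumption here; both candidates are shown for comparison.
for name, nbytes in (("uint16", 2), ("float32", 4)):
    print(f"{name}: {size_gb(SHAPE, nbytes):.1f} GB")
```

If the voxel count is well past the 32-bit range, running the peak detection on smaller sub-volumes would be one way to test this hypothesis.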

Tagging @ieivanov, @ziw-liu, @Sheng-Xiao-CZB in case you have any suggestions.
