`characterize-psf` on daxi2's latest 37 GB volume (Z = 926, Y = 2368, X = 4416) fails with the following trace:
```
Loading data...
Detecting peaks...
Traceback (most recent call last):
  File "/home/talon.chandler/.conda/envs/biahub/bin/biahub", line 8, in <module>
    sys.exit(cli())
  File "/home/talon.chandler/.conda/envs/biahub/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/talon.chandler/.conda/envs/biahub/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/talon.chandler/.conda/envs/biahub/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/talon.chandler/.conda/envs/biahub/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/talon.chandler/.conda/envs/biahub/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/talon.chandler/biahub/biahub/cli/characterize_psf.py", line 111, in characterize_psf
    _ = _characterize_psf(
  File "/home/talon.chandler/biahub/biahub/cli/characterize_psf.py", line 39, in _characterize_psf
    peaks = detect_peaks(
  File "/home/talon.chandler/biahub/biahub/analysis/analyze_psf.py", line 525, in detect_peaks
    for p in F.max_pool3d(
  File "/home/talon.chandler/.conda/envs/biahub/lib/python3.10/site-packages/torch/_jit_internal.py", line 501, in fn
    return if_true(*args, **kwargs)
  File "/home/talon.chandler/.conda/envs/biahub/lib/python3.10/site-packages/torch/nn/functional.py", line 857, in max_pool3d_with_indices
    return torch._C._nn.max_pool3d_with_indices(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: CUDA error: an illegal memory access was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
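For what it's worth, here is a rough standalone sketch of just the pooling call from the trace, in case it helps isolate the problem from the rest of the pipeline. The shape matches the failing volume, but the dtype, `kernel_size`, and `return_indices` arguments are guesses on my part; I haven't checked exactly what `detect_peaks` passes:

```python
# Standalone sketch of the call that fails in analyze_psf.detect_peaks.
# Shape is the failing volume; dtype and pooling arguments are guesses.
import torch
import torch.nn.functional as F

z, y, x = 926, 2368, 4416
volume = torch.rand(1, 1, z, y, x, device="cuda", dtype=torch.float32)

# return_indices=True makes F.max_pool3d dispatch to max_pool3d_with_indices,
# which is the frame that raises in the traceback above.
pooled, indices = F.max_pool3d(
    volume, kernel_size=3, stride=1, padding=1, return_indices=True
)
print(pooled.shape, indices.dtype)
```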
I suspected that I was saturating GPU memory, but `watch -n 0.1 nvidia-smi` showed that usage never exceeded ~50% (75 GB) of the memory of an H200 (147 GB). I also monitored GPU memory during a successful `characterize-psf` run on a 13 GB volume and found that usage stayed under ~20 GB, consistent with the scaling I saw on the larger volume.
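In case it's useful for pinning down where the allocation spikes, something like the following could log the allocator-side view of memory around the `max_pool3d` call (just a sketch, not something currently in biahub; `nvidia-smi` will read higher because of PyTorch's cached blocks):

```python
import torch

def log_cuda_memory(tag: str) -> None:
    # Current and peak allocations tracked by PyTorch's caching allocator.
    alloc_gb = torch.cuda.memory_allocated() / 1e9
    peak_gb = torch.cuda.max_memory_allocated() / 1e9
    print(f"[{tag}] allocated: {alloc_gb:.1f} GB, peak: {peak_gb:.1f} GB")
```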
Tagging @ieivanov, @ziw-liu, @Sheng-Xiao-CZB in case you have any suggestions.