Open
Description
Hi,
In the previous issue #38 I managed to run bipp successfully with the command:
srun python test3.py redundant J0040.ms -o run_2/output --column DATA -p 100 -f 2.5 -b 4e34,300,-4e34 -l 2 -n 100 -t 0,10 -c 0,10
This was just a test run. Now I want to make the full image with all timesteps and channels, so I removed -t 0,10 -c 0,10 and changed -n 100 to -n 6000:
srun python test3.py redundant J0040.ms -o run_2/output --column DATA -p 100 -f 2.5 -b 4e34,300,-4e34 -l 2 -n 6000
It runs for a while and then I get the error:
[1:32:24, 530.88s/it]^M20it [1:32:24, 277.24s/it]
Traceback (most recent call last):
File "/scratch/kincaid/bipp_run_J0040/test3.py", line 422, in <module>
imager.collect(wl, fi, S_new, W.data, XYZ.data, uvw)
RuntimeError: BIPP: Eigensolver error
srun: error: kh082: task 0: Exited with exit code 1
srun: Terminating StepId=330158.0
The output looks like:
[2025-02-27 18:51:56.999] [bipp] [debug] nufft size 17, direct evaluation
[2025-02-27 18:51:59.626] [bipp] [debug] nufft size 20, direct evaluation
[2025-02-27 18:52:02.791] [bipp] [debug] eigensolver (host) nVis = 3721
[2025-02-27 18:52:02.791] [bipp] [debug] Eigensolver: removing 0 columns / rows
[2025-02-27 18:52:02.792] [bipp] [debug] array "gram": size (61, 61), min (-0.0009978566769194095, -0.0008183230735699571), max (1.0000000000000002, 0.0008183230735699572), sum (57.99573051064264, -1.3088368085059114e-17), fp classes [normal,zero]
[2025-02-27 18:52:02.846] [bipp] [error] NufftSynthesis.collect() error: BIPP: Eigensolver error
[2025-02-27 18:52:03.116] [bipp] [info]
============================================================================================================
# Total % Parent % Median Min Max
------------------------------------------------------------------------------------------------------------
Create NUFFTSynthesis 1 223.09 ms 100.00 100.00 223.09 ms 223.09 ms 223.09 ms
0x12ca9980 collect 21 5.54 ks 100.00 100.00 1.60 ms 1.55 ms 5.54 ks
============================================================================================================
[2025-02-27 18:52:03.116] [bipp] [info] 0x3d7cdf0 Context destroyed
[1740678724.185939] [kh082:1861093:0] cuda_copy_iface.c:524 UCX ERROR cuCtxGetCurrent(&cuda_context) failed: unrecognized error code 4
[1740678724.185968] [kh082:1861093:0] cuda_ipc_iface.c:537 UCX ERROR cuCtxGetCurrent(&cuda_context) failed: unrecognized error code 4
I have attached the full err_output.txt
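One way to narrow this down would be to check the matrix passed to `imager.collect` at the failing step, since the log only reports statistics for the "gram" array. Below is a minimal sketch of such a check. It assumes `S_new` from test3.py can be converted to a 2-D NumPy array; `check_hermitian_matrix` is a hypothetical helper written for illustration and is not part of bipp.

```python
import numpy as np


def check_hermitian_matrix(name, A, tol=1e-8):
    """Report basic numerical health of a square matrix.

    Hypothetical diagnostic helper, not part of bipp.
    """
    A = np.asarray(A)
    report = {"name": name, "shape": A.shape}

    # NaN or inf entries are a common reason for an eigensolver to fail.
    report["finite"] = bool(np.isfinite(A).all())

    # Stop early if the matrix is not square or contains non-finite values.
    if A.ndim != 2 or A.shape[0] != A.shape[1] or not report["finite"]:
        report["hermitian"] = False
        report["min_eig"] = None
        return report

    # Compare the matrix with its conjugate transpose, relative to its scale.
    scale = max(float(np.abs(A).max()), 1.0)
    report["hermitian"] = bool(np.abs(A - A.conj().T).max() <= tol * scale)

    # eigvalsh assumes a Hermitian input and reads only one triangle.
    report["min_eig"] = float(np.linalg.eigvalsh(A).min())
    return report


# Assumed usage in test3.py, just before the failing call:
#   print(check_hermitian_matrix("S_new", S_new))
#   imager.collect(wl, fi, S_new, W.data, XYZ.data, uvw)
```

Printing this report for each step would show whether the step that fails carries NaN/inf values or a clearly non-Hermitian matrix, which the earlier test run with -t 0,10 -c 0,10 may never have reached.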