BrokenProcessPool causes failure in make_qc_plots_es #1512

@pcamach2

Summary

When running XCP-D on a dataset from our 7 T scanner, the make_qc_plots_es node fails with a MemoryError, after which the job emits repeated BrokenProcessPool exceptions and stalls until it reaches its time limit.

Additional details

  • xcp_d version: 0.9.1
  • Apptainer version: 1.4.0-1.el8

The job ran on an HPC node, with 12 threads and 240 GB of RAM allocated by Slurm.

What were you trying to do?

Running XCP-D to compute resting-state functional connectivity, ReHo, and ALFF from an fMRIPrep-preprocessed BOLD image in MNI space at 1 mm resolution.

What did you expect to happen?

XCP-D completes as it does for other data from this project.

What actually happened?

The job starts as expected, with the following:

Framewise displacement-based scrubbing is disabled. The following parameters will have no effect:
        --min-time
250923-16:39:10,62 nipype.workflow IMPORTANT:
         Running XCP-D version 0.9.1
250923-16:39:10,128 nipype.workflow WARNING:
         Previous output generated by version 0+unknown found.
250923-16:39:11,279 nipype.workflow IMPORTANT:
         Building XCP-D's workflow:
           * Preprocessing derivatives path: /data/bids/derivatives/fmriprep.
           * Participant list: ['CUPS003'].
           * Run identifier: 20250923-163831_123664ac-7b4c-4a1f-8c40-bce334b70aa1.
250923-16:39:12,882 nipype.utils IMPORTANT:
         Collected data:
anat_brainmask: /data/bids/derivatives/fmriprep/sub-CUPS003/ses-A/anat/sub-CUPS003_ses-A_acq-mp2rageunidenoised_space-MNI152NLin2009cAsym_res-1_desc-brain_mask.nii.gz
anat_to_template_xfm: /data/bids/derivatives/fmriprep/sub-CUPS003/ses-A/anat/sub-CUPS003_ses-A_acq-mp2rageunidenoised_from-T1w_to-MNI152NLin2009cAsym_mode-image_xfm.h5
bold:
- /data/bids/derivatives/fmriprep/sub-CUPS003/ses-A/func/sub-CUPS003_ses-A_task-rest_dir-PA_run-1_space-MNI152NLin2009cAsym_res-1_desc-preproc_bold.nii.gz
t1w: /data/bids/derivatives/fmriprep/sub-CUPS003/ses-A/anat/sub-CUPS003_ses-A_acq-mp2rageunidenoised_desc-preproc_T1w.nii.gz
t2w: null
template_to_anat_xfm: /data/bids/derivatives/fmriprep/sub-CUPS003/ses-A/anat/sub-CUPS003_ses-A_acq-mp2rageunidenoised_from-MNI152NLin2009cAsym_to-T1w_mode-image_xfm.h5
250923-16:39:13,45 nipype.utils INFO:
         No standard-space surfaces found.
250923-16:39:13,498 nipype.utils IMPORTANT:
         Collected mesh files:
lh_pial_surf: /data/bids/derivatives/fmriprep/sub-CUPS003/ses-A/anat/sub-CUPS003_ses-A_acq-mp2rageunidenoised_hemi-L_pial.surf.gii
lh_subject_sphere: null
lh_wm_surf: /data/bids/derivatives/fmriprep/sub-CUPS003/ses-A/anat/sub-CUPS003_ses-A_acq-mp2rageunidenoised_hemi-L_smoothwm.surf.gii
rh_pial_surf: /data/bids/derivatives/fmriprep/sub-CUPS003/ses-A/anat/sub-CUPS003_ses-A_acq-mp2rageunidenoised_hemi-R_pial.surf.gii
rh_subject_sphere: null
rh_wm_surf: /data/bids/derivatives/fmriprep/sub-CUPS003/ses-A/anat/sub-CUPS003_ses-A_acq-mp2rageunidenoised_hemi-R_smoothwm.surf.gii

250923-16:39:13,811 nipype.utils IMPORTANT:
         Collected morphometry files:
cortical_thickness: null
cortical_thickness_corr: null
myelin: null
myelin_smoothed: null
sulcal_curv: null
sulcal_depth: null

250923-16:39:25,291 nipype.utils IMPORTANT:
         Collected run data for sub-CUPS003_ses-A_task-rest_dir-PA_run-1_space-MNI152NLin2009cAsym_res-1_desc-preproc_bold.nii.gz:
boldmask: /data/bids/derivatives/fmriprep/sub-CUPS003/ses-A/func/sub-CUPS003_ses-A_task-rest_dir-PA_run-1_space-MNI152NLin2009cAsym_res-1_desc-brain_mask.nii.gz
boldref: /data/bids/derivatives/fmriprep/sub-CUPS003/ses-A/func/sub-CUPS003_ses-A_task-rest_dir-PA_run-1_space-MNI152NLin2009cAsym_res-1_boldref.nii.gz
confounds: /data/bids/derivatives/fmriprep/sub-CUPS003/ses-A/func/sub-CUPS003_ses-A_task-rest_dir-PA_run-1_desc-confounds_timeseries.tsv
confounds_json: /data/bids/derivatives/fmriprep/sub-CUPS003/ses-A/func/sub-CUPS003_ses-A_task-rest_dir-PA_run-1_desc-confounds_timeseries.json

250923-16:39:33,45 nipype.workflow INFO:
         XCP-D workflow graph with 131 nodes built successfully.
250923-16:39:36,918 nipype.workflow INFO:
         Generated workflow graph: /sing_scratch/xcp_d_0_9_wf/graph.svg (graph2use=colored, simple_form=True).
250923-16:39:55,989 nipype.workflow VERBOSE:
         XCP-D config:
                [environment]
                cpu_count = 128
                exec_env = "posix"
                free_mem = 232.8
                overcommit_policy = "heuristic"
                overcommit_limit = "50%"
                nipype_version = "1.8.6"
                templateflow_version = "24.2.0"
                version = "0.9.1"

                [execution]
                fmri_dir = "/data/bids/derivatives/fmriprep"
                aggr_ses_reports = 4
                bids_database_dir = "/sing_scratch/20250923-163831_123664ac-7b4c-4a1f-8c40-bce334b70aa1/bids_db"
                bids_description_hash = "f347f1dc629335f5a4e6cd510475f9ae41e960bf25370631a04eca601bbf9638"
                boilerplate_only = false
                debug = []
                xcp_d_dir = "/data/bids/derivatives/xcp_d"
                fs_license_file = "/imgdir/license.txt"
                layout = "BIDS Layout: ...data/bids/derivatives/fmriprep | Subjects: 1 | Sessions: 1 | Runs: 1"
                log_dir = "/data/bids/derivatives/xcp_d/logs"
                log_level = 15
                low_mem = false
                md_only_boilerplate = false
                notrack = true
                reports_only = false
                output_dir = "/data/bids/derivatives/xcp_d"
                atlases = [ "4S156Parcels", "4S256Parcels", "4S356Parcels", "4S456Parcels", "4S556Parcels", "4S656Parcels", "Glasser", "Gordon",]
                run_uuid = "20250923-163831_123664ac-7b4c-4a1f-8c40-bce334b70aa1"
                participant_label = [ "CUPS003",]
                templateflow_home = "/imgdir/templateflow"
                work_dir = "/sing_scratch"
                write_graph = true

                [workflow]
                mode = "none"
                file_format = "nifti"
                dummy_scans = 0
                input_type = "fmriprep"
                despike = false
                params = "aroma"
                smoothing = 5.0
                output_interpolated = true
                output_correlations = true
                combine_runs = false
                motion_filter_order = 4
                head_radius = 50
                fd_thresh = 0.0
                min_time = 0
                bandpass_filter = true
                high_pass = 0.01
                low_pass = 0.08
                bpf_order = 2
                min_coverage = 0.5
                dcan_correlation_lengths = []
                process_surfaces = false
                abcc_qc = true
                linc_qc = true

                [nipype]
                crashfile_format = "txt"
                get_linked_libs = false
                memory_gb = 240
                nprocs = 12
                omp_nthreads = 3
                plugin = "MultiProc"
                resource_monitor = false
                stop_on_first_crash = false

                [seeds]
                master = 61313

                [nipype.plugin_args]
                maxtasksperchild = 1
                raise_insufficient = false
250923-16:39:55,992 nipype.workflow IMPORTANT:
         XCP-D started!
250923-16:39:56,111 nipype.workflow INFO:
         Workflow xcp_d_0_9_wf settings: ['check', 'execution', 'logging', 'monitoring']
250923-16:39:56,338 nipype.workflow INFO:
         Running in parallel.
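
The config dump above reports cpu_count = 128 although Slurm allocated 12 threads, so some of the [environment] values may describe the whole node rather than the job's allocation. As a rough diagnostic sketch (not part of XCP-D; the function name `visible_limits` is made up for illustration), the following standard-library snippet prints the CPU count, the CPU affinity, and the address-space limit that a Python process actually sees. It would need to be run inside the container, within the Slurm job, to be informative.

```python
import os
import resource


def visible_limits():
    """Collect the CPU and address-space limits this process actually sees."""
    soft_as, _hard_as = resource.getrlimit(resource.RLIMIT_AS)
    # sched_getaffinity is Linux-only, so report None on other platforms.
    usable = len(os.sched_getaffinity(0)) if hasattr(os, "sched_getaffinity") else None
    return {
        # Number of CPUs on the machine, regardless of the job's allocation.
        "cpu_count": os.cpu_count(),
        # Number of CPUs this process is allowed to run on.
        "usable_cpus": usable,
        # Soft virtual-memory limit in bytes (None means unlimited).
        "address_space_limit": None if soft_as == resource.RLIM_INFINITY else soft_as,
    }


if __name__ == "__main__":
    for name, value in visible_limits().items():
        print(f"{name}: {value}")
```

A finite address_space_limit would be one possible explanation for a MemoryError on a node that still reports free memory, but the logs in this report do not confirm or rule that out.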

The first error occurs here:

250923-16:55:58,319 nipype.workflow INFO:
         [Node] Setting-up "xcp_d_0_9_wf.sub_CUPS003_wf.postprocess_0_wf.qc_report_wf.make_qc_plots_es" in "/sing_scratch/xcp_d_0_9_wf/sub_CUPS003_wf/postprocess_0_wf/qc_report_wf/make_qc_plots_$
250923-16:55:58,352 nipype.workflow INFO:
         [Node] Executing "nifti_smoothing" <xcp_d.interfaces.nilearn.Smooth>
250923-16:55:58,458 nipype.workflow INFO:
         [Node] Executing "ds_denoised_bold" <xcp_d.interfaces.bids.DerivativesDataSink>
250923-16:55:58,738 nipype.workflow INFO:
         [Node] Executing "censor_interpolated_data" <xcp_d.interfaces.censoring.Censor>
250923-16:55:58,745 nipype.workflow INFO:
         [Node] Finished "censor_interpolated_data", elapsed time 0.005265s.
250923-16:55:59,165 nipype.workflow INFO:
         [Node] Executing "make_qc_plots_es" <xcp_d.interfaces.plotting.QCPlotsES>
250923-16:55:59,192 nipype.workflow INFO:
         [Node] Executing "alff_compt" <xcp_d.interfaces.restingstate.ComputeALFF>
250923-16:59:04,598 nipype.workflow INFO:
         [Node] Finished "make_qc_plots_es", elapsed time 185.150693s.
250923-16:59:04,601 nipype.workflow WARNING:
         Storing result file without outputs
250923-16:59:04,620 nipype.workflow WARNING:
         [Node] Error on "xcp_d_0_9_wf.sub_CUPS003_wf.postprocess_0_wf.qc_report_wf.make_qc_plots_es" (/sing_scratch/xcp_d_0_9_wf/sub_CUPS003_wf/postprocess_0_wf/qc_report_wf/make_qc_plots_es)
250923-17:11:08,224 nipype.workflow INFO:
         [Node] Finished "ds_denoised_bold", elapsed time 909.765025s.
250923-17:11:08,355 nipype.workflow INFO:
         [Job 54] Completed (xcp_d_0_9_wf.sub_CUPS003_wf.postprocess_0_wf.postproc_derivatives_wf.ds_denoised_bold).
250923-17:11:09,617 nipype.workflow INFO:
         [Job 36] Completed (xcp_d_0_9_wf.sub_CUPS003_wf.postprocess_0_wf.denoise_bold_wf.censor_interpolated_data).
250923-17:11:09,619 nipype.workflow ERROR:
         Node make_qc_plots_es failed to run on host cn121.delta.ncsa.illinois.edu.
250923-17:11:09,712 nipype.workflow ERROR:
         Saving crash info to /data/bids/derivatives/xcp_d/sub-CUPS003/log/20250923-163831_123664ac-7b4c-4a1f-8c40-bce334b70aa1/crash-20250923-171109-pcamach2-make_qc_plots_es-ac3a5252-af5b-49c8$
Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 67, in run_node
    result["result"] = node.run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 645, in _run_interface
    return self._run_command(execute)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 771, in _run_command
    raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node make_qc_plots_es.

Traceback:
        Traceback (most recent call last):
          File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 397, in run
            runtime = self._run_interface(runtime)
          File "/usr/local/miniconda/lib/python3.10/site-packages/xcp_d/interfaces/plotting.py", line 459, in _run_interface
            self._results["before_process"], self._results["after_process"] = plot_fmri_es(
          File "/usr/local/miniconda/lib/python3.10/site-packages/xcp_d/utils/plotting.py", line 531, in plot_fmri_es
            denoised_interpolated_arr = read_ndata(datafile=denoised_interpolated_bold, maskfile=mask)
          File "/usr/local/miniconda/lib/python3.10/site-packages/xcp_d/utils/write_save.py", line 42, in read_ndata
            data = masking.apply_mask(datafile, maskfile)
          File "/usr/local/miniconda/lib/python3.10/site-packages/nilearn/masking.py", line 809, in apply_mask
            return apply_mask_fmri(
          File "/usr/local/miniconda/lib/python3.10/site-packages/nilearn/masking.py", line 835, in apply_mask_fmri
            imgs_img = _utils.check_niimg(imgs)
          File "/usr/local/miniconda/lib/python3.10/site-packages/nilearn/_utils/niimg_conversions.py", line 315, in check_niimg
            niimg = load_niimg(niimg, dtype=dtype)
          File "/usr/local/miniconda/lib/python3.10/site-packages/nilearn/_utils/niimg.py", line 135, in load_niimg
            dtype = _get_target_dtype(_get_data(niimg).dtype, dtype)
          File "/usr/local/miniconda/lib/python3.10/site-packages/nilearn/_utils/niimg.py", line 25, in _get_data
            data = np.asanyarray(img._dataobj)
          File "/usr/local/miniconda/lib/python3.10/site-packages/nibabel/arrayproxy.py", line 457, in __array__
            arr = self._get_scaled(dtype=dtype, slicer=())
          File "/usr/local/miniconda/lib/python3.10/site-packages/nibabel/arrayproxy.py", line 424, in _get_scaled
            scaled = apply_read_scaling(self._get_unscaled(slicer=slicer), scl_slope, scl_inter)
          File "/usr/local/miniconda/lib/python3.10/site-packages/nibabel/arrayproxy.py", line 394, in _get_unscaled
            return array_from_file(
          File "/usr/local/miniconda/lib/python3.10/site-packages/nibabel/volumeutils.py", line 464, in array_from_file
            data_bytes = bytearray(n_bytes)
        MemoryError


250923-17:11:09,729 nipype.workflow INFO:
         [MultiProc] Running 2 tasks, and 12 jobs ready. Free memory (GB): 229.00/240.00, Free processors: 8/12.
                     Currently running:
                       * xcp_d_0_9_wf.sub_CUPS003_wf.postprocess_0_wf.alff_wf.alff_compt
                       * xcp_d_0_9_wf.sub_CUPS003_wf.postprocess_0_wf.denoise_bold_wf.resd_smoothing_wf.nifti_smoothing
250923-17:11:09,820 nipype.workflow INFO:
         [Node] Setting-up "xcp_d_0_9_wf.sub_CUPS003_wf.postprocess_0_wf.reho_nifti_wf.reho_3d" in "/sing_scratch/xcp_d_0_9_wf/sub_CUPS003_wf/postprocess_0_wf/reho_nifti_wf/reho_3d".
250923-17:11:09,827 nipype.workflow INFO:
         [Node] Setting-up "xcp_d_0_9_wf.sub_CUPS003_wf.postprocess_0_wf.qc_report_wf.make_linc_qc" in "/sing_scratch/xcp_d_0_9_wf/sub_CUPS003_wf/postprocess_0_wf/qc_report_wf/make_linc_qc".
250923-17:11:09,828 nipype.workflow INFO:
         [Node] Setting-up "xcp_d_0_9_wf.sub_CUPS003_wf.postprocess_0_wf.qc_report_wf.make_qc_plots_nipreps" in "/sing_scratch/xcp_d_0_9_wf/sub_CUPS003_wf/postprocess_0_wf/qc_report_wf/make_qc_p$
250923-17:11:09,890 nipype.workflow INFO:
         [Node] Setting-up "_parcellate_data0" in "/sing_scratch/xcp_d_0_9_wf/sub_CUPS003_wf/postprocess_0_wf/connectivity_wf/parcellate_data/mapflow/_parcellate_data0".
250923-17:11:09,891 nipype.workflow INFO:
         [Node] Setting-up "_parcellate_data1" in "/sing_scratch/xcp_d_0_9_wf/sub_CUPS003_wf/postprocess_0_wf/connectivity_wf/parcellate_data/mapflow/_parcellate_data1".
250923-17:11:09,892 nipype.workflow INFO:
         [Node] Setting-up "_parcellate_data2" in "/sing_scratch/xcp_d_0_9_wf/sub_CUPS003_wf/postprocess_0_wf/connectivity_wf/parcellate_data/mapflow/_parcellate_data2".
250923-17:11:09,893 nipype.workflow INFO:
         [Node] Setting-up "_parcellate_data3" in "/sing_scratch/xcp_d_0_9_wf/sub_CUPS003_wf/postprocess_0_wf/connectivity_wf/parcellate_data/mapflow/_parcellate_data3".
250923-17:11:09,894 nipype.workflow INFO:
         [Node] Setting-up "_parcellate_data4" in "/sing_scratch/xcp_d_0_9_wf/sub_CUPS003_wf/postprocess_0_wf/connectivity_wf/parcellate_data/mapflow/_parcellate_data4".
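
The traceback above fails at `bytearray(n_bytes)` in nibabel, where the whole 4D image is read into a single buffer. A back-of-the-envelope sketch of how large that buffer can get is shown below. The grid size is assumed to be the 193 x 229 x 193 grid of the 1 mm MNI152NLin2009cAsym template; the number of volumes and the 8-byte (float64) storage type are hypothetical placeholders, since neither is stated in this report.

```python
from math import prod


def estimate_array_gib(shape, bytes_per_value):
    """Return the size in GiB of a dense array with the given shape."""
    return prod(shape) * bytes_per_value / 1024**3


# Assumption: res-1 MNI152NLin2009cAsym grid.
grid = (193, 229, 193)
# Hypothetical run length; substitute the real number of volumes.
n_volumes = 1000
# Assumption: data stored as float64 (8 bytes per value).
size_gib = estimate_array_gib(grid + (n_volumes,), 8)

if __name__ == "__main__":
    print(f"One full copy of the series: {size_gib:.1f} GiB")
```

This counts a single copy only. Several nodes running in parallel may each hold their own copy of the series, so the combined demand can be a multiple of this figure.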

Processing continues, but eventually stalls out with the following repeating until the job reaches its 8-hour time limit:

250923-17:11:22,581 nipype.interface WARNING:
         85/556 of parcels have at least one uncovered voxel, but have enough good voxels to be useable. The bad voxels will be ignored and the parcels' time series will be calculated from the r$
** AFNI converts NIFTI_datatype=64 (FLOAT64) in file /sing_scratch/xcp_d_0_9_wf/sub_CUPS003_wf/postprocess_0_wf/reho_nifti_wf/reho_3d/inset.nii.gz to FLOAT32
     Warnings of this type will be muted for this session.
     Set AFNI_NIFTI_TYPE_WARN to YES to see them all, NO to see none.
exception calling callback for <Future at 0x7fecf7e87640 state=finished raised BrokenProcessPool>
Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 342, in _invoke_callbacks
    callback(self)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 159, in _async_callback
    result = args.result()
  File "/usr/local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
exception calling callback for <Future at 0x7fecf7e878b0 state=finished raised BrokenProcessPool>
Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 342, in _invoke_callbacks
    callback(self)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 159, in _async_callback
    result = args.result()
  File "/usr/local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 342, in _invoke_callbacks
    callback(self)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 159, in _async_callback
    result = args.result()
  File "/usr/local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
exception calling callback for <Future at 0x7fecfc60afb0 state=finished raised BrokenProcessPool>
Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 342, in _invoke_callbacks
    callback(self)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 159, in _async_callback
    result = args.result()
  File "/usr/local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 342, in _invoke_callbacks
    callback(self)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 159, in _async_callback
    result = args.result()
  File "/usr/local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 342, in _invoke_callbacks
    callback(self)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 159, in _async_callback
    result = args.result()
  File "/usr/local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
exception calling callback for <Future at 0x7fecf66da4a0 state=finished raised BrokenProcessPool>
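
For context, `BrokenProcessPool` is raised by Python's `concurrent.futures` when a worker process dies without returning, for example when it is killed by the operating system. The sketch below is a minimal standard-library illustration of that behavior, not a reproduction of the XCP-D failure. It assumes a POSIX host (it uses the `fork` start method), and `demo_broken_pool` is a name made up for this example.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def demo_broken_pool():
    """Kill a pool worker abruptly and record how the pool reacts."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX host
    outcomes = []
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        # os._exit ends the worker immediately, much like an external kill.
        doomed = pool.submit(os._exit, 1)
        try:
            doomed.result()
        except BrokenProcessPool:
            outcomes.append("running task: BrokenProcessPool")
        # Once one worker has died, the pool rejects all further work.
        try:
            pool.submit(abs, -1)
        except BrokenProcessPool:
            outcomes.append("new submission: BrokenProcessPool")
    return outcomes


if __name__ == "__main__":
    for line in demo_broken_pool():
        print(line)
```

Because a broken pool cannot be reused, every remaining future fails in the same way, which is consistent with the repeating tracebacks above. The cause of the worker's death in this report is not shown in the logs.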

Reproducing the bug

APPTAINER_CACHEDIR=${CACHESING} APPTAINER_TMPDIR=${TMPSING} apptainer run \
--cleanenv --containall --no-home --bind ${IMAGEDIR}:/imgdir,${TMPSING}:/sing_scratch \
--bind ${projDir}:/data ${IMAGEDIR}/xcp_d-v0.9.1.sif \
--participant-label CUPS003 --nthreads 12 \
--omp-nthreads 3 --mem-gb 240 \
--input-type fmriprep --smoothing 5 -p aroma \
--motion-filter-type none \
--atlases 4S156Parcels 4S256Parcels 4S356Parcels 4S456Parcels 4S556Parcels 4S656Parcels Glasser Gordon \
--combine-runs n --despike n \
--file-format nifti --linc-qc y --min-coverage 0.5 --output-type interpolated \
--warp-surfaces-native2std n --abcc-qc y \
--lower-bpf 0.01 --upper-bpf 0.08 --bpf-order 2 \
--notrack --write-graph -vv --create-matrices all --low-mem \
--mode none -f 0 -w /sing_scratch --notrack --fs-license-file /imgdir/license.txt \
/data/bids/derivatives/fmriprep /data/bids/derivatives/xcp_d participant

Labels: bug (Issues noting problems and PRs fixing those problems.)
