Incremental BIDS import #1211

maximemulder · 2024-12-03T00:04:32Z

BIDS dataset import script rewrite

List of new features

Fully incremental BIDS import. That is, a BIDS dataset can be imported in several parts, and only the files that have not been imported yet will be added to LORIS.
Better logging and error reporting. Prints and errors are now logged in the LORIS-MRI logging system. Moreover, additional information is given during the script run, and previously silent errors (such as unidentified scan types) are now signaled to the user.
Better support for sessionful and sessionless BIDS datasets. Before, some code paths were only implemented for one of those case, now the system is generic over the presence or absence of BIDS session directories, ensuring better support for both these cases.
The --create_candidate or --create_session options are now transactional. Either all candidates and sessions are created without error, or none is created if an error occurs (due to incorrect input data).
The --create_candidate option now looks for sites and projects using both names and aliases, in that order.

List of breaking changes

The script has been renamed from bids_import.py to import_bids_dataset.py. This should better follow the LORIS-MRI scripts naming conventions.
An error is now returned if the dataset_description.json file is absent. This file is a required BIDS file, and was already needed unless the --nocopy option was used. It is also trivial to write anyway.
An error is now returned for MRI files if the --type=derivatives or --no_copy options are used. It seems to me that the current code was not tested and probably not working, so it is better to have a hard error rather than an untested and potentially wrong implementation IMO.
The --idsvalidation option has been removed. This option allowed to validate the BIDS directory name according to a pscid_candid_blablabla format. This is obviously esoteric and overly-specific as BIDS dataset can contain several participants, and their identifiers should be retrieved from the subject directory names or the participants.tsv file instead.
The --create_candidate option no longer tries to find the site by matching site aliases against participants IDs. This approach can give false positives if a similar (and not even necessarily equal) alias as a project. The site participants.tsv column is now mandatory to create candidates.

List of major architectural changes

Note that these changes only apply to the generic BIDS import code (subjects and sessions), and the MRI import code. It does not apply to EEG import code.

The code has been fully rewritten.
The code is now fully typed 😌.
The code now uses the new ORM database abstraction instead of the old database abstraction or inline queries.
The code now uses the LORIS-MRI log abstraction instead of print.
The BIDS dataset import code has been moved and isolated in the new lib.import_bids_dataset module.
Created the new lib.imaging_lib module to replace the old lib.imaging module. Unlike lib.imaging, the new module is fully typed, uses the new ORM database abstraction, and is divided into submodules related to different kinds of imaging data (BIDS, DICOM, file parameters...).
Candidates and sessions are obtained at the subject and session levels. This was previously done at the data type level like before and introduced significant duplication between MRI and EEG code and nonsensical dependencies.

Future works

Migrate the EEG code.
Add an overwrite option to replace the existing files by the new ones if present.
Remove the default visit label configuration option and make it an argument (along with default site, default cohort...) ?

Other comments

This PR is very large so it will likely be divided in smaller PRs to be reviewed and merged.

laemtl · 2025-05-23T13:21:49Z

Hello, a few comments:

instead of removing --idsvalidation, I think it is better to migrate the logic to use participants.tsv instead.
ideally, we need a way for the script to work without dataset_description.json, or to use the one in a specific location, depending on how the incremental ingestion works.

bids session dataclass bids participants dataclass factorize combination iteration fix layout ignore skip files already inserted wip fix mri path join rebase commit

maximemulder force-pushed the 2024-12-02_incremental_import_bids branch 10 times, most recently from 8ccc017 to 751bf02 Compare December 3, 2024 22:18

maximemulder force-pushed the 2024-12-02_incremental_import_bids branch 3 times, most recently from d48df6c to 2ce3fb9 Compare December 22, 2024 04:41

maximemulder mentioned this pull request Dec 23, 2024

Migrate "get all sites from the database" to the SQLAlchemy abstraction #1216

Merged

maximemulder force-pushed the 2024-12-02_incremental_import_bids branch 2 times, most recently from 8033f6e to b3aeb95 Compare January 2, 2025 14:04

maximemulder force-pushed the 2024-12-02_incremental_import_bids branch 2 times, most recently from 00d3c65 to d2caaab Compare January 8, 2025 17:14

maximemulder force-pushed the 2024-12-02_incremental_import_bids branch 3 times, most recently from 30a3c2c to f27561a Compare February 18, 2025 09:24

maximemulder force-pushed the 2024-12-02_incremental_import_bids branch from 6a4b0bf to c7f6407 Compare March 20, 2025 10:18

maximemulder force-pushed the main branch from 3c7d4e8 to 22090a3 Compare March 21, 2025 14:21

maximemulder force-pushed the 2024-12-02_incremental_import_bids branch 4 times, most recently from 83afa11 to 56a4c42 Compare April 13, 2025 09:36

maximemulder force-pushed the 2024-12-02_incremental_import_bids branch from 56a4c42 to 84dcebc Compare April 18, 2025 09:57

maximemulder force-pushed the 2024-12-02_incremental_import_bids branch 2 times, most recently from 1d3ee78 to 3494167 Compare May 2, 2025 09:35

maximemulder force-pushed the 2024-12-02_incremental_import_bids branch 3 times, most recently from a5a5a1a to 92bc1fd Compare May 4, 2025 09:54

maximemulder added 3 commits June 16, 2025 07:08

incremental bids importer

fda55bf

bids session dataclass bids participants dataclass factorize combination iteration fix layout ignore skip files already inserted wip fix mri path join rebase commit

migrate to new database abstraction

e840a81

rewrite

2242fdf

MaximeBICMTL force-pushed the 2024-12-02_incremental_import_bids branch from 8f2f244 to b533b6b Compare June 16, 2025 07:09

MaximeBICMTL added 2 commits June 16, 2025 07:43

optional dataset_description.json

aeba907

return an error on unknown scan types

646dd27

MaximeBICMTL force-pushed the 2024-12-02_incremental_import_bids branch from b533b6b to 646dd27 Compare June 16, 2025 07:44

fix wrong counter and memory use

f3c2943

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Incremental BIDS import #1211

Incremental BIDS import #1211

maximemulder commented Dec 3, 2024 •

edited

Loading

Uh oh!

laemtl commented May 23, 2025

Uh oh!

Uh oh!

Incremental BIDS import #1211

Are you sure you want to change the base?

Incremental BIDS import #1211

Conversation

maximemulder commented Dec 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

BIDS dataset import script rewrite

List of new features

List of breaking changes

List of major architectural changes

Future works

Other comments

Uh oh!

laemtl commented May 23, 2025

Uh oh!

Uh oh!

maximemulder commented Dec 3, 2024 •

edited

Loading