Improve Python subject determination and validation error handling #1146

maximemulder · 2024-08-08T17:23:33Z

Changes extracted from #1144.

Here are the changes:

Remove function validate_subject_ids in lib/imaging.py as it was dead code.
Add a file lib/validate_subject_ids to isolate subject IDs validation.
Add a directory python/lib/exception to contain exception classes.
Add two exceptions DetermineSubjectException and ValidateSubjectException to handle subject IDs errors, and use these exceptions where dictionary entries were previously used.
Simplify the PSCID / CandID combination checking by using a simpler SQL query.

Notes for existing projects

If a separate script is using the validate_subject_ids logic, you will likely need to modify related code to work with the new refactored way of determining subjects IDs. This caveat is only for users using the python imaging pipeline.

maximemulder · 2024-08-08T17:28:09Z

python/lib/database_lib/candidate_db.py

+    def get_candidate_psc_id(self, cand_id: str | int) -> str | None:
        """
-        Checks whether the PSCID/CandID combination corresponds to a valid candidate in the `candidate` table.
-
-        :param psc_id: PSCID of the candidate
-         :type psc_id: str
-        :param cand_id: CandID of the candidate
-         :type cand_id: int
-
-        :returns: the valid CandID and PSCID if the combination corresponds to a candidate, None otherwise
+        Return a candidate PSCID and based on its CandID, or `None` if no candidate is found in
+        the database.
        """

-        query = "SELECT c1.CandID, c2.PSCID AS PSCID " \
-                " FROM candidate c1 " \
-                " LEFT JOIN candidate c2 ON (c1.CandID=c2.CandID AND c2.PSCID = %s) " \
-                " WHERE c1.CandID = %s"
+        query = 'SELECT PSCID ' \
+            'FROM candidate ' \
+            'WHERE CandID = %s'

-        results = self.db.pselect(query=query, args=(psc_id, cand_id))
+        results = self.db.pselect(query, args=(cand_id,))

-        return results if results else None
+        return results[0]['PSCID'] if results else None


This was a comment in the old PR, but I still don't see how a join is necessary here. If I query the candidate by CandID, and that the PSCID of the candidate matches the PSCID of the file name, it is a sufficient proof that everying is fine to me.

fair enough. That select statement is derived from the perl code actually. @nicolasbrossard any reason why that query is used instead of just selecting the PSCID based on a CandID (and later on checking that the PSCID in the filename is the same as the PSCID in the database). I have a feeling there was a reason for it but my memory fails me. Hopefully, yours is in a better shape ;)

cmadjar

@maximemulder see my comments attached. One of them is tagging Nicolas because I want to be sure I did not forget something important before we go with the simpler query to get PSCID.

cmadjar · 2024-08-14T13:17:31Z

python/lib/database_lib/candidate_db.py

+    def get_candidate_psc_id(self, cand_id: str | int) -> str | None:
        """
-        Checks whether the PSCID/CandID combination corresponds to a valid candidate in the `candidate` table.
-
-        :param psc_id: PSCID of the candidate
-         :type psc_id: str
-        :param cand_id: CandID of the candidate
-         :type cand_id: int
-
-        :returns: the valid CandID and PSCID if the combination corresponds to a candidate, None otherwise
+        Return a candidate PSCID and based on its CandID, or `None` if no candidate is found in
+        the database.
        """

-        query = "SELECT c1.CandID, c2.PSCID AS PSCID " \
-                " FROM candidate c1 " \
-                " LEFT JOIN candidate c2 ON (c1.CandID=c2.CandID AND c2.PSCID = %s) " \
-                " WHERE c1.CandID = %s"
+        query = 'SELECT PSCID ' \
+            'FROM candidate ' \
+            'WHERE CandID = %s'

-        results = self.db.pselect(query=query, args=(psc_id, cand_id))
+        results = self.db.pselect(query, args=(cand_id,))

-        return results if results else None
+        return results[0]['PSCID'] if results else None


fair enough. That select statement is derived from the perl code actually. @nicolasbrossard any reason why that query is used instead of just selecting the PSCID based on a CandID (and later on checking that the PSCID in the filename is the same as the PSCID in the database). I have a feeling there was a reason for it but my memory fails me. Hopefully, yours is in a better shape ;)

cmadjar · 2024-08-14T13:24:11Z

python/lib/imaging.py


        try:
-            subject_id_dict = self.config_file.get_subject_ids(self.db, dicom_value, scanner_id)
-            subject_id_dict['PatientName'] = dicom_value
+            subject_id_dict = self.config_file.get_subject_ids(self.db, subject_name, scanner_id)


I am wondering if we should check also that the subject_id_dict from the config file is returning the proper dictionary structure. AKA a key for PSCID, a key for CandID a key for visitLabel, a key of isPhantom and a key for createVisitLabel. Since the scripts relies on that structure, it could be nice to double that the config file follows that structure.

I AGREE TO THE FULLEST EXTENT. This is actually the next PR that I wanted to discuss. A dictionary is not the right data structure here, because we know exactly what fields we want to be filled. I was thinking of using specific dataclasses that contain exactly the fields we need. This means that current projects need to rewrite their Python configuration but it is a very straightforward change to make and makes this configuration easier and clearer IMO.

EDIT: See #1150.

cmadjar · 2024-08-14T13:28:14Z

python/lib/database_lib/candidate_db.py

+        query = 'SELECT PSCID ' \
+            'FROM candidate ' \
+            'WHERE CandID = %s'


That could probably be just in one line given how short the query is.

Fine by me, done.

cmadjar

LGTM and tested locally to make sure all worked.

* remove trailing whitespaces (#1149) Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * SQLAlchemy proof-of-concept (#1152) * add sqlalchemy * fix lints * add comment --------- Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * fix (#1168) Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * integration test (#1138) * test * main Loris here * ignore Loris python * clone Loris * rm loris * override test * override test * override RB data * fix folder * test php files * test * try * test * try install loris mri * os * test * rm modules * Update flake8_python_linter.yml * Small linting improvements (#1167) * better linting * remove tab in comment --------- Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * Improve Python subject determination and validation error handling (#1146) * improve subjects determination and validation * use tuple for database query arguments * query on single line --------- Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * Add literal regex leading r (#1176) Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * fix (#1178) Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * LORIS-MRI tooling configuration (#1170) * add pyproject.toml * keep flake8 (for now) * double type checking * Use Pyright as main type checker * remove comment * test * test * change configuration * add python environment variables * try factorization * test * test * test * test * remove temporary things * test ruff error * ruff output github * further test errors * test pyright output github * test * test * test * change comment * remove test errors * test --------- Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * update docker install (#1171) Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * add RB override tables for the MRI side (#1173) * Remove php 8.1 and 8.2 from the LORIS tests to keep only test on PHP 8.3 (#1175) * remove php 8.1 and 8.2 testing * update to 8.3 in Dockerfile.test.php8 * ADD minc tools install in gitactions (#1179) * todo install loris mri * test * run pytest in gitaction (#1172) * test * ttt * test * test * test * test * test * test * test * test * test * test * test * test * test * test * test * test * test * Delete test_example.py * rename * done * Delete test/Dockerfile.test.py * test (#1187) * unit tests poc (#1188) * add more pytest global rules (#1194) * Stricter linter configuration (#1193) * stricter ruff config * do not ignore N818 * lint for sorted imports * add python scripts directory (#1196) * Bunch of SQLAlchemy definitions (#1192) * bunch of sqlalchemy definitions * add project orm definition * fix candidate to site query function * optimize docker image (#1195) * fix pytest config (#1197) * fix python scripts (#1199) * remove mysql-connector (#1200) * Improve subject configuration structure (#1150) * refacor subject config * remove suspicious phantom code * change naming * fix rebase lints * cleaner database url * Cecile CandID path join bug fix Co-authored-by: Cécile Madjar <cecile.madjar@mcin.ca> --------- Co-authored-by: Cécile Madjar <cecile.madjar@mcin.ca> * Integration test to check that the ORM definitions match the LORIS SQL database (#1198) * orm sync integration test * add docker health check * use integration config file * test error * Revert "test error" This reverts commit d1a70e6. * Clean up Docker database scripts (#1202) * clean docker database * remove docker warning * remove unused attribute (#1201) * try fix perl ci * Add file, file_parameter and scanner SQLAlchemy models (#1207) * add tables * rename file model * rename parameter_file * fix lints (#1210) * Unify LORIS-MRI logging (#1191) * new env * change print order in database log * create notification type if needed * fix typing pyright update * Integration test for `run_dicom_archive_loader.py` (#1203) add integration test put keys in github repo debugging add database check add comments and debugging remove s3 keys change converter in test check file tree fix doc fix docs 2 rename variable go back to using s3 keys fix path try fix * rename database sub-namepaces (#1214) * fix warnings (#1215) * Factorize get subject session (#1190) * factorize subject session * re-add site log * Migrate "get all sites from the database" to the SQLAlchemy abstraction (#1216) * Fix new PEP585 Ruff lint (#1218) * Clean-up database scan type (#1139) Complementary PR of LORIS #9304 * Migrate DICOM archive files to new database abstraction (#1217) * wip * use self.dicom_archive.id instead of self.tarchive_id * fix todo * Fix SQLAlchemy file model (#1221) * Improve Python requirements order (#1222) * add python documentation (#1223) * Update SQLAlchemy bindings to match LORIS#9556 (#1229) * trigger_tests * try update * fix old queries for integration tests * Migrate `ImagingUpload` and `MriUploadDB` to the new database abstraction (#1224) * migrate mri upload to sqlalchemy * fix concurrency bug * fix clean up * display mri upload ids use dicom archive id instead of tarchiveid because it is prettier ruff check * migrate scanner to new abstraction (#1232) * Fix install script (#1234) * fix install script and move files related to install into the install directory * fix install script and move files related to install into the install directory * fix install script and move files related to install into the install directory * fix paths in imaging_install_test.sh * fix paths in imaging_install_test.sh * fix SQL error in install script * fix tests * fix tests * fix github action * add cpanfile * copy cpanfile in MRI dockerfile * add new dependency in order to be able to source the Digest::BLAKE2 in the cpanfile * fix Digest::BLAKE2 installation though cpanfile * move config templates to a templates directory and requirements files into a requirements directory * remove copy * fix test * fix test * fix test * fix test * Maxime's feedback * specify python version (#1237) * fix large file hash memory (#1236) * add lib.util module (#1238) * New DICOM study import script (#1117) * rewrite dicom archive * update * remove --year and --target options * get session from config file * remove unused upload argument * fix little oopsie for --session * handle large files hash * fix unhandled modalities * use new lib.util module * Add support to be able to read headers from enhanced DICOMs (#7) * print * print * print * print * add support for enhanced DICOMs * add support for enhanced DICOMs * rework associate files with series * sort dicom series and files before inserting in the database * handle null scanner --------- Co-authored-by: Cécile Madjar <cecile.madjar@mcin.ca> * refactoring of CandID on the non-ORM python side * tests * fix insertion in mri_protocol_violation_scans * alphabetical order is hard sometimes * FIx `get_candidate_id` documentation --------- Co-authored-by: Maxime Mulder <maxime-mulder@outlook.com> Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> Co-authored-by: Shen <kongtiaowangshen@gmail.com>

improve subjects determination and validation

0106306

maximemulder requested a review from cmadjar August 8, 2024 17:23

use tuple for database query arguments

d7360f4

maximemulder commented Aug 8, 2024

View reviewed changes

cmadjar reviewed Aug 14, 2024

View reviewed changes

query on single line

016c65b

maximemulder mentioned this pull request Aug 14, 2024

Improve subject configuration structure #1150

Merged

cmadjar approved these changes Aug 30, 2024

View reviewed changes

cmadjar merged commit 3eddad6 into aces:main Aug 30, 2024
1 check passed

cmadjar added this to the 27.0 milestone Aug 30, 2024

cmadjar added Refactoring Python imaging pipeline Add to release notes Caveat for Existing Projects labels Aug 30, 2024

cmadjar added this to the 27.0 milestone Apr 22, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve Python subject determination and validation error handling #1146

Improve Python subject determination and validation error handling #1146

Uh oh!

maximemulder commented Aug 8, 2024 •

edited by cmadjar

Loading

Uh oh!

maximemulder Aug 8, 2024

Uh oh!

cmadjar Aug 14, 2024

Uh oh!

cmadjar left a comment

Uh oh!

cmadjar Aug 14, 2024

Uh oh!

cmadjar Aug 14, 2024

Uh oh!

maximemulder Aug 14, 2024 •

edited

Loading

Uh oh!

cmadjar Aug 14, 2024

Uh oh!

maximemulder Aug 14, 2024

Uh oh!

cmadjar left a comment

Uh oh!

Uh oh!

Uh oh!

Improve Python subject determination and validation error handling #1146

Improve Python subject determination and validation error handling #1146

Uh oh!

Conversation

maximemulder commented Aug 8, 2024 • edited by cmadjar Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Notes for existing projects

Uh oh!

maximemulder Aug 8, 2024

Choose a reason for hiding this comment

Uh oh!

cmadjar Aug 14, 2024

Choose a reason for hiding this comment

Uh oh!

cmadjar left a comment

Choose a reason for hiding this comment

Uh oh!

cmadjar Aug 14, 2024

Choose a reason for hiding this comment

Uh oh!

cmadjar Aug 14, 2024

Choose a reason for hiding this comment

Uh oh!

maximemulder Aug 14, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cmadjar Aug 14, 2024

Choose a reason for hiding this comment

Uh oh!

maximemulder Aug 14, 2024

Choose a reason for hiding this comment

Uh oh!

cmadjar left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

maximemulder commented Aug 8, 2024 •

edited by cmadjar

Loading

maximemulder Aug 14, 2024 •

edited

Loading