Skip to content

Improve Python subject determination and validation error handling #1146

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 30, 2024

Conversation

maximemulder
Copy link
Contributor

@maximemulder maximemulder commented Aug 8, 2024

Changes extracted from #1144.

Here are the changes:

  • Remove function validate_subject_ids in lib/imaging.py as it was dead code.
  • Add a file lib/validate_subject_ids to isolate subject IDs validation.
  • Add a directory python/lib/exception to contain exception classes.
  • Add two exceptions DetermineSubjectException and ValidateSubjectException to handle subject IDs errors, and use these exceptions where dictionary entries were previously used.
  • Simplify the PSCID / CandID combination checking by using a simpler SQL query.

Notes for existing projects

If a separate script is using the validate_subject_ids logic, you will likely need to modify related code to work with the new refactored way of determining subjects IDs. This caveat is only for users using the python imaging pipeline.

@maximemulder maximemulder requested a review from cmadjar August 8, 2024 17:23
Comment on lines 38 to 50
def get_candidate_psc_id(self, cand_id: str | int) -> str | None:
"""
Checks whether the PSCID/CandID combination corresponds to a valid candidate in the `candidate` table.

:param psc_id: PSCID of the candidate
:type psc_id: str
:param cand_id: CandID of the candidate
:type cand_id: int

:returns: the valid CandID and PSCID if the combination corresponds to a candidate, None otherwise
Return a candidate PSCID and based on its CandID, or `None` if no candidate is found in
the database.
"""

query = "SELECT c1.CandID, c2.PSCID AS PSCID " \
" FROM candidate c1 " \
" LEFT JOIN candidate c2 ON (c1.CandID=c2.CandID AND c2.PSCID = %s) " \
" WHERE c1.CandID = %s"
query = 'SELECT PSCID ' \
'FROM candidate ' \
'WHERE CandID = %s'

results = self.db.pselect(query=query, args=(psc_id, cand_id))
results = self.db.pselect(query, args=(cand_id,))

return results if results else None
return results[0]['PSCID'] if results else None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a comment in the old PR, but I still don't see how a join is necessary here. If I query the candidate by CandID, and that the PSCID of the candidate matches the PSCID of the file name, it is a sufficient proof that everying is fine to me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough. That select statement is derived from the perl code actually. @nicolasbrossard any reason why that query is used instead of just selecting the PSCID based on a CandID (and later on checking that the PSCID in the filename is the same as the PSCID in the database). I have a feeling there was a reason for it but my memory fails me. Hopefully, yours is in a better shape ;)

Copy link
Collaborator

@cmadjar cmadjar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maximemulder see my comments attached. One of them is tagging Nicolas because I want to be sure I did not forget something important before we go with the simpler query to get PSCID.

Comment on lines 38 to 50
def get_candidate_psc_id(self, cand_id: str | int) -> str | None:
"""
Checks whether the PSCID/CandID combination corresponds to a valid candidate in the `candidate` table.

:param psc_id: PSCID of the candidate
:type psc_id: str
:param cand_id: CandID of the candidate
:type cand_id: int

:returns: the valid CandID and PSCID if the combination corresponds to a candidate, None otherwise
Return a candidate PSCID and based on its CandID, or `None` if no candidate is found in
the database.
"""

query = "SELECT c1.CandID, c2.PSCID AS PSCID " \
" FROM candidate c1 " \
" LEFT JOIN candidate c2 ON (c1.CandID=c2.CandID AND c2.PSCID = %s) " \
" WHERE c1.CandID = %s"
query = 'SELECT PSCID ' \
'FROM candidate ' \
'WHERE CandID = %s'

results = self.db.pselect(query=query, args=(psc_id, cand_id))
results = self.db.pselect(query, args=(cand_id,))

return results if results else None
return results[0]['PSCID'] if results else None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough. That select statement is derived from the perl code actually. @nicolasbrossard any reason why that query is used instead of just selecting the PSCID based on a CandID (and later on checking that the PSCID in the filename is the same as the PSCID in the database). I have a feeling there was a reason for it but my memory fails me. Hopefully, yours is in a better shape ;)


try:
subject_id_dict = self.config_file.get_subject_ids(self.db, dicom_value, scanner_id)
subject_id_dict['PatientName'] = dicom_value
subject_id_dict = self.config_file.get_subject_ids(self.db, subject_name, scanner_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we should check also that the subject_id_dict from the config file is returning the proper dictionary structure. AKA a key for PSCID, a key for CandID a key for visitLabel, a key of isPhantom and a key for createVisitLabel. Since the scripts relies on that structure, it could be nice to double that the config file follows that structure.

Copy link
Contributor Author

@maximemulder maximemulder Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I AGREE TO THE FULLEST EXTENT. This is actually the next PR that I wanted to discuss. A dictionary is not the right data structure here, because we know exactly what fields we want to be filled. I was thinking of using specific dataclasses that contain exactly the fields we need. This means that current projects need to rewrite their Python configuration but it is a very straightforward change to make and makes this configuration easier and clearer IMO.

EDIT: See #1150.

Comment on lines 44 to 46
query = 'SELECT PSCID ' \
'FROM candidate ' \
'WHERE CandID = %s'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could probably be just in one line given how short the query is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine by me, done.

Copy link
Collaborator

@cmadjar cmadjar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and tested locally to make sure all worked.

@cmadjar cmadjar merged commit 3eddad6 into aces:main Aug 30, 2024
1 check passed
@cmadjar cmadjar added this to the 27.0 milestone Aug 30, 2024
cmadjar added a commit that referenced this pull request Mar 21, 2025
* remove trailing whitespaces (#1149)

Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>

* SQLAlchemy proof-of-concept (#1152)

* add sqlalchemy

* fix lints

* add comment

---------

Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>

* fix (#1168)

Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>

* integration test (#1138)

* test

* main Loris here

* ignore Loris python

* clone Loris

* rm loris

* override test

* override test

* override RB data

* fix folder

* test php files

* test

* try

* test

* try install loris mri

* os

* test

* rm modules

* Update flake8_python_linter.yml

* Small linting improvements (#1167)

* better linting

* remove tab in comment

---------

Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>

* Improve Python subject determination and validation error handling (#1146)

* improve subjects determination and validation

* use tuple for database query arguments

* query on single line

---------

Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>

* Add literal regex leading r (#1176)

Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>

* fix (#1178)

Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>

* LORIS-MRI tooling configuration (#1170)

* add pyproject.toml

* keep flake8 (for now)

* double type checking

* Use Pyright as main type checker

* remove comment

* test

* test

* change configuration

* add python environment variables

* try factorization

* test

* test

* test

* test

* remove temporary things

* test ruff error

* ruff output github

* further test errors

* test pyright output github

* test

* test

* test

* change comment

* remove test errors

* test

---------

Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>

* update docker install (#1171)

Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>

* add RB override tables for the MRI side (#1173)

* Remove php 8.1 and 8.2 from the LORIS tests to keep only test on PHP 8.3 (#1175)

* remove php 8.1 and 8.2 testing

* update to 8.3 in Dockerfile.test.php8

* ADD minc tools install in gitactions (#1179)

* todo install loris mri

* test

* run pytest in gitaction (#1172)

* test

* ttt

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* Delete test_example.py

* rename

* done

* Delete test/Dockerfile.test.py

* test (#1187)

* unit tests poc (#1188)

* add more pytest global rules (#1194)

* Stricter linter configuration (#1193)

* stricter ruff config

* do not ignore N818

* lint for sorted imports

* add python scripts directory (#1196)

* Bunch of SQLAlchemy definitions (#1192)

* bunch of sqlalchemy definitions

* add project orm definition

* fix candidate to site query function

* optimize docker image (#1195)

* fix pytest config (#1197)

* fix python scripts (#1199)

* remove mysql-connector (#1200)

* Improve subject configuration structure (#1150)

* refacor subject config

* remove suspicious phantom code

* change naming

* fix rebase lints

* cleaner database url

* Cecile CandID path join bug fix

Co-authored-by: Cécile Madjar <cecile.madjar@mcin.ca>

---------

Co-authored-by: Cécile Madjar <cecile.madjar@mcin.ca>

* Integration test to check that the ORM definitions match the LORIS SQL database (#1198)

* orm sync integration test

* add docker health check

* use integration config file

* test error

* Revert "test error"

This reverts commit d1a70e6.

* Clean up Docker database scripts (#1202)

* clean docker database

* remove docker warning

* remove unused attribute (#1201)

* try fix perl ci

* Add file, file_parameter and scanner SQLAlchemy models (#1207)

* add tables

* rename file model

* rename parameter_file

* fix lints (#1210)

* Unify LORIS-MRI logging (#1191)

* new env

* change print order in database log

* create notification type if needed

* fix typing pyright update

* Integration test for `run_dicom_archive_loader.py` (#1203)

add integration test

put keys in github repo

debugging

add database check

add comments and debugging

remove s3 keys

change converter in test

check file tree

fix doc

fix docs 2

rename variable

go back to using s3 keys

fix path

try fix

* rename database sub-namepaces (#1214)

* fix warnings (#1215)

* Factorize get subject session (#1190)

* factorize subject session

* re-add site log

* Migrate "get all sites from the database" to the SQLAlchemy abstraction (#1216)

* Fix new PEP585 Ruff lint (#1218)

* Clean-up database scan type (#1139)

Complementary PR of LORIS #9304

* Migrate DICOM archive files to new database abstraction (#1217)

* wip

* use self.dicom_archive.id instead of self.tarchive_id

* fix todo

* Fix SQLAlchemy file model (#1221)

* Improve Python requirements order (#1222)

* add python documentation (#1223)

* Update SQLAlchemy bindings to match LORIS#9556 (#1229)

* trigger_tests

* try update

* fix old queries for integration tests

* Migrate `ImagingUpload` and `MriUploadDB` to the new database abstraction (#1224)

* migrate mri upload to sqlalchemy

* fix concurrency bug

* fix clean up

* display mri upload ids

use dicom archive id instead of tarchiveid because it is prettier

ruff check

* migrate scanner to new abstraction (#1232)

* Fix install script (#1234)

* fix install script and move files related to install into the install directory

* fix install script and move files related to install into the install directory

* fix install script and move files related to install into the install directory

* fix paths in imaging_install_test.sh

* fix paths in imaging_install_test.sh

* fix SQL error in install script

* fix tests

* fix tests

* fix github action

* add cpanfile

* copy cpanfile in MRI dockerfile

* add new dependency in order to be able to source the Digest::BLAKE2 in the cpanfile

* fix Digest::BLAKE2 installation though cpanfile

* move config templates to a templates directory and requirements files into a requirements directory

* remove copy

* fix test

* fix test

* fix test

* fix test

* Maxime's feedback

* specify python version (#1237)

* fix large file hash memory (#1236)

* add lib.util module (#1238)

* New DICOM study import script (#1117)

* rewrite dicom archive

* update

* remove --year and --target options

* get session from config file

* remove unused upload argument

* fix little oopsie for --session

* handle large files hash

* fix unhandled modalities

* use new lib.util module

* Add support to be able to read headers from enhanced DICOMs  (#7)

* print

* print

* print

* print

* add support for enhanced DICOMs

* add support for enhanced DICOMs

* rework associate files with series

* sort dicom series and files before inserting in the database

* handle null scanner

---------

Co-authored-by: Cécile Madjar <cecile.madjar@mcin.ca>

* refactoring of CandID on the non-ORM python side

* tests

* fix insertion in mri_protocol_violation_scans

* alphabetical order is hard sometimes

* FIx `get_candidate_id` documentation

---------

Co-authored-by: Maxime Mulder <maxime-mulder@outlook.com>
Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>
Co-authored-by: Shen <kongtiaowangshen@gmail.com>
@cmadjar cmadjar added this to the 27.0 milestone Apr 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants