-
Notifications
You must be signed in to change notification settings - Fork 52
Improve Python subject determination and validation error handling #1146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Python subject determination and validation error handling #1146
Conversation
def get_candidate_psc_id(self, cand_id: str | int) -> str | None: | ||
""" | ||
Checks whether the PSCID/CandID combination corresponds to a valid candidate in the `candidate` table. | ||
|
||
:param psc_id: PSCID of the candidate | ||
:type psc_id: str | ||
:param cand_id: CandID of the candidate | ||
:type cand_id: int | ||
|
||
:returns: the valid CandID and PSCID if the combination corresponds to a candidate, None otherwise | ||
Return a candidate PSCID and based on its CandID, or `None` if no candidate is found in | ||
the database. | ||
""" | ||
|
||
query = "SELECT c1.CandID, c2.PSCID AS PSCID " \ | ||
" FROM candidate c1 " \ | ||
" LEFT JOIN candidate c2 ON (c1.CandID=c2.CandID AND c2.PSCID = %s) " \ | ||
" WHERE c1.CandID = %s" | ||
query = 'SELECT PSCID ' \ | ||
'FROM candidate ' \ | ||
'WHERE CandID = %s' | ||
|
||
results = self.db.pselect(query=query, args=(psc_id, cand_id)) | ||
results = self.db.pselect(query, args=(cand_id,)) | ||
|
||
return results if results else None | ||
return results[0]['PSCID'] if results else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a comment in the old PR, but I still don't see how a join is necessary here. If I query the candidate by CandID, and that the PSCID of the candidate matches the PSCID of the file name, it is a sufficient proof that everying is fine to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough. That select statement is derived from the perl code actually. @nicolasbrossard any reason why that query is used instead of just selecting the PSCID based on a CandID (and later on checking that the PSCID in the filename is the same as the PSCID in the database). I have a feeling there was a reason for it but my memory fails me. Hopefully, yours is in a better shape ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maximemulder see my comments attached. One of them is tagging Nicolas because I want to be sure I did not forget something important before we go with the simpler query to get PSCID.
def get_candidate_psc_id(self, cand_id: str | int) -> str | None: | ||
""" | ||
Checks whether the PSCID/CandID combination corresponds to a valid candidate in the `candidate` table. | ||
|
||
:param psc_id: PSCID of the candidate | ||
:type psc_id: str | ||
:param cand_id: CandID of the candidate | ||
:type cand_id: int | ||
|
||
:returns: the valid CandID and PSCID if the combination corresponds to a candidate, None otherwise | ||
Return a candidate PSCID and based on its CandID, or `None` if no candidate is found in | ||
the database. | ||
""" | ||
|
||
query = "SELECT c1.CandID, c2.PSCID AS PSCID " \ | ||
" FROM candidate c1 " \ | ||
" LEFT JOIN candidate c2 ON (c1.CandID=c2.CandID AND c2.PSCID = %s) " \ | ||
" WHERE c1.CandID = %s" | ||
query = 'SELECT PSCID ' \ | ||
'FROM candidate ' \ | ||
'WHERE CandID = %s' | ||
|
||
results = self.db.pselect(query=query, args=(psc_id, cand_id)) | ||
results = self.db.pselect(query, args=(cand_id,)) | ||
|
||
return results if results else None | ||
return results[0]['PSCID'] if results else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough. That select statement is derived from the perl code actually. @nicolasbrossard any reason why that query is used instead of just selecting the PSCID based on a CandID (and later on checking that the PSCID in the filename is the same as the PSCID in the database). I have a feeling there was a reason for it but my memory fails me. Hopefully, yours is in a better shape ;)
|
||
try: | ||
subject_id_dict = self.config_file.get_subject_ids(self.db, dicom_value, scanner_id) | ||
subject_id_dict['PatientName'] = dicom_value | ||
subject_id_dict = self.config_file.get_subject_ids(self.db, subject_name, scanner_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if we should check also that the subject_id_dict
from the config file is returning the proper dictionary structure. AKA a key for PSCID
, a key for CandID
a key for visitLabel
, a key of isPhantom
and a key for createVisitLabel
. Since the scripts relies on that structure, it could be nice to double that the config file follows that structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I AGREE TO THE FULLEST EXTENT. This is actually the next PR that I wanted to discuss. A dictionary is not the right data structure here, because we know exactly what fields we want to be filled. I was thinking of using specific dataclasses that contain exactly the fields we need. This means that current projects need to rewrite their Python configuration but it is a very straightforward change to make and makes this configuration easier and clearer IMO.
EDIT: See #1150.
query = 'SELECT PSCID ' \ | ||
'FROM candidate ' \ | ||
'WHERE CandID = %s' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could probably be just in one line given how short the query is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM and tested locally to make sure all worked.
* remove trailing whitespaces (#1149) Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * SQLAlchemy proof-of-concept (#1152) * add sqlalchemy * fix lints * add comment --------- Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * fix (#1168) Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * integration test (#1138) * test * main Loris here * ignore Loris python * clone Loris * rm loris * override test * override test * override RB data * fix folder * test php files * test * try * test * try install loris mri * os * test * rm modules * Update flake8_python_linter.yml * Small linting improvements (#1167) * better linting * remove tab in comment --------- Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * Improve Python subject determination and validation error handling (#1146) * improve subjects determination and validation * use tuple for database query arguments * query on single line --------- Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * Add literal regex leading r (#1176) Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * fix (#1178) Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * LORIS-MRI tooling configuration (#1170) * add pyproject.toml * keep flake8 (for now) * double type checking * Use Pyright as main type checker * remove comment * test * test * change configuration * add python environment variables * try factorization * test * test * test * test * remove temporary things * test ruff error * ruff output github * further test errors * test pyright output github * test * test * test * change comment * remove test errors * test --------- Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * update docker install (#1171) Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * add RB override tables for the MRI side (#1173) * Remove php 8.1 and 8.2 from the LORIS tests to keep only test on PHP 8.3 (#1175) * remove php 8.1 and 8.2 testing * update to 8.3 in Dockerfile.test.php8 * ADD minc tools install in gitactions (#1179) * todo install loris mri * test * run pytest in gitaction (#1172) * test * ttt * test * test * test * test * test * test * test * test * test * test * test * test * test * test * test * test * test * Delete test_example.py * rename * done * Delete test/Dockerfile.test.py * test (#1187) * unit tests poc (#1188) * add more pytest global rules (#1194) * Stricter linter configuration (#1193) * stricter ruff config * do not ignore N818 * lint for sorted imports * add python scripts directory (#1196) * Bunch of SQLAlchemy definitions (#1192) * bunch of sqlalchemy definitions * add project orm definition * fix candidate to site query function * optimize docker image (#1195) * fix pytest config (#1197) * fix python scripts (#1199) * remove mysql-connector (#1200) * Improve subject configuration structure (#1150) * refacor subject config * remove suspicious phantom code * change naming * fix rebase lints * cleaner database url * Cecile CandID path join bug fix Co-authored-by: Cécile Madjar <cecile.madjar@mcin.ca> --------- Co-authored-by: Cécile Madjar <cecile.madjar@mcin.ca> * Integration test to check that the ORM definitions match the LORIS SQL database (#1198) * orm sync integration test * add docker health check * use integration config file * test error * Revert "test error" This reverts commit d1a70e6. * Clean up Docker database scripts (#1202) * clean docker database * remove docker warning * remove unused attribute (#1201) * try fix perl ci * Add file, file_parameter and scanner SQLAlchemy models (#1207) * add tables * rename file model * rename parameter_file * fix lints (#1210) * Unify LORIS-MRI logging (#1191) * new env * change print order in database log * create notification type if needed * fix typing pyright update * Integration test for `run_dicom_archive_loader.py` (#1203) add integration test put keys in github repo debugging add database check add comments and debugging remove s3 keys change converter in test check file tree fix doc fix docs 2 rename variable go back to using s3 keys fix path try fix * rename database sub-namepaces (#1214) * fix warnings (#1215) * Factorize get subject session (#1190) * factorize subject session * re-add site log * Migrate "get all sites from the database" to the SQLAlchemy abstraction (#1216) * Fix new PEP585 Ruff lint (#1218) * Clean-up database scan type (#1139) Complementary PR of LORIS #9304 * Migrate DICOM archive files to new database abstraction (#1217) * wip * use self.dicom_archive.id instead of self.tarchive_id * fix todo * Fix SQLAlchemy file model (#1221) * Improve Python requirements order (#1222) * add python documentation (#1223) * Update SQLAlchemy bindings to match LORIS#9556 (#1229) * trigger_tests * try update * fix old queries for integration tests * Migrate `ImagingUpload` and `MriUploadDB` to the new database abstraction (#1224) * migrate mri upload to sqlalchemy * fix concurrency bug * fix clean up * display mri upload ids use dicom archive id instead of tarchiveid because it is prettier ruff check * migrate scanner to new abstraction (#1232) * Fix install script (#1234) * fix install script and move files related to install into the install directory * fix install script and move files related to install into the install directory * fix install script and move files related to install into the install directory * fix paths in imaging_install_test.sh * fix paths in imaging_install_test.sh * fix SQL error in install script * fix tests * fix tests * fix github action * add cpanfile * copy cpanfile in MRI dockerfile * add new dependency in order to be able to source the Digest::BLAKE2 in the cpanfile * fix Digest::BLAKE2 installation though cpanfile * move config templates to a templates directory and requirements files into a requirements directory * remove copy * fix test * fix test * fix test * fix test * Maxime's feedback * specify python version (#1237) * fix large file hash memory (#1236) * add lib.util module (#1238) * New DICOM study import script (#1117) * rewrite dicom archive * update * remove --year and --target options * get session from config file * remove unused upload argument * fix little oopsie for --session * handle large files hash * fix unhandled modalities * use new lib.util module * Add support to be able to read headers from enhanced DICOMs (#7) * print * print * print * print * add support for enhanced DICOMs * add support for enhanced DICOMs * rework associate files with series * sort dicom series and files before inserting in the database * handle null scanner --------- Co-authored-by: Cécile Madjar <cecile.madjar@mcin.ca> * refactoring of CandID on the non-ORM python side * tests * fix insertion in mri_protocol_violation_scans * alphabetical order is hard sometimes * FIx `get_candidate_id` documentation --------- Co-authored-by: Maxime Mulder <maxime-mulder@outlook.com> Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> Co-authored-by: Shen <kongtiaowangshen@gmail.com>
Changes extracted from #1144.
Here are the changes:
validate_subject_ids
inlib/imaging.py
as it was dead code.lib/validate_subject_ids
to isolate subject IDs validation.python/lib/exception
to contain exception classes.DetermineSubjectException
andValidateSubjectException
to handle subject IDs errors, and use these exceptions where dictionary entries were previously used.Notes for existing projects
If a separate script is using the
validate_subject_ids
logic, you will likely need to modify related code to work with the new refactored way of determining subjects IDs. This caveat is only for users using the python imaging pipeline.