Validate subject IDs early in the imaging pipeline #1144
Conversation
A few comments.
The main change would be that we still need to insert the DICOMs in the imaging uploader and mark the insertion as failed, to keep a trace on the frontend. I know that for CBIGR you don't want to insert, but most studies would still want to see the failure.
A tradeoff would be to create a config setting under the imaging pipeline section, defaulting to always inserting in the imaging uploader, that you could turn off for CBIGR.
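A purely illustrative Python sketch of that tradeoff; the config name insertInvalidSubjectIds, the helper functions, and the exit-code value below are hypothetical, not existing Loris-MRI APIs:

import sys

CANDIDATE_MISMATCH = 1  # placeholder value; the real pipeline has a dedicated exit code

def get_imaging_config(name: str, default: bool) -> bool:
    # Hypothetical stand-in for a project config lookup under the imaging pipeline section.
    return default

def register_failed_mri_upload(upload_path: str, reason: str) -> None:
    # Hypothetical stand-in for inserting the upload in the imaging uploader with a failure status.
    print(f'{upload_path} marked as failed: {reason}')

def handle_invalid_subject_ids(upload_path: str, error_message: str) -> None:
    # Default behaviour: keep a trace on the frontend; a project like CBIGR could turn the config off.
    if get_imaging_config('insertInvalidSubjectIds', default=True):
        register_failed_mri_upload(upload_path, error_message)
    sys.exit(CANDIDATE_MISMATCH)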
 my $upload_id = undef;
-my ($debug, $verbose) = (0,1);
+my ($debug, $verbose) = (0,0);
Why has this changed?
Oh, maybe I should have proposed this change in a separate PR indeed.
The current line makes the imaging pipeline always verbose, regardless of whether the --verbose argument is provided. When calling the Python script from Perl, this becomes very noticeable since all database queries are logged.
if (system($command) != 0) {
    # The error is already printed by the Python script, just exit
    exit $NeuroDB::ExitCodes::CANDIDATE_MISMATCH;
This would fail silently. I think the data should still be inserted in the imaging uploader but with a failure status instead.
query = 'SELECT PSCID ' \
        'FROM candidate ' \
        'WHERE CandID = %s'
You should revert to the old SQL. There is a reason for those joins: they allow checking that the PSCID and DCCID refer to the same candidate.
Whoops, I forgot that the PSCID is not unique, my mistake!
EDIT: Actually, even with duplicate PSCIDs, I don't see how the new code could fail. It seems to me that this join is kinda useless:
SELECT c2.PSCID
FROM candidate c1
LEFT JOIN candidate c2 ON (c1.CandID=c2.CandID AND c2.PSCID = %s)
WHERE c1.CandID = %s;
seems equivalent to me to (ignoring NULL results and argument order):
SELECT PSCID
FROM candidate
WHERE CandID = %s AND PSCID = %s;
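For illustration, here is a minimal sketch (not the PR's actual code) of the combined check as a small Python helper, assuming the db object exposes the pselect method discussed further down:

def candidate_ids_match(db, cand_id: int, psc_id: str) -> bool:
    # Check that the DCCID (CandID) and the PSCID parsed from the patient name
    # refer to the same candidate row.
    query = 'SELECT PSCID ' \
            'FROM candidate ' \
            'WHERE CandID = %s AND PSCID = %s'
    results = db.pselect(query=query, args=(cand_id, psc_id))
    return bool(results)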
-results = self.db.pselect(query=query, args=(visit_label,))
+results = self.db.pselect(query=query, args=[visit_label])
 return results if results else None
I think that would break since pselect args expects a tuple.
The code worked fine (on my machine™) with lists, and I saw a few places that use lists (in lib/candidate.py), so I went with it. But it seems that even the MySQL docs use tuples, so you're probably right that I should switch to tuples. (Although I must say I do not get why one would design this API with tuples over lists here.)
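For context, a small sketch of the underlying DB-API call, assuming the MySQLdb driver sits behind pselect (connection details are placeholders); most drivers accept any sequence as parameters, but the documented form is a tuple:

import MySQLdb  # assumption: the driver used behind pselect

db = MySQLdb.connect(host='localhost', user='lorisuser', passwd='secret', db='LORIS')
cursor = db.cursor()

query = 'SELECT PSCID FROM candidate WHERE CandID = %s'
cursor.execute(query, (123456,))   # tuple: the form shown in the driver documentation
cursor.execute(query, [123456])    # list: usually accepted too, since any sequence works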
 results = self.db.pselect(query=query, args=[visit_label])
-return results if results else None
+return bool(results)
Have you checked that changing the return value type does not break any other code?
Yep! The function only has one reference in the codebase (the place I changed).
 except DetermineSubjectException as exception:
     self.log_error_and_exit(
-        self.subject_id_dict['error_message'],
+        exception.message,
Nice exception refactoring! Have you checked that determine_subject_ids is not used anywhere else, to make sure nothing gets broken by this refactoring? Thanks!
Yep! I did change the error management in all the places this function is called. There is just one place that had no error management at all, so I did not add any:
https://github.com/aces/Loris-MRI/blob/main/python/lib/dcm2bids_imaging_pipeline_lib/nifti_insertion_pipeline.py#L84
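A minimal, self-contained sketch of the refactored pattern; the exception name comes from the diff above, everything else is illustrative:

class DetermineSubjectException(Exception):
    # Raised when the subject IDs cannot be determined from the DICOM patient name.
    def __init__(self, message: str):
        super().__init__(message)
        self.message = message

def determine_subject_ids(patient_name: str) -> dict:
    # Hypothetical stand-in for the real lookup logic.
    if patient_name.count('_') < 2:
        raise DetermineSubjectException(f'Cannot parse subject IDs from {patient_name}')
    psc_id, cand_id, visit_label = patient_name.split('_', 2)
    return {'PSCID': psc_id, 'CandID': cand_id, 'visitLabel': visit_label}

try:
    subject_id_dict = determine_subject_ids('PSC0001_123456_V01')
except DetermineSubjectException as exception:
    # Each caller decides how to log the error and which exit code to use.
    print(exception.message)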
This sounds reasonable to me, and I wonder whether C-BIG should be doing the same. @ridz1208, do you have an opinion? I understand we want to avoid ... For manual insertions, just reporting the error is fine, but manual insertions would rather use the front-end. For automated scripts, having the errors directly accessible by the admins in LORIS seems like a sensible solution to me.
 return subject_id_dict

-def validate_subject_ids(self, subject_id_dict):
+def determine_subject_ids_from_name(self, subject_name: str, scanner_id: Optional[int] = None) -> dict[str, Any]:
Duplicated function name. See line 515.
Oh, there is no duplicate name: on line 515 there is determine_subject_ids, and on 538 there is determine_subject_ids_from_name; the former calls the latter. The diff is a bit messy here.
A few suggestions that came up (but we had not settled on any) were:
Briefly, my thoughts are that even for CBIG, if a NULL sessionID is not accepted, we need a way to notify the scan "owners" that the upload failed, so solutions 1, 2 and/or 3 will need to be implemented at least for CBIG even if 4 is implemented. As I explained previously, my issue with NULL sessionIDs is technical, not procedural. The problem is that a NULL sessionID means we were not able to match the "patient_name" to a session for a candidate, for whatever reason... so on the front end, who sees it?
Unless I'm missing something here, that's my view of the issue. We can certainly discuss it at the meeting. Edit: And just to make sure, the ONLY use case where a sessionID would be null is if the file name is incorrect; we are not talking about a failed upload for any other reason, just the naming convention not being respected, which the front end would have rejected either way.
Closed in favor of using a new script that calls the DICOM upload API to insert scans.
This PR validates the subject IDs at the very beginning of the imaging pipeline, preventing the insertion of any MRI upload / DICOM archive that would not be associated with a session because of a typo or configuration mistake.
This PR uses the Python configuration function get_subject_ids, even in the Perl code (the Perl code calls the Python code). It also includes a factorization and de-duplication of some existing Python code regarding the retrieval and validation of subject IDs.
Caveats for projects