-
Notifications
You must be signed in to change notification settings - Fork 52
New DICOM study import script #1117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Okay, I think the PR is finished, I am happy with the code and tested it on a few DICOMs on my side. The tests are failing because I am using a few We should also test it on DICOMs from other projects. Then, when this PR is merged, I can make another PR to adopt it in the different superscripts, although a problem will be that this script requires the Python profile file, while the scripts that call I will note a few things on the code:
|
Blocked on Python 3.10, as well as #1141 and #1142 to see if this PR should be updated with new libraries. NOTE for myself: Bring a fix from the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maximemulder I read through the code and have a few comments/refactoring suggestions. Let me know your thoughts.
Note: I did not test yet, just went through the code.
Thanks for the review @cmadjar ! I wrote some comments on a few points. Points with no comments mean that I agree or will look into it. |
@maximemulder I think if you rebase this one, it will pass the flake8 now that your PR with the python install update has been merged. Thanks! Also, regarding the |
Ok I replied to all your (very good !) comments @cmadjar. The only ones I have intentionally left out are these ones :
If you like the rest of the PR, I can add these very quickly after the next imaging meeting. |
Waiting for the first test implementation to polish it. |
3503263
to
3626bcb
Compare
8fa6000
to
3c2d49d
Compare
07ba539
to
fbcc9ca
Compare
fbcc9ca
to
4fbe555
Compare
if args.insert and args.update: | ||
log_error_exit( | ||
env, | ||
"Arguments '--insert' and '--update' cannot be used both at the same time.", | ||
lib.exitcode.INVALID_ARG, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why make the distinction between --insert
and --update
here? I think a simple --database option is enough.
If --overwrite is used, then, it would update, if not, it would insert after having check that there is not already an entry in the DB for the StudyUID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this to provide more options in case there are inconsistencies between the file system and the database (a file but no entry or an entry but no file). --insert
and --update
are only about the database, while --overwrite
is about the file system. I don't have a strong opinion on this but it seemed like a good idea when I did it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. Interesting. I have to think about this one. I would have a tendency to have a simple database flag and then figure out if data should be updated or inserted in the database when want to update the database instead of having to do that outside of the script in order to properly call import_dicom_study.py... I will sleep on it and get back to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hhhhm, not sure I understand your last message correctly. In any case, I am ok to have a single --update
option that both overwrites the file system and updates the entries in the database. However, the script should definitely not overwrite / update anything if the user does not specify it IMO, that is too error prone and could lead to a loss of data IMO.
@maximemulder very nice work overall!! Two summary points:
|
* print * print * print * print * add support for enhanced DICOMs * add support for enhanced DICOMs
Thanks for your feedback @cmadjar ! For your information, I tested the PR with some Rainsinbread data and some MPN data, which is acquired using a Siemens scanner, so testing this PR on DICOMs from other scanners is very valuable. When I wrote the DICOM sorting logic of this PR, I tried to closely follow the code of The As far as I understand it, the standard way to uniquely identify a DICOM series within a DICOM study is to use the SeriesUID DICOM field. This field is nullable in our database schema, but I checked and there is no
Pointers for DCMSUM.pl:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maximemulder I test the python DICOM ingestion script on:
- 6 Philips datasets
- 4 Siemens XA 30 (enhanced DICOMs)
- 2 GE datasets
All differences noted between the perl and python ingestion were improvements (ability to read enhanced DICOMs, ability to distinguish multi-echo or multi-tr DICOMs etc...).
The only missing piece at the moment is that if the DeviceSerialNumber
header has been removed from the DICOMs due to anonymization, I get the following error:
"import_dicom_study.py -p database_config_27.py -s /data/tmp/CHSTL0158_705052_V02_SessionA_1.3.12.2.1107.5.2.43.166158.30000025012107581494200000808_siemens_prisma --insert
Extracting DICOM information... (may take a long time)
Traceback (most recent call last):
File "/opt/loris/bin/mri/python/scripts/import_dicom_study.py", line 314, in <module>
main()
File "/opt/loris/bin/mri/python/scripts/import_dicom_study.py", line 147, in main
dicom_summary = get_dicom_study_summary(args.source, args.verbose)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/loris/bin/mri/python/lib/import_dicom_study/summary_get.py", line 39, in get_dicom_study_summary
study_info = get_dicom_study_info(dicom)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/loris/bin/mri/python/lib/import_dicom_study/summary_get.py", line 83, in get_dicom_study_info
read_value(dicom, 'DeviceSerialNumber'),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/loris/bin/mri/python/lib/import_dicom_study/summary_get.py", line 154, in read_value
raise Exception(f"Expected DICOM tag '{tag}' but found none.")
Exception: Expected DICOM tag 'DeviceSerialNumber' but found none."
Ideally, when reading the DICOM, should either read the content of the DICOM tag if available or set it to NULL to avoid this error
@cmadjar Hhhhm, none of the scanner fields in the |
Good catch! Looks like the current dicomTar.pl is indeed inserting with an empty string so probably best to use the same strategy here. And I think it is a good idea to apply that on all the Scanner fields, in case other anonymization software completely removes those. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job!!!
* remove trailing whitespaces (#1149) Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * SQLAlchemy proof-of-concept (#1152) * add sqlalchemy * fix lints * add comment --------- Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * fix (#1168) Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * integration test (#1138) * test * main Loris here * ignore Loris python * clone Loris * rm loris * override test * override test * override RB data * fix folder * test php files * test * try * test * try install loris mri * os * test * rm modules * Update flake8_python_linter.yml * Small linting improvements (#1167) * better linting * remove tab in comment --------- Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * Improve Python subject determination and validation error handling (#1146) * improve subjects determination and validation * use tuple for database query arguments * query on single line --------- Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * Add literal regex leading r (#1176) Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * fix (#1178) Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * LORIS-MRI tooling configuration (#1170) * add pyproject.toml * keep flake8 (for now) * double type checking * Use Pyright as main type checker * remove comment * test * test * change configuration * add python environment variables * try factorization * test * test * test * test * remove temporary things * test ruff error * ruff output github * further test errors * test pyright output github * test * test * test * change comment * remove test errors * test --------- Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * update docker install (#1171) Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> * add RB override tables for the MRI side (#1173) * Remove php 8.1 and 8.2 from the LORIS tests to keep only test on PHP 8.3 (#1175) * remove php 8.1 and 8.2 testing * update to 8.3 in Dockerfile.test.php8 * ADD minc tools install in gitactions (#1179) * todo install loris mri * test * run pytest in gitaction (#1172) * test * ttt * test * test * test * test * test * test * test * test * test * test * test * test * test * test * test * test * test * Delete test_example.py * rename * done * Delete test/Dockerfile.test.py * test (#1187) * unit tests poc (#1188) * add more pytest global rules (#1194) * Stricter linter configuration (#1193) * stricter ruff config * do not ignore N818 * lint for sorted imports * add python scripts directory (#1196) * Bunch of SQLAlchemy definitions (#1192) * bunch of sqlalchemy definitions * add project orm definition * fix candidate to site query function * optimize docker image (#1195) * fix pytest config (#1197) * fix python scripts (#1199) * remove mysql-connector (#1200) * Improve subject configuration structure (#1150) * refacor subject config * remove suspicious phantom code * change naming * fix rebase lints * cleaner database url * Cecile CandID path join bug fix Co-authored-by: Cécile Madjar <cecile.madjar@mcin.ca> --------- Co-authored-by: Cécile Madjar <cecile.madjar@mcin.ca> * Integration test to check that the ORM definitions match the LORIS SQL database (#1198) * orm sync integration test * add docker health check * use integration config file * test error * Revert "test error" This reverts commit d1a70e6. * Clean up Docker database scripts (#1202) * clean docker database * remove docker warning * remove unused attribute (#1201) * try fix perl ci * Add file, file_parameter and scanner SQLAlchemy models (#1207) * add tables * rename file model * rename parameter_file * fix lints (#1210) * Unify LORIS-MRI logging (#1191) * new env * change print order in database log * create notification type if needed * fix typing pyright update * Integration test for `run_dicom_archive_loader.py` (#1203) add integration test put keys in github repo debugging add database check add comments and debugging remove s3 keys change converter in test check file tree fix doc fix docs 2 rename variable go back to using s3 keys fix path try fix * rename database sub-namepaces (#1214) * fix warnings (#1215) * Factorize get subject session (#1190) * factorize subject session * re-add site log * Migrate "get all sites from the database" to the SQLAlchemy abstraction (#1216) * Fix new PEP585 Ruff lint (#1218) * Clean-up database scan type (#1139) Complementary PR of LORIS #9304 * Migrate DICOM archive files to new database abstraction (#1217) * wip * use self.dicom_archive.id instead of self.tarchive_id * fix todo * Fix SQLAlchemy file model (#1221) * Improve Python requirements order (#1222) * add python documentation (#1223) * Update SQLAlchemy bindings to match LORIS#9556 (#1229) * trigger_tests * try update * fix old queries for integration tests * Migrate `ImagingUpload` and `MriUploadDB` to the new database abstraction (#1224) * migrate mri upload to sqlalchemy * fix concurrency bug * fix clean up * display mri upload ids use dicom archive id instead of tarchiveid because it is prettier ruff check * migrate scanner to new abstraction (#1232) * Fix install script (#1234) * fix install script and move files related to install into the install directory * fix install script and move files related to install into the install directory * fix install script and move files related to install into the install directory * fix paths in imaging_install_test.sh * fix paths in imaging_install_test.sh * fix SQL error in install script * fix tests * fix tests * fix github action * add cpanfile * copy cpanfile in MRI dockerfile * add new dependency in order to be able to source the Digest::BLAKE2 in the cpanfile * fix Digest::BLAKE2 installation though cpanfile * move config templates to a templates directory and requirements files into a requirements directory * remove copy * fix test * fix test * fix test * fix test * Maxime's feedback * specify python version (#1237) * fix large file hash memory (#1236) * add lib.util module (#1238) * New DICOM study import script (#1117) * rewrite dicom archive * update * remove --year and --target options * get session from config file * remove unused upload argument * fix little oopsie for --session * handle large files hash * fix unhandled modalities * use new lib.util module * Add support to be able to read headers from enhanced DICOMs (#7) * print * print * print * print * add support for enhanced DICOMs * add support for enhanced DICOMs * rework associate files with series * sort dicom series and files before inserting in the database * handle null scanner --------- Co-authored-by: Cécile Madjar <cecile.madjar@mcin.ca> * refactoring of CandID on the non-ORM python side * tests * fix insertion in mri_protocol_violation_scans * alphabetical order is hard sometimes * FIx `get_candidate_id` documentation --------- Co-authored-by: Maxime Mulder <maxime-mulder@outlook.com> Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca> Co-authored-by: Shen <kongtiaowangshen@gmail.com>
Description
This PR is a rewrite in Python (+ cleaning up / refactor) of the DICOM archiving scripts in Python, with a few added features.
In more details, the changes are the following:
dicom-archive/dicomTar.pl
→python/scripts/import_dicom_study.py
dicom-archive/dicomSummary.pl
→python/scripts/summarize_dicom_study.py
dicom-archive/DICOM/DCMSUM.pm
(and a bunch of other code) →python/lib/import_dicom_study/*
This PR aims to be feature-compatible with the current Perl scripts, such that it can eventually replace it seemlessly. It removes some arguments that were unused (
-centerName
,-today
and-addMriUpload
), sorts the DICOM archive in a year subdirectory automatically (this was previously done later in the pipeline), and automatically use thetarchive
directory as a target.This PR also adds the
--session
argument to determine the DICOM archive session based on the Python configuration file (and fail if no session is found).Testing
Tested all options successfully with Raisinbread and MPN data. The data looks identical with both the old DICOM archiver and this new script.
Potential future work
tarchive
table.Notes for existing projects
pydicom
librarypip install pydicom