
SQLAlchemy proof-of-concept #1152


Merged
merged 3 commits on Aug 28, 2024
Conversation


@maximemulder maximemulder commented Aug 20, 2024

This PR introduces SQLAlchemy in LORIS-MRI, which will be discussed at the next imaging meeting. It builds on #1151, so diff against that branch. It does not contain queries yet, only definition files for a few tables, but if the proposal is accepted, there are a few queries I will want to convert quickly!

Caveat for existing projects

In release 27.* of LORIS-MRI, we will support both SQLAlchemy and the old direct SQL way of querying the database. As of release 28.* of LORIS-MRI, all libraries under database_lib and the database.py library will be removed, so we encourage you to update your code accordingly.
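As a transitional sketch (not code from this PR), raw SQL can already be routed through SQLAlchemy's `text()` construct before full ORM models are adopted. The schema below is a toy SQLite stand-in for illustration, not the real LORIS database:

```python
# Hedged sketch: running parameterized raw SQL through SQLAlchemy Core,
# a possible stepping stone away from the legacy database.py helpers.
# The "psc" table here is a toy SQLite stand-in, not the real LORIS schema.
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE psc (CenterID INTEGER, Name TEXT)"))
    conn.execute(
        text("INSERT INTO psc VALUES (:id, :name)"),
        {"id": 1, "name": "Montreal"},
    )
    # Bound parameters replace manual string interpolation from the old helpers.
    rows = conn.execute(
        text("SELECT Name FROM psc WHERE CenterID = :id"), {"id": 1}
    ).all()

print(rows)  # [('Montreal',)]
```

This keeps existing query strings almost unchanged while already benefiting from SQLAlchemy's connection and transaction management.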

Ensure you install SQLAlchemy >= 2.0.0 after upgrading the code, using pip install 'sqlalchemy>=2.0.0', or pip install --upgrade sqlalchemy if SQLAlchemy is already listed in your pip packages but at a version older than 2.0.0.
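For a quick runtime guard, a minimal version comparison could look like the following. The helper name is mine, not part of LORIS-MRI, and it assumes purely numeric version components (real code would likely use packaging.version instead):

```python
# Hypothetical helper: check that an installed version string meets the
# >= 2.0.0 requirement by comparing numeric version components as tuples.
def meets_requirement(installed: str, minimum: str = "2.0.0") -> bool:
    def parse(version: str) -> tuple[int, ...]:
        return tuple(int(part) for part in version.split("."))

    return parse(installed) >= parse(minimum)

print(meets_requirement("2.0.31"))  # True
print(meets_requirement("1.4.52"))  # False
```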


maximemulder commented Aug 23, 2024

OKAY, so I finally ran some performance tests for SQLAlchemy. For these, I used DICOM archiving, a script in which most of the time is spent on file compression, but which still issues a good number of SQL queries, notably many small INSERTs and SELECTs (one of each per DICOM file insertion). I tested the following three versions, three times each, all using the same compression method, and got the results below:

* I ignore the first run of each version because the files were not yet in the disk cache at that point.

The results can vary by a few seconds between runs. Nevertheless, there is a roughly 1 s slowdown in the SQLAlchemy runs compared to the raw SQL ones, although both are still faster than our current script. These are admittedly broad results, so I can do more granular benchmarks if that is really wanted.
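The methodology above (repeat each version several times, discard the cold-cache first run) can be sketched as a small harness. The workload below is a stand-in, not the actual dicomTar.pl or dicom_archive.py scripts:

```python
# Hedged sketch of the benchmark loop: time a workload several times,
# discard the first (cold-cache) run, and report the mean of the warm runs.
import time


def benchmark(workload, runs: int = 3) -> float:
    timings = []
    for _ in range(runs + 1):  # one extra run to warm the cache
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    warm = timings[1:]  # drop the cold first run, as in the results above
    return sum(warm) / len(warm)


# Stand-in workload; in practice this would shell out to the archiving script.
mean_seconds = benchmark(lambda: sum(range(100_000)))
print(f"mean warm-run time: {mean_seconds:.4f}s")
```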

RAW results

# dicomTar.pl

(loris-mri-python) lorisadmin@mmulder-dev:/data/tmp/dicom_archive_time_test$ ls
DCM_2018-02-02_ImagingUpload-14-29-gopIio.tar  ImagingUpload-14-29-gopIio  ImagingUpload-14-29-gopIio.meta  ImagingUpload-14-29-gopIio.tar.gz
(loris-mri-python) lorisadmin@mmulder-dev:/data/tmp/dicom_archive_time_test$ time dicomTar.pl -profile prod -database ImagingUpload-14-29-gopIio /data/raisinbread/tarchive

real	0m39.428s
user	0m25.191s
sys	0m2.302s
(loris-mri-python) lorisadmin@mmulder-dev:/data/tmp/dicom_archive_time_test$ rm /data/raisinbread/tarchive/DCM_2018-02-02_ImagingUpload-14-29-gopIio.tar
(loris-mri-python) lorisadmin@mmulder-dev:/data/tmp/dicom_archive_time_test$ time dicomTar.pl -profile prod -database ImagingUpload-14-29-gopIio /data/raisinbread/tarchive

real	0m21.624s
user	0m14.812s
sys	0m1.279s
(loris-mri-python) lorisadmin@mmulder-dev:/data/tmp/dicom_archive_time_test$ rm /data/raisinbread/tarchive/DCM_2018-02-02_ImagingUpload-14-29-gopIio.tar

# dicom_archive.py raw SQL

(loris-mri-python) lorisadmin@mmulder-dev:/data/tmp/dicom_archive_time_test$ time dicomTar.pl -profile prod -database ImagingUpload-14-29-gopIio /data/raisinbread/tarchive

real	0m20.432s
user	0m13.298s
sys	0m1.041s
(loris-mri-python) lorisadmin@mmulder-dev:/data/tmp/dicom_archive_time_test$ rm /data/raisinbread/tarchive/DCM_2018-02-02_ImagingUpload-14-29-gopIio.tar
(loris-mri-python) lorisadmin@mmulder-dev:/data/tmp/dicom_archive_time_test$ time dicom_archive.py -p database_config.py --db-insert -s ImagingUpload-14-29-gopIio -t /data/raisinbread/tarchive --overwrite
Extracting DICOM information (may take a long time)
Checking database presence
Copying into DICOM tar
Calculating DICOM tar MD5 sum
Zipping DICOM tar (may take a long time)
Calculating DICOM zip MD5 sum
Getting DICOM scan date
Writing summary file
Writing log file
Copying into DICOM archive
Removing temporary files
Calculating DICOM tar MD5 sum
Success

real	0m19.061s
user	0m13.564s
sys	0m1.275s
(loris-mri-python) lorisadmin@mmulder-dev:/data/tmp/dicom_archive_time_test$ rm /data/raisinbread/tarchive/DCM_2018-02-02_ImagingUpload-14-29-gopIio.tar
(loris-mri-python) lorisadmin@mmulder-dev:/data/tmp/dicom_archive_time_test$ time dicom_archive.py -p database_config.py --db-insert -s ImagingUpload-14-29-gopIio -t /data/raisinbread/tarchive --overwrite
Extracting DICOM information (may take a long time)
Checking database presence
Copying into DICOM tar
Calculating DICOM tar MD5 sum
Zipping DICOM tar (may take a long time)
Calculating DICOM zip MD5 sum
Getting DICOM scan date
Writing summary file
Writing log file
Copying into DICOM archive
Removing temporary files
Calculating DICOM tar MD5 sum
Success

real	0m17.187s
user	0m12.576s
sys	0m1.082s
(loris-mri-python) lorisadmin@mmulder-dev:/data/tmp/dicom_archive_time_test$ rm /data/raisinbread/tarchive/DCM_2018-02-02_ImagingUpload-14-29-gopIio.tar
(loris-mri-python) lorisadmin@mmulder-dev:/data/tmp/dicom_archive_time_test$ time dicom_archive.py -p database_config.py --db-insert -s ImagingUpload-14-29-gopIio -t /data/raisinbread/tarchive --overwrite
Extracting DICOM information (may take a long time)
Checking database presence
Copying into DICOM tar
Calculating DICOM tar MD5 sum
Zipping DICOM tar (may take a long time)
Calculating DICOM zip MD5 sum
Getting DICOM scan date
Writing summary file
Writing log file
Copying into DICOM archive
Removing temporary files
Calculating DICOM tar MD5 sum
Success

real	0m16.868s
user	0m12.201s
sys	0m1.045s

# dicom_archive.py SQLAlchemy

(loris-mri-python) lorisadmin@mmulder-dev:/data/tmp/dicom_archive_time_test$ time dicom_archive.py --profile database_config.py --overwrite --db-insert ImagingUpload-14-29-gopIio /data/raisinbread/tarchive
Extracting DICOM information (may take a long time)
Checking database presence
Copying into DICOM tar
Calculating DICOM tar MD5 sum
Zipping DICOM tar (may take a long time)
Calculating DICOM zip MD5 sum
Getting DICOM scan date
WARNING: Overwriting '/data/raisinbread/tarchive/DCM_2018-02-02_ImagingUpload-14-29-gopIio.tar'
Writing summary file
Writing log file
Copying into DICOM archive
Removing temporary files
Calculating DICOM tar MD5 sum
Inserting DICOM archive in the database
Success

real	0m20.104s
user	0m14.976s
sys	0m1.464s
(loris-mri-python) lorisadmin@mmulder-dev:/data/tmp/dicom_archive_time_test$ time dicom_archive.py --profile database_config.py --overwrite --db-insert ImagingUpload-14-29-gopIio /data/raisinbread/tarchive
Extracting DICOM information (may take a long time)
Checking database presence
Copying into DICOM tar
Calculating DICOM tar MD5 sum
Zipping DICOM tar (may take a long time)
Calculating DICOM zip MD5 sum
Getting DICOM scan date
WARNING: Overwriting '/data/raisinbread/tarchive/DCM_2018-02-02_ImagingUpload-14-29-gopIio.tar'
Writing summary file
Writing log file
Copying into DICOM archive
Removing temporary files
Calculating DICOM tar MD5 sum
Inserting DICOM archive in the database
Success

real	0m17.987s
user	0m13.187s
sys	0m1.181s
(loris-mri-python) lorisadmin@mmulder-dev:/data/tmp/dicom_archive_time_test$ time dicom_archive.py --profile database_config.py --overwrite --db-insert ImagingUpload-14-29-gopIio /data/raisinbread/tarchive
Extracting DICOM information (may take a long time)
Checking database presence
Copying into DICOM tar
Calculating DICOM tar MD5 sum
Zipping DICOM tar (may take a long time)
Calculating DICOM zip MD5 sum
Getting DICOM scan date
WARNING: Overwriting '/data/raisinbread/tarchive/DCM_2018-02-02_ImagingUpload-14-29-gopIio.tar'
Writing summary file
Writing log file
Copying into DICOM archive
Removing temporary files
Calculating DICOM tar MD5 sum
Inserting DICOM archive in the database
Success

real	0m18.841s
user	0m14.098s
sys	0m1.213s
(loris-mri-python) lorisadmin@mmulder-dev:/data/tmp/dicom_archive_time_test$ 


maximemulder commented Aug 23, 2024

DO NOT MERGE YET, I'll change the database namespace to db; it will be done in 10 minutes.
EDIT: Done!

@maximemulder maximemulder force-pushed the 2024-08-13_sqlalchemy-poc branch from f65fc6b to f43ef19 Compare August 23, 2024 19:12
@maximemulder
Contributor Author

Yay! The PR is ready to be merged 🥳. Obviously SQLAlchemy is not called in the code yet, but it does define the namespace and a few classes I want to use in the new DICOM archive PR, which I think is a good start now that the decision has been taken. I also took the liberty of re-ordering the requirements alphabetically.

@maximemulder maximemulder marked this pull request as ready for review August 23, 2024 19:19
Collaborator

@cmadjar cmadjar left a comment


@maximemulder I understand that dicom_archive makes more sense than tarchive but I would stick to the table names in the ORM since I highly doubt those tables are going to be renamed.

Also, dicom_archive_file should be tarchive_files (missing the s) as there is an s at the end of the table.


maximemulder commented Aug 23, 2024

@maximemulder I understand that dicom_archive makes more sense than tarchive but I would stick to the table names in the ORM since I highly doubt those tables are going to be renamed.

Also, dicom_archive_file should be tarchive_files (missing the s) as there is an s at the end of the table.

@cmadjar I have a strong preference against these changes. Given our messy database, I think one of the advantages of using an ORM is to abstract some of this messiness away, giving us a more consistent and coherent codebase without changing the schema itself. I obviously do not want to hide structural details like the association between phantoms and candidates, but I think we should go for more coherent and consistent naming where we can.

  • tarchive* -> DbDicomArchive* : As you said, this name makes a lot more sense. tarchive (tar + archive) does not convey much information, whereas dicom_archive tells us that these tables, and the code that uses them, are related to DICOMs.
  • dicom_archive_files -> DbDicomFile : We require singular table names in our SQL modeling standard, and I think this is also good practice for ORM classes (since one ORM object represents one database record). IMO singular / plural renamings are so trivial that I do not even consider them renamings; to me, it is like changing the casing of a name.

Other than clearer code (IMO), I think there are a few arguments for why these occasional renamings are a good idea:

  • These renamings are small and uncontroversial. For instance, I think everyone at LORIS agrees that tarchive and psc are bad names, and that dicom_archive and site (or center, I want to discuss that in the next LORIS meeting) are better names.
  • The changes are easy to look up. They can all be found in their ORM class files, which can be accessed at any point using "Go to type definition" in an IDE (the magic of typed code!).
  • There is a precedent in our codebase. For instance, lib/dicom_archive.py is currently used to interface with the tarchive tables, and the database file for sites is not named lib/database_lib/psc.py but lib/database_lib/site.py.
  • If you want to, I can also add comments in each class indicating the list of significant changes in an ORM class compared to the "real" schema ("significant" meaning not casing or singular / plural).*
  • Finally, with typed code, which an ORM such as SQLAlchemy enables, renaming variables or attributes is very easy, so a renaming can always be changed if we change our mind on some name.

* BTW, I also added a comment on the Base class after your question last week. Thanks for the review it was definitely very relevant.


maximemulder commented Aug 23, 2024

Okay, if anyone wants to see SQLAlchemy in use in the codebase, I have this branch for which I will make a PR sometime soon.

maximemulder/Loris-MRI@2024-08-08_refactor-subject-dict...MaximeMulder:Loris-MRI:2024-08-26_sqlalchemy-subjects

@cmadjar cmadjar merged commit 0dff931 into aces:main Aug 28, 2024
1 check passed
@cmadjar cmadjar added this to the 27.0 milestone Aug 30, 2024
@cmadjar cmadjar added the Area: ORM PR or issue related to the SQLAlchemy integration label Oct 4, 2024
cmadjar added a commit that referenced this pull request Mar 21, 2025
* remove trailing whitespaces (#1149)

Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>

* SQLAlchemy proof-of-concept (#1152)

* add sqlalchemy

* fix lints

* add comment

---------

Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>

* fix (#1168)

Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>

* integration test (#1138)

* test

* main Loris here

* ignore Loris python

* clone Loris

* rm loris

* override test

* override test

* override RB data

* fix folder

* test php files

* test

* try

* test

* try install loris mri

* os

* test

* rm modules

* Update flake8_python_linter.yml

* Small linting improvements (#1167)

* better linting

* remove tab in comment

---------

Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>

* Improve Python subject determination and validation error handling (#1146)

* improve subjects determination and validation

* use tuple for database query arguments

* query on single line

---------

Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>

* Add literal regex leading r (#1176)

Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>

* fix (#1178)

Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>

* LORIS-MRI tooling configuration (#1170)

* add pyproject.toml

* keep flake8 (for now)

* double type checking

* Use Pyright as main type checker

* remove comment

* test

* test

* change configuration

* add python environment variables

* try factorization

* test

* test

* test

* test

* remove temporary things

* test ruff error

* ruff output github

* further test errors

* test pyright output github

* test

* test

* test

* change comment

* remove test errors

* test

---------

Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>

* update docker install (#1171)

Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>

* add RB override tables for the MRI side (#1173)

* Remove php 8.1 and 8.2 from the LORIS tests to keep only test on PHP 8.3 (#1175)

* remove php 8.1 and 8.2 testing

* update to 8.3 in Dockerfile.test.php8

* ADD minc tools install in gitactions (#1179)

* todo install loris mri

* test

* run pytest in gitaction (#1172)

* test

* ttt

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* Delete test_example.py

* rename

* done

* Delete test/Dockerfile.test.py

* test (#1187)

* unit tests poc (#1188)

* add more pytest global rules (#1194)

* Stricter linter configuration (#1193)

* stricter ruff config

* do not ignore N818

* lint for sorted imports

* add python scripts directory (#1196)

* Bunch of SQLAlchemy definitions (#1192)

* bunch of sqlalchemy definitions

* add project orm definition

* fix candidate to site query function

* optimize docker image (#1195)

* fix pytest config (#1197)

* fix python scripts (#1199)

* remove mysql-connector (#1200)

* Improve subject configuration structure (#1150)

* refacor subject config

* remove suspicious phantom code

* change naming

* fix rebase lints

* cleaner database url

* Cecile CandID path join bug fix

Co-authored-by: Cécile Madjar <cecile.madjar@mcin.ca>

---------

Co-authored-by: Cécile Madjar <cecile.madjar@mcin.ca>

* Integration test to check that the ORM definitions match the LORIS SQL database (#1198)

* orm sync integration test

* add docker health check

* use integration config file

* test error

* Revert "test error"

This reverts commit d1a70e6.

* Clean up Docker database scripts (#1202)

* clean docker database

* remove docker warning

* remove unused attribute (#1201)

* try fix perl ci

* Add file, file_parameter and scanner SQLAlchemy models (#1207)

* add tables

* rename file model

* rename parameter_file

* fix lints (#1210)

* Unify LORIS-MRI logging (#1191)

* new env

* change print order in database log

* create notification type if needed

* fix typing pyright update

* Integration test for `run_dicom_archive_loader.py` (#1203)

add integration test

put keys in github repo

debugging

add database check

add comments and debugging

remove s3 keys

change converter in test

check file tree

fix doc

fix docs 2

rename variable

go back to using s3 keys

fix path

try fix

* rename database sub-namepaces (#1214)

* fix warnings (#1215)

* Factorize get subject session (#1190)

* factorize subject session

* re-add site log

* Migrate "get all sites from the database" to the SQLAlchemy abstraction (#1216)

* Fix new PEP585 Ruff lint (#1218)

* Clean-up database scan type (#1139)

Complementary PR of LORIS #9304

* Migrate DICOM archive files to new database abstraction (#1217)

* wip

* use self.dicom_archive.id instead of self.tarchive_id

* fix todo

* Fix SQLAlchemy file model (#1221)

* Improve Python requirements order (#1222)

* add python documentation (#1223)

* Update SQLAlchemy bindings to match LORIS#9556 (#1229)

* trigger_tests

* try update

* fix old queries for integration tests

* Migrate `ImagingUpload` and `MriUploadDB` to the new database abstraction (#1224)

* migrate mri upload to sqlalchemy

* fix concurrency bug

* fix clean up

* display mri upload ids

use dicom archive id instead of tarchiveid because it is prettier

ruff check

* migrate scanner to new abstraction (#1232)

* Fix install script (#1234)

* fix install script and move files related to install into the install directory

* fix install script and move files related to install into the install directory

* fix install script and move files related to install into the install directory

* fix paths in imaging_install_test.sh

* fix paths in imaging_install_test.sh

* fix SQL error in install script

* fix tests

* fix tests

* fix github action

* add cpanfile

* copy cpanfile in MRI dockerfile

* add new dependency in order to be able to source the Digest::BLAKE2 in the cpanfile

* fix Digest::BLAKE2 installation though cpanfile

* move config templates to a templates directory and requirements files into a requirements directory

* remove copy

* fix test

* fix test

* fix test

* fix test

* Maxime's feedback

* specify python version (#1237)

* fix large file hash memory (#1236)

* add lib.util module (#1238)

* New DICOM study import script (#1117)

* rewrite dicom archive

* update

* remove --year and --target options

* get session from config file

* remove unused upload argument

* fix little oopsie for --session

* handle large files hash

* fix unhandled modalities

* use new lib.util module

* Add support to be able to read headers from enhanced DICOMs  (#7)

* print

* print

* print

* print

* add support for enhanced DICOMs

* add support for enhanced DICOMs

* rework associate files with series

* sort dicom series and files before inserting in the database

* handle null scanner

---------

Co-authored-by: Cécile Madjar <cecile.madjar@mcin.ca>

* refactoring of CandID on the non-ORM python side

* tests

* fix insertion in mri_protocol_violation_scans

* alphabetical order is hard sometimes

* FIx `get_candidate_id` documentation

---------

Co-authored-by: Maxime Mulder <maxime-mulder@outlook.com>
Co-authored-by: Maxime Mulder <maxime.mulder@mcgill.ca>
Co-authored-by: Shen <kongtiaowangshen@gmail.com>
@cmadjar cmadjar added this to the 27.0 milestone Apr 22, 2025
3 participants