@mikix mikix commented Sep 2, 2025

This adds the --select-by-word, --select-by-regex, --select-by-athena-table, --select-by-csv, and --select-by-anon-csv options to upload-notes, deprecating the previous --docrefs and --anon-docrefs options (which now map to --select-by-csv and --select-by-anon-csv). The new options match the ones used to select notes for NLP, so the exact same option can now be used at the other end of the pipeline when uploading to Label Studio.
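
For readers unfamiliar with this kind of deprecation, here is a minimal sketch of how an old flag can be mapped onto a new one with Python's argparse. It is not the code from this PR: the `DeprecatedAlias` class, the `build_parser` helper, and the `upload-notes-sketch` program name are hypothetical, and only the option names come from the description above.

```python
import argparse
import warnings


class DeprecatedAlias(argparse.Action):
    """Hypothetical action: store the value under the new option's dest and warn."""

    def __call__(self, parser, namespace, values, option_string=None):
        new_name = "--" + self.dest.replace("_", "-")
        warnings.warn(
            f"{option_string} is deprecated; use {new_name} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        setattr(namespace, self.dest, values)


def build_parser() -> argparse.ArgumentParser:
    # Illustrative parser only; the real upload-notes CLI has many more options.
    parser = argparse.ArgumentParser(prog="upload-notes-sketch")
    parser.add_argument("--select-by-csv", metavar="FILE")
    parser.add_argument("--select-by-anon-csv", metavar="FILE")
    # The old names write to the same destinations as the new ones,
    # so downstream code only ever has to look at the new attributes.
    parser.add_argument(
        "--docrefs",
        dest="select_by_csv",
        action=DeprecatedAlias,
        metavar="FILE",
        help=argparse.SUPPRESS,
    )
    parser.add_argument(
        "--anon-docrefs",
        dest="select_by_anon_csv",
        action=DeprecatedAlias,
        metavar="FILE",
        help=argparse.SUPPRESS,
    )
    return parser
```

With this shape, parsing `--docrefs ids.csv` fills in `select_by_csv` and emits a DeprecationWarning, while the old flags stay hidden from `--help`.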

This also removes the rarely used "download notes from your EHR on the fly" feature, which meant you didn't need local copies of the notes. It could even do a bulk export if you didn't provide a --docrefs flag. It is too hard to square with the new set of options, and it is a lot of complexity for a feature we don't really use and don't recommend that sites use (you should be archiving your notes for easy access anyway).

(ed note: this was a feature I used early on, when we didn't have as healthy an archiving habit and not all the documents had even been extracted from Cerner yet. But we recommend exporting via smart-fetch these days and then just holding on to the extracted files. So on-the-fly exporting is kind of an anti-pattern nowadays.)

Downloading the wrapped clinical notes referred to by your NDJSON is still supported! This is just removing the ability to go direct to your EHR for the DocRefs and DxReports themselves.
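
To illustrate what "wrapped" means here: in FHIR, a DocumentReference carries its note in `content[].attachment` and a DiagnosticReport carries it in `presentedForm[]`, and each Attachment holds either inline base64 `data` or a `url` to download from. The sketch below only shows that distinction. It is not the ETL's implementation, and the helper names `attachments` and `inline_text` are hypothetical.

```python
import base64
from typing import Optional


def attachments(resource: dict) -> list:
    """Return the FHIR Attachment objects that wrap a note's content."""
    kind = resource.get("resourceType")
    if kind == "DocumentReference":
        return [c["attachment"] for c in resource.get("content", []) if "attachment" in c]
    if kind == "DiagnosticReport":
        return list(resource.get("presentedForm", []))
    return []


def inline_text(attachment: dict) -> Optional[str]:
    """Decode inline base64 data; return None when the note must be fetched by url."""
    data = attachment.get("data")
    if data is None:
        return None
    # Assumes UTF-8 text for simplicity; real notes may declare another charset
    # in the attachment's contentType.
    return base64.b64decode(data).decode("utf8")
```

An attachment for which `inline_text` returns None but which has a `url` is the case where the note text still has to be downloaded.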

Checklist

  • Consider if documentation (like in docs/) needs to be updated
  • Consider if tests should be added

github-actions bot commented Sep 2, 2025

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines | Covered | Coverage | Threshold | Status
4178  | 4136    | 99%      | 98%       | 🟢

New Files

No new covered files...

Modified Files

File                                | Coverage | Status
cumulus_etl/deid/__init__.py        | 100%     | 🟢
cumulus_etl/deid/codebook.py        | 100%     | 🟢
cumulus_etl/etl/config.py           | 100%     | 🟢
cumulus_etl/nlp/selection.py        | 100%     | 🟢
cumulus_etl/upload_notes/cli.py     | 100%     | 🟢
cumulus_etl/upload_notes/selector.py | 100%    | 🟢
TOTAL                               | 100%     | 🟢

updated for commit: 502363d by action🐍

@mikix mikix force-pushed the mikix/upload-select branch from 63152dc to a157398 Compare September 2, 2025 19:57
@mikix mikix force-pushed the mikix/upload-select branch from a157398 to 502363d Compare September 2, 2025 20:00
@mikix mikix merged commit 6a712e3 into main Sep 2, 2025
3 checks passed
@mikix mikix deleted the mikix/upload-select branch September 2, 2025 20:43