Skip to content

Conversation

@mikix
Copy link
Contributor

@mikix mikix commented Aug 15, 2025

Right now, there isn't much difference between the "etl" and "nlp" subcommands, except that:

  • all FHIR resource type tasks must use "etl"
  • all NLP tasks must use "nlp"
  • completion tracking isn't used for "nlp"

In the future, we'll add more NLP-specific options to the "nlp" subcommand, like document selection args and eventually, some sort of dynamic NLP config options. But for right now, this is just a refactor and splitting out those two subcommands.

Checklist

  • Consider if documentation (like in docs/) needs to be updated
  • Consider if tests should be added

Right now, there isn't much difference between the "etl" and "nlp"
subcommands, except that:
- all FHIR resource type tasks must use "etl"
- all NLP tasks must use "nlp"
- completion tracking isn't used for "nlp"

In the future, we'll add more NLP-specific options to the "nlp"
subcommand, like document selection args and eventually, some sort
of dynamic NLP config options. But for right now, this is just a
refactor and splitting out those two subcommands.
@mikix mikix force-pushed the mikix/add-nlp-command branch from 4f12dd1 to eb3b75e Compare August 15, 2025 14:07
@github-actions
Copy link

github-actions bot commented Aug 15, 2025

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines Covered Coverage Threshold Status
4066 4024 99% 98% 🟢

New Files

File Coverage Status
cumulus_etl/etl/nlp/init.py 100% 🟢
cumulus_etl/etl/nlp/cli.py 100% 🟢
cumulus_etl/etl/pipeline.py 100% 🟢
TOTAL 100% 🟢

Modified Files

File Coverage Status
cumulus_etl/cli.py 100% 🟢
cumulus_etl/cli_utils.py 100% 🟢
cumulus_etl/errors.py 100% 🟢
cumulus_etl/etl/cli.py 100% 🟢
cumulus_etl/etl/studies/irae/irae_tasks.py 100% 🟢
cumulus_etl/etl/tasks/base.py 100% 🟢
cumulus_etl/etl/tasks/task_factory.py 100% 🟢
TOTAL 100% 🟢

updated for commit: af757c1 by action🐍

@mikix mikix marked this pull request as ready for review August 15, 2025 14:58
@@ -0,0 +1,30 @@
"""
Similar to a normal ETL task, but with an extra NLP focus.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it might? be nice to specify a little more how NLP focus differs (i.e. no deid) for a future reader

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this below that line:

Some differences:
- Runs only the NLP targeted tasks
- No completion tracking
- No bulk de-identification (i.e. no MS tool)
- Has NLP specific arguments

@mikix mikix merged commit 2291efc into main Aug 18, 2025
3 checks passed
@mikix mikix deleted the mikix/add-nlp-command branch August 18, 2025 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants