NLP Clinical Vignettes

Author: Camelia Ciolac

In this project:

I developed a cognitive service workflow to transform raw text of clinical vignettes into structured medical data

Starting from raw text in a PDF file, the result is a content-rich PDF (in folder data/vignettes-reports) with:

medical entities highlighted over text,
health issues categorized into current or historic or absent or family-related
medication summary, including its purpose (in relation to which disease) as well as pointers to DrugBank and RxNorm knowledge bases
alerts about drug-drug interactions between the medication mentioned in the medical case

In this implementation I used:

pretrained BNER models from SciSpacy
Matcher and DependencyMatcher of Spacy with custom patterns concerning medication and diseases
MedSpacy for matching custom rules involving disease entities

Moreover, technologies of Tika, PDFKit, PyJQ, SPARQL were employed.

This project thus joins the set of cognitive services applying NLP to clinical data, like Amazon Comprehend Medical, IBM Watson Annotator for Clinical Data, Azure Text Analytics for Health, etc.

The RENKU platform was successfully employed to record both the artifacts of the project and how these resulted from the workflow stages.

The README_TECHNICAL.md details all steps performed and, together with the notebook Renku_KG.ipynb, it highlights the many benefits of using the RENKU platform for data science.

Projects with a complex graph of processing stages benefit from this platform as it allows elegant provenance analysis through its lineage functionality.
Along with MLOps, the platform also provides access to its Knowledge Graph, thus enabling data science projects to be organized semantically and their metadata to be governed for synergies.

I want to acknowledge the data sources, without which this project wouldn't have had the same contents:

The DrugBank Open Data ( https://go.drugbank.com/releases/latest#open-data )
The Bio2RDF Virtuoso server ( https://drugbank.bio2rdf.org/sparql ) provided me the opportunity to discover drug-drug interactions by writing SPARQL queries.
The book "Interesting Clinical Vignettes: 101 Ice Breakers for Medical Rounds" by the team at Texas Tech University Health Sciences Center ( https://www.ttuhsc.edu/clinical-research/vignettes.aspx ) from which I extracted the clinical vignettes

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
notebooks		notebooks
src/py_scripts		src/py_scripts
.gitattributes		.gitattributes
Dockerfile		Dockerfile
Medical_NLP_landscape.png		Medical_NLP_landscape.png
README.md		README.md
README_TECHNICAL.md		README_TECHNICAL.md
environment.yml		environment.yml
img_INPUT_OUTPUT_p208.png		img_INPUT_OUTPUT_p208.png
img_PROVENANCE.png		img_PROVENANCE.png
img_WORKFLOW_SKETCH.png		img_WORKFLOW_SKETCH.png
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

NLP Clinical Vignettes

Author: Camelia Ciolac

About

Uh oh!

Languages

camelia-c/nlp-clinical-vignettes

Folders and files

Latest commit

History

Repository files navigation

NLP Clinical Vignettes

Author: Camelia Ciolac

About

Resources

Uh oh!

Stars

Watchers

Forks

Languages