ClinicalConversations

This data repository contains text obtained by manually transcribing and translating clinical conversations between physicians and patients or guardians. The data was collected from two hospitals in Pakistan. To preserve the privacy of the participants, only the anonymized and processed data is available.

Some of the ways to execute the associated script are as follows:

Generate Sequence

This command reads the data from the Data/raw directory and writes the resulting sequences to the matching/transcripts directory.

Every file with a txt or csv extension in the directory specified with -i is parsed.
-s generates the sequence files only.
The output is written to matching/transcripts/{transcriptfile}.csv

python3 extractSentencesFromTranscripts.py -i Data/raw/ -s
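
The file selection described above can be sketched roughly as follows. This is only an illustration of collecting txt and csv files from an input directory. The function name `find_transcripts` is hypothetical and is not taken from extractSentencesFromTranscripts.py, and the case-insensitive extension match is an assumption.

```python
from pathlib import Path


def find_transcripts(input_dir, extensions=(".txt", ".csv")):
    """Return the regular files directly inside input_dir whose extension is listed.

    Hypothetical helper for illustration; the real script may walk the
    directory differently (for example, recursively).
    """
    root = Path(input_dir)
    return sorted(
        path
        for path in root.iterdir()
        # Skip sub-directories and compare extensions case-insensitively (assumption).
        if path.is_file() and path.suffix.lower() in extensions
    )
```

Calling `find_transcripts("Data/raw/")` would then return the paths that a run with `-i Data/raw/` is expected to consider, under the assumptions stated above.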

Classify instances with MASS (training data)

python3 extractSentencesFromTranscripts.py -i Data/TestData/transcript* -p Data/TrainingData/transcript_annotated_7.csv

-i can also be used to pass a specific file.
-p provides the training file.
All lines starting with '*' are used to create MASS at run time.
The model used here is a fine-tuned DistilBERT base uncased, trained on a portion of the un-anonymized data from these conversations.
Alternatively, all-mpnet or other sentence similarity models can be used by updating this script and the library at lib/SemanticSentenceMatcher.py
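
As a rough illustration of the '*' convention, the sketch below keeps only the lines that start with '*' and strips the marker. The function name `starred_lines` is hypothetical, and the handling of whitespace and empty entries is an assumption; the actual parsing in the script may differ.

```python
def starred_lines(lines):
    """Return the text of every line that starts with '*', without the marker.

    Hypothetical helper for illustration only. Lines with leading whitespace
    before the '*' are not treated as starred (assumption).
    """
    selected = []
    for line in lines:
        if line.startswith("*"):
            text = line[1:].strip()
            # Ignore lines that contain nothing but the marker (assumption).
            if text:
                selected.append(text)
    return selected
```

With a training file opened as `f`, `starred_lines(f)` would yield the entries that feed MASS under these assumptions.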

python3 extractSentencesFromTranscripts.py -i Data/TestData/transcript* -p Data/TrainingData/transcript_annotated_7.csv -m 0.87 -x

-m sets the minimum similarity threshold.
-x selects the instance or instances with the highest similarity. For example, if two instances share the highest similarity of 0.9 (above the specified threshold of 0.87), only these two are selected.
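
The combined effect of -m and -x can be sketched as below. This is a minimal illustration, not the script's implementation: the function name `select_best` and the (instance, similarity) pair layout are hypothetical, and treating the threshold as inclusive is an assumption.

```python
def select_best(scored, min_similarity=0.0):
    """Keep only the instances tied for the highest similarity at or above the threshold.

    scored: list of (instance, similarity) pairs (hypothetical layout).
    """
    # Apply the minimum similarity threshold first (inclusive by assumption).
    above = [(instance, score) for instance, score in scored if score >= min_similarity]
    if not above:
        return []
    # Then keep every instance that shares the highest remaining score.
    best = max(score for _, score in above)
    return [(instance, score) for instance, score in above if score == best]
```

For the example in the text, `select_best([("a", 0.9), ("b", 0.9), ("c", 0.88)], 0.87)` keeps only the two instances scored 0.9. Ties are detected with exact float equality here; a tolerance could be used instead if scores are subject to rounding noise.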

