Developed and tested on Windows 10 inside a virtual environment (venv) using Python 3.7.7 and pip 19.2.3.
Set up your environment by installing the requirements using pip:
pip install -r requirements.txt
Copy the config.example file to config.py.
A database with dummy data is provided here; place the file at /Database/miner_database.db.
Data used in the research is available upon request (contact: a.j.vanaltena@amsterdamumc.nl) or may be collected from PubMed using the qrel files from the 2017 CLEF eHealth Lab.
Follow the steps below to perform the experiments.
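Before starting, it may help to confirm that the database file is where the scripts expect it. The sketch below is not part of the repository. It assumes the .db file is an SQLite database and that the path is relative to the repository root. `list_tables` and `DEFAULT_DB_PATH` are illustrative names.

```python
import sqlite3
from pathlib import Path

# Assumption: the dummy database is an SQLite file stored relative to the
# repository root. Adjust the path if your layout differs.
DEFAULT_DB_PATH = Path("Database") / "miner_database.db"


def list_tables(db_path=DEFAULT_DB_PATH):
    """Return the sorted names of all tables in the SQLite file at db_path."""
    db_path = Path(db_path)
    if not db_path.is_file():
        # sqlite3.connect would silently create an empty file, so check first.
        raise FileNotFoundError(f"No database found at {db_path}")
    connection = sqlite3.connect(str(db_path))
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

Calling `list_tables()` from the repository root should return a non-empty list if the file was copied correctly. A `FileNotFoundError` means the file is not at the expected location.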
- Clean the raw articles
python clean_articles.py
- Build the feature matrices
python create_feature_matrices.py
- Do grid searches
python Grid_search/leaveoneout/rf_random_search.py
python Grid_search/onevsone/rf_random_search.py
Note: the results of the grid searches are written to a CSV file in the Grid_search/leaveoneout/ and Grid_search/onevsone/ directories, respectively.
Create a folder with the name of the experiment run and edit CLASSIFIER_LOCATION in the config.py file to match. The config.example file uses the folder name run1.
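The run folder can also be created from Python, which avoids an error when it already exists. This is a minimal sketch, assuming the folder lives in the repository root. The exact value that CLASSIFIER_LOCATION expects (a bare name or a full path) is not stated here, so check config.example. `ensure_run_folder` is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path


def ensure_run_folder(run_name, base_dir="."):
    """Create base_dir/run_name if it does not exist and return its path."""
    folder = Path(base_dir) / run_name
    # exist_ok=True makes the call safe to repeat for the same run name.
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

For example, `ensure_run_folder("run1")` creates the folder used by config.example.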
- Run the classifiers
python run_leaveoneout.py
python run_onevsone.py
python run_nvsone.py
python run_nvsone_random.py
# Fetch timing difference results for two training set sizes
python run_nvsone_timing.py
- Interpret the outcomes
Note: calculating the correlations requires a metadata file. You may find this file for the fifty reviews used in our research here. For testing purposes we also provide a dummy set.
python make_plots.py
python calculate_correlations.py
- When writing the paper
python prepare_metadata.py
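The experiment steps above can be chained in a small driver script. The sketch below is not part of the repository. It assumes that config.py and the run folder are already in place, that every script is started from the repository root, and that no manual step is needed between scripts. `PIPELINE_SCRIPTS`, `build_commands` and `run_pipeline` are illustrative names. prepare_metadata.py is left out because it is only needed when writing the paper.

```python
import subprocess
import sys

# Script order copied from the steps listed in this README.
PIPELINE_SCRIPTS = [
    "clean_articles.py",
    "create_feature_matrices.py",
    "Grid_search/leaveoneout/rf_random_search.py",
    "Grid_search/onevsone/rf_random_search.py",
    "run_leaveoneout.py",
    "run_onevsone.py",
    "run_nvsone.py",
    "run_nvsone_random.py",
    "run_nvsone_timing.py",
    "make_plots.py",
    "calculate_correlations.py",
]


def build_commands(scripts=PIPELINE_SCRIPTS, python_exe=sys.executable):
    """Return one [interpreter, script] command per pipeline step."""
    return [[python_exe, script] for script in scripts]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order and stop at the first failure.

    With the default runner, check=True raises CalledProcessError when a
    script exits with a non-zero status, so later steps are not started.
    """
    completed = []
    for command in commands:
        print("Running:", " ".join(command))
        runner(command, check=True)
        completed.append(command)
    return completed
```

Running `run_pipeline(build_commands())` from the repository root uses the interpreter of the active virtual environment. Passing a shorter list to `build_commands` reruns only part of the pipeline.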