
Code for the JCDL paper "Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Competitive Performance to Full-Text"


florianmai/Quadflor

Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Competitive Performance to Full-Text

This repository contains the code for the JCDL paper Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Competitive Performance to Full-Text. It builds on and extends the multi-label classification framework Quadflor.

Installation

Install Python 3.4 or higher, then run:

# Install the necessary system packages
sudo apt-get install libatlas-base-dev gfortran python3.4-dev python3.4-venv build-essential

# Install the Python modules into a virtual environment with pip (this may take a while):
python3 -m venv lucid_ml_environment
source lucid_ml_environment/bin/activate
cd Code
pip install -r requirements.txt
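The commands above assume Python 3.4 or higher. As a sketch, that requirement can be checked with a small shell function before creating the environment. The function name version_at_least is hypothetical (it is not part of the repository), and the comparison assumes a sort that supports -V (GNU coreutils version sort).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): succeeds if version $1 >= version $2.
# Relies on `sort -V` (GNU coreutils version sort) to order dotted version strings.
version_at_least() {
  local actual="$1" minimum="$2"
  # After version sorting, the smaller string comes first; if that is the
  # minimum, then the actual version is at least the minimum.
  [ "$(printf '%s\n' "$minimum" "$actual" | sort -V | head -n 1)" = "$minimum" ]
}

# Usage (assumes python3 is on PATH):
#   py_version="$(python3 -c 'import platform; print(platform.python_version())')"
#   version_at_least "$py_version" 3.4 || echo "Python $py_version is older than 3.4" >&2
```

Version sorting is used instead of a plain string comparison so that, for example, 3.10 is correctly treated as newer than 3.4.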

Replicating the results

To improve the reproducibility of our study, we uploaded a copy of the title datasets to Kaggle. We also provide the configurations used to produce the results reported in the paper.

To rerun any of the (title) experiments, do the following:

  1. Download econbiz.csv and/or pubmed.csv from Kaggle, depending on the dataset you want to use, and copy the file(s) to the Resources folder.
  2. Open the .cfg file of the method you want to run (MLP, BaseMLP, CNN, or LSTM) in the Experiments folder. Copy the command on the third line (to evaluate on a single fold) or on the fifth line (to run a full 10-fold cross-validation).
  3. In the command, set the --tf-model-path option to the directory where the model weights will be saved (these can take up gigabytes, so make sure you have enough disk space), and set the --pretrained_embeddings option to the location of the GloVe model in your environment.
  4. cd to the folder Code/lucid_ml and run the command.
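Steps 1 to 4 can also be scripted. The following is a minimal sketch under stated assumptions: the helper names (have_datasets, extract_command, set_option) are hypothetical and not part of the repository, the paths in the usage comment are placeholders (look up the actual .cfg file in the Experiments folder), and set_option assumes each option's value is a single token without whitespace that follows the option name after a space or "=".

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository) that script the replication steps.

# Step 1: check that the dataset file(s) were copied into the Resources folder.
# Usage: have_datasets RESOURCES_DIR FILE...
have_datasets() {
  local dir="$1" f missing=0
  shift
  for f in "$@"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Step 2: print the command from a .cfg file.
# MODE "single" selects line 3 (single fold), "cv" selects line 5 (10-fold cross-validation).
extract_command() {
  local cfg="$1" mode="$2" line
  case "$mode" in
    single) line=3 ;;
    cv) line=5 ;;
    *) echo "unknown mode: $mode (expected single or cv)" >&2; return 1 ;;
  esac
  sed -n "${line}p" "$cfg"
}

# Step 3: replace the value that follows OPTION (separated by a space or '=').
# Assumes the value contains no whitespace and no '#' (used as the sed delimiter).
set_option() {
  local cmd="$1" opt="$2" value="$3"
  printf '%s\n' "$cmd" | sed -E "s#(${opt}[ =])[^ ]+#\1${value}#"
}

# Usage (all paths are placeholders):
#   have_datasets Resources econbiz.csv || exit 1
#   cmd="$(extract_command Experiments/<method>.cfg single)"
#   cmd="$(set_option "$cmd" --tf-model-path /data/tf_models)"
#   cmd="$(set_option "$cmd" --pretrained_embeddings /data/glove/glove.txt)"
#   cd Code/lucid_ml && eval "$cmd"   # Step 4
```

Extracting the command with sed rather than copying it by hand makes the line choice (3 or 5) explicit, but it breaks if the option values in your .cfg file are quoted or contain spaces; in that case, edit the command manually as described in step 3.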
