v0.1.0-alpha1
kcworks-nlp-tools is an experimental package of NLP applications for Knowledge Commons Works.
kcworks-nlp-tools is released as free software under the MIT license.
Configuration variables are set in the config.py file. Available config values are:
Variable name | Type | Description |
---|---|---|
DOWNLOADED_FILES_PATH | str | Path where downloaded files will be stored |
OUTPUT_FILES_PATH | str | Path where output files will be saved |
API_ENDPOINT | str | API endpoint for records ("records") |
API_URL | str | Base URL for the KC Works API (defaults to "https://works.hcommons.org") |
BATCH_SIZE | int | Number of records to process in each API request (defaults to 10) |
CORPUS_SIZE | int | Total number of records to process (defaults to 100) |
CHUNK_SIZE | int | Size of text chunks for processing (defaults to 400) |
EXTRACTED_TEXT_CSV_PATH | Path | Path to CSV file containing extracted text |
KCWORKS_API_KEY | str | API key for KC Works authentication |
PREPROCESSED_PATH | Path | Path to CSV file containing preprocessed text |
TIKA_SERVER_ENDPOINT | str | URL for Tika server (defaults to "http://localhost:9998") |
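
As an illustration of how these values fit together, here is a minimal sketch of what a config.py along these lines might look like. The defaults for API_URL, API_ENDPOINT, BATCH_SIZE, CORPUS_SIZE, CHUNK_SIZE, and TIKA_SERVER_ENDPOINT come from the table above. The directory and file names are placeholders chosen for this example, and the `request_count` helper is hypothetical rather than part of the package.

```python
import math
import os
from pathlib import Path

# Defaults documented in the table above.
API_URL = "https://works.hcommons.org"
API_ENDPOINT = "records"
BATCH_SIZE = 10
CORPUS_SIZE = 100
CHUNK_SIZE = 400
TIKA_SERVER_ENDPOINT = "http://localhost:9998"

# Placeholder locations (assumptions for this sketch, not the package's real paths).
DOWNLOADED_FILES_PATH = "downloaded_files"
OUTPUT_FILES_PATH = "output_files"
EXTRACTED_TEXT_CSV_PATH = Path(OUTPUT_FILES_PATH) / "extracted_text.csv"
PREPROCESSED_PATH = Path(OUTPUT_FILES_PATH) / "preprocessed.csv"

# The API key is a secret, so this sketch reads it from the environment
# instead of hard-coding it in the file.
KCWORKS_API_KEY = os.environ.get("KCWORKS_API_KEY", "")


def request_count(corpus_size: int = CORPUS_SIZE, batch_size: int = BATCH_SIZE) -> int:
    """Hypothetical helper: number of API requests needed to cover the corpus,
    assuming each request returns at most batch_size records."""
    return math.ceil(corpus_size / batch_size)
```

With the documented defaults, a corpus of 100 records fetched 10 at a time would take `request_count()` requests, which works out to 10.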
A few required environment variables must be provided in a .env file placed at the top level of this project folder (i.e., the same folder that contains this README file). These variables must include:
Variable name | Description |
---|---|
KCWORKS_API_KEY | A valid OAuth token for the KC Works API |
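
A .env file is normally a plain-text list of `KEY=VALUE` lines, for example `KCWORKS_API_KEY=your-token-here`. The sketch below shows one way such a file could be read using only the Python standard library. The `load_env_file` and `get_api_key` helpers are hypothetical names used for illustration; the package itself may load the file differently (for instance with a library such as python-dotenv).

```python
from __future__ import annotations

import os
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values: dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_api_key(env_path: Path = Path(".env")) -> str:
    """Return KCWORKS_API_KEY from the process environment or the .env file."""
    file_values = load_env_file(env_path) if env_path.exists() else {}
    key = os.environ.get("KCWORKS_API_KEY") or file_values.get("KCWORKS_API_KEY")
    if not key:
        raise RuntimeError("KCWORKS_API_KEY is missing; add it to the .env file")
    return key
```

In this sketch a value already set in the process environment takes precedence over the file, and a missing key raises an error early instead of failing later during an API request.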
Initial work on this package was done by Tianyi (Titi) Kou-Herrema as a graduate assistant for Knowledge Commons with supervision by Ian Scott and Stephanie E. Vasko.