
Curator Web Annotation Service


Two separate possibilities exist for making large amounts of data processed through the Curator available to the larger NLP community. Each would require the following:

  1. For user-submitted data returned with annotations:
  • A separate downloadable reader, easily configurable by the user, that parses documents into Thrift-serializable text files (see the serialization sketch after this list).
  • A simple web page interface, implemented in PHP, that lets the user upload a tarball of document records produced by that reader.
  • A message sent to the Master Curator telling it to launch one or more MapReduce jobs via shell scripts.
  2. For pre-processed annotations upon request:
  • A dynamic web page interface cataloging all annotations currently available for download. This could be built with an existing framework such as CakePHP or Ruby on Rails, though either would add a learning curve.
  • Alternatively, a static web page interface listing common corpora and their available annotated records.
  • A way to retrieve specific Records from the Master Curator's database. These records could be returned in a Thrift-serialized format and then run through a user-configured reader (see the client sketch after this list).
  • Alternatively, annotated data could be stored outside of the existing Curator database, perhaps hosted online.
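
As a rough illustration of the downloadable reader in the first possibility, the sketch below serializes a single document into a binary Thrift Record file that could then be bundled into the upload tarball. The package path, the Record field name (rawText), and the output file name are assumptions about the Curator's generated Thrift bindings, not a confirmed API.

```java
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.thrift.TException;
import org.apache.thrift.TSerializer;
import org.apache.thrift.protocol.TBinaryProtocol;

import edu.illinois.cs.cogcomp.thrift.curator.Record;  // assumed generated Thrift binding

public class RecordWriter {

    /** Serialize one Record to a binary Thrift file for later upload. */
    public static void writeRecord(Record record, String path)
            throws TException, IOException {
        TSerializer serializer = new TSerializer(new TBinaryProtocol.Factory());
        byte[] bytes = serializer.serialize(record);
        try (FileOutputStream out = new FileOutputStream(path)) {
            out.write(bytes);
        }
    }

    public static void main(String[] args) throws Exception {
        Record record = new Record();
        // rawText is an assumed field of the Curator's Record definition.
        record.setRawText("Plain text of one document to be annotated.");
        writeRecord(record, "document-0001.thrift");
    }
}
```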
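For the second possibility, pulling a specific Record back out of the Master Curator over Thrift could look roughly like the client sketch below. The Curator.Client class, the getRecord call, the framed transport, the port number, and the getLabelViews accessor are assumptions about the generated bindings and a running Curator instance, not a confirmed interface.

```java
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

import edu.illinois.cs.cogcomp.thrift.curator.Curator;  // assumed generated Thrift binding
import edu.illinois.cs.cogcomp.thrift.curator.Record;

public class RecordFetcher {

    public static void main(String[] args) throws Exception {
        // Connect to a running Master Curator; host and port are placeholders.
        TTransport transport = new TFramedTransport(new TSocket("localhost", 9010));
        Curator.Client client = new Curator.Client(new TBinaryProtocol(transport));
        transport.open();
        try {
            // Look up cached annotations for one document by its raw text
            // (getRecord is an assumed method of the Curator Thrift service).
            Record record = client.getRecord("Plain text of a document already processed.");
            System.out.println("Available label views: " + record.getLabelViews().keySet());
        } finally {
            transport.close();
        }
    }
}
```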