Interface Design
To send a batch job (i.e., a large set of documents, each to be run through a single tool) to the Hadoop cluster, the default, as-is Curator (the "Master Curator") will make a command-line call of the following form:

```
./hadoop jar CuratorHadoopInterface.jar <location_of_documents_in_hdfs>
```
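For example, assuming the document collection has already been copied to a hypothetical HDFS directory `/user/curator/job_1` (the path is illustrative only), the call might look like this:

```
./hadoop jar CuratorHadoopInterface.jar /user/curator/job_1
```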
In order for this to work, a few things need to be in place first:
- The document collection must have been transferred to the Hadoop Distributed File System (HDFS), probably through a command-line call like this (a worked example appears after this list):

  ```
  ./hadoop dfs -copyFromLocal <location_of_docs_on_local_machine> <destination_in_hdfs>
  ```
- The directory that is being copied in must have the following structure:

  ```
  <Top-Level_Directory>/
      <Document Hash/ID>/
          <annotation type>.txt
          . . .
          <annotation type>.txt
      <Document Hash/ID>/
          . . .
  ```
  For instance:

  ```
  job_1/
      0956d2fbd5d5c29844a4d21ed2f76e0c/
          srl.txt
          chunking.txt
          ner.txt
      . . .
  ```
- Each Hadoop node must have a special Curator instance, which relies only on a local instance of the annotation tool.
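Putting the first two requirements together, here is a minimal sketch of the copy-in step. It assumes a local document collection already laid out in the structure above; the local path `~/docs/job_1` and the HDFS destination `/user/curator/job_1` are hypothetical, chosen only for illustration:

```
# Inspect the local layout before copying (hypothetical path).
# Expected: job_1/<document hash>/<annotation type>.txt
ls ~/docs/job_1/*/

# Copy the whole job directory into HDFS, preserving the layout.
./hadoop dfs -copyFromLocal ~/docs/job_1 /user/curator/job_1

# Verify that the document directories arrived.
./hadoop dfs -ls /user/curator/job_1
```

The HDFS destination directory would then serve as the `<location_of_documents_in_hdfs>` argument passed to CuratorHadoopInterface.jar.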