This repository was archived by the owner on Feb 1, 2022. It is now read-only.

Curator Modifications

s3cur3 edited this page Jun 27, 2012 · 20 revisions

Bird's-eye view of these modifications

  • Create Master mode with the following responsibilities:
    • Sets up document collection in Hadoop Distributed File System (HDFS).
    • Launches local-mode Curators and associated annotation tools on all Hadoop nodes.
    • Sends batch job to Hadoop cluster (i.e., starts HadoopInterface.java with the proper parameters).
    • Waits for error messages from the annotation tools, and logs them in a user-actionable way.
  • Create local mode with the following responsibilities:
    • Interfaces with exactly one annotation tool, as specified by the Master Curator.
    • Assumes all dependencies for all documents are present in HDFS, and skips any document that does not meet the requirements.
    • Logs errors from the annotation tools in a user-actionable way.
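The "skip documents that do not meet the requirements" behavior of local mode can be illustrated with a small sketch. This is not the Curator's actual code: the class name `DependencyFilter`, the method `readyDocuments`, and the annotation names are all hypothetical, and an in-memory map stands in for whatever listing of HDFS contents the real implementation would consult.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of the Curator code base.
final class DependencyFilter {
    private DependencyFilter() {}

    /**
     * Returns the IDs of documents that already carry every required
     * annotation. Documents missing a prerequisite are skipped and reported
     * on stderr so the user can see why they were not annotated.
     */
    static List<String> readyDocuments(Map<String, Set<String>> annotationsByDoc,
                                       Set<String> required) {
        List<String> ready = new ArrayList<>();
        // Copy into a TreeMap so documents are visited in sorted, repeatable order.
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(annotationsByDoc).entrySet()) {
            if (e.getValue().containsAll(required)) {
                ready.add(e.getKey());
            } else {
                System.err.println("Skipping " + e.getKey()
                        + ": missing one or more of " + required);
            }
        }
        return ready;
    }

    public static void main(String[] args) {
        // Assumed example data: doc2 lacks the "pos" annotation and is skipped.
        Map<String, Set<String>> docs = Map.of(
                "doc1", Set.of("tokens", "pos"),
                "doc2", Set.of("tokens"));
        System.out.println(readyDocuments(docs, Set.of("tokens", "pos")));
    }
}
```

Reporting the skipped document together with the required annotations is one way to keep the log user-actionable, as the list above asks.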

Master Curator Mode for Hadoop

Here's what the Master Curator needs to do, along with thoughts on how to do it:

  1. Launch
  2. Decide what tool will be run on all documents
  3. Launch the local Curator with that annotation tool on all Hadoop nodes
  4. Wait for confirmation from those nodes that their tools are up and running
  5. Figure out what documents will be sent to Hadoop
  6. Transfer those documents, with their prerequisite annotations
  7. Send job to Hadoop
  8. Wait for the job to finish
  9. Copy data out from HDFS
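The ordering of the steps above can be sketched as a simple driver. Everything named here is an assumption made for illustration: the `MasterSteps` interface, its method names, and the `RecordingSteps` stub do not come from the Curator source, and the stub only records calls instead of talking to Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver: runs steps 2 through 9 in order (step 1 is launching this program).
final class MasterCurator {
    private MasterCurator() {}

    static void run(MasterSteps s) {
        String tool = s.chooseTool();                 // step 2
        s.launchLocalCurators(tool);                  // step 3
        s.awaitToolsReady();                          // step 4
        List<String> docs = s.selectDocuments(tool);  // step 5
        s.copyToHdfs(docs);                           // step 6
        s.submitHadoopJob(tool);                      // step 7
        s.awaitJobCompletion();                       // step 8
        s.copyFromHdfs();                             // step 9
    }

    public static void main(String[] args) {
        RecordingSteps steps = new RecordingSteps();
        run(steps);
        steps.log.forEach(System.out::println);
    }
}

// Hypothetical interface; a real implementation would wrap HDFS and job submission.
interface MasterSteps {
    String chooseTool();
    void launchLocalCurators(String tool);
    void awaitToolsReady();
    List<String> selectDocuments(String tool);
    void copyToHdfs(List<String> docs);
    void submitHadoopJob(String tool);
    void awaitJobCompletion();
    void copyFromHdfs();
}

// Stub that records each call so the ordering can be inspected without a cluster.
final class RecordingSteps implements MasterSteps {
    final List<String> log = new ArrayList<>();

    public String chooseTool() { log.add("chooseTool"); return "pos-tagger"; }
    public void launchLocalCurators(String tool) { log.add("launchLocalCurators:" + tool); }
    public void awaitToolsReady() { log.add("awaitToolsReady"); }
    public List<String> selectDocuments(String tool) {
        log.add("selectDocuments:" + tool);
        return List.of("doc1", "doc2");
    }
    public void copyToHdfs(List<String> docs) { log.add("copyToHdfs:" + docs.size()); }
    public void submitHadoopJob(String tool) { log.add("submitHadoopJob:" + tool); }
    public void awaitJobCompletion() { log.add("awaitJobCompletion"); }
    public void copyFromHdfs() { log.add("copyFromHdfs"); }
}
```

Keeping the steps behind an interface is a design choice of this sketch: it lets the sequencing be exercised locally, while the blocking waits in steps 4 and 8 stay in one place.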

Local/Slave Curator Mode for Hadoop

Here's what each local Curator (running on each Hadoop node) needs to do, along with thoughts on how to do it:

  1. Launch
  2. Wait for required tool to finish launching, then give the OK to Master Curator
  3. (MapReduce job launches outside of local Curator)
  4. Wait for input from a local map() operation
  5. When input is received and there is no lock on the input directory:
    1. Lock the input directory
    2. Prepare the job to send to the tool
    3. Send the job to the annotation tool
    4. Write the output to the local disk
    5. Unlock
  6. (MapReduce job will handle transfer of the output back to the Master Curator)
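Step 5 (lock, process, write, unlock) could be sketched with a lock file in the input directory. This is a minimal sketch under stated assumptions, not the Curator's actual mechanism: the lock file name `.lock`, the class `LocalCuratorSketch`, and the use of a `UnaryOperator<String>` as a stand-in for the annotation tool are all hypothetical. It relies on `Files.createFile`, which creates the file atomically and throws `FileAlreadyExistsException` if it already exists.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

// Hypothetical sketch of step 5; names and lock-file convention are assumptions.
final class LocalCuratorSketch {
    static final String LOCK_NAME = ".lock"; // assumed lock file name

    private LocalCuratorSketch() {}

    /**
     * Processes inputFile with the given tool unless inputDir is already locked.
     * Returns true if the output was written, false if the directory was locked.
     */
    static boolean processIfUnlocked(Path inputDir, Path inputFile, Path outputFile,
                                     UnaryOperator<String> tool) throws IOException {
        Path lock = inputDir.resolve(LOCK_NAME);
        try {
            Files.createFile(lock);                    // 5.1 lock (atomic create)
        } catch (FileAlreadyExistsException e) {
            return false;                              // someone else holds the lock
        }
        try {
            // 5.2 prepare the job: here, simply read the document text
            String text = new String(Files.readAllBytes(inputFile), StandardCharsets.UTF_8);
            String annotated = tool.apply(text);       // 5.3 send to the (stand-in) tool
            Files.write(outputFile, annotated.getBytes(StandardCharsets.UTF_8)); // 5.4 write
            return true;
        } finally {
            Files.deleteIfExists(lock);                // 5.5 unlock, even on failure
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("curator-input");
        Path in = dir.resolve("doc1.txt");
        Path out = dir.resolve("doc1.out");
        Files.write(in, "hello world".getBytes(StandardCharsets.UTF_8));
        // String::toUpperCase stands in for a real annotation tool.
        boolean done = processIfUnlocked(dir, in, out, String::toUpperCase);
        System.out.println("processed: " + done);
    }
}
```

Releasing the lock in a `finally` block means a failing tool does not leave the directory locked, which matters if the map() operation retries the same input.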