This repository was archived by the owner on Feb 1, 2022. It is now read-only.
Curator Reducer and Curator Client UML Diagram
s3cur3 edited this page Aug 5, 2012 · 2 revisions
These are the classes directly responsible for sending documents to the Curator for annotation. The workflow, such as it is, goes like this:
- Set up the inputs to the MapReduce job and start the job running (done by the classes described on the page Infrastructure UML Diagram).
- Hadoop works its behind-the-scenes MapReduce magic (assigning jobs to the nodes in the cluster closest to the data, etc.).
- Each node involved in the Reduce operation (which, for large jobs on an empty cluster, should be all nodes in the cluster) creates a CuratorReducer and calls its `reduce()` function.
- The CuratorReducer launches the Curator and annotator(s) as necessary on the node.
- The CuratorReducer creates a HadoopCuratorClient and asks it to annotate a single document.
- The HadoopCuratorClient requests the annotation from the Curator running on the node.
- The HadoopCuratorClient serializes the (newly annotated) record it gets back from the Curator to the output directory that was specified when the user launched the HadoopInterface.
- The CuratorReducer indicates to Hadoop that this node has finished its annotation.
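
The reducer-side steps above can be sketched roughly as follows. This is a plain-Java illustration with no Hadoop dependency, not the project's actual code. Only the class names `CuratorReducer` and `HadoopCuratorClient` and the `reduce()` call come from this page. The `Curator` interface, `OutputDirectory`, `launchCuratorIfNeeded()`, `annotateSingleDoc()`, and the `finished` list are hypothetical stand-ins. In particular, the real `reduce()` would follow Hadoop's reducer signature rather than the simplified one used here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CuratorWorkflowSketch {

    /** Hypothetical stand-in for the Curator process running locally on a node. */
    interface Curator {
        String annotate(String documentText);
    }

    /** Hypothetical stand-in for the job's output directory (kept in memory here). */
    static class OutputDirectory {
        final Map<String, String> files = new LinkedHashMap<>();

        void write(String name, String contents) {
            files.put(name, contents);
        }
    }

    /** Requests an annotation from the local Curator and serializes the result. */
    static class HadoopCuratorClient {
        private final Curator curator;
        private final OutputDirectory output;

        HadoopCuratorClient(Curator curator, OutputDirectory output) {
            this.curator = curator;
            this.output = output;
        }

        // Hypothetical method name: annotate one document and write it out.
        void annotateSingleDoc(String docId, String text) {
            String annotated = curator.annotate(text); // request to the local Curator
            output.write(docId, annotated);            // serialize the annotated record
        }
    }

    /** Simplified reducer: one call to reduce() handles one document. */
    static class CuratorReducer {
        private final OutputDirectory output;
        private Curator curator; // launched only when first needed
        final List<String> finished = new ArrayList<>();

        CuratorReducer(OutputDirectory output) {
            this.output = output;
        }

        // Assumption: "launching" is modeled as creating a trivial in-process annotator.
        Curator launchCuratorIfNeeded() {
            if (curator == null) {
                curator = text -> "[annotated] " + text;
            }
            return curator;
        }

        void reduce(String docId, String text) {
            HadoopCuratorClient client =
                    new HadoopCuratorClient(launchCuratorIfNeeded(), output);
            client.annotateSingleDoc(docId, text);
            finished.add(docId); // stands in for reporting completion to Hadoop
        }
    }

    public static void main(String[] args) {
        OutputDirectory out = new OutputDirectory();
        CuratorReducer reducer = new CuratorReducer(out);
        reducer.reduce("doc-1", "Some raw text.");
        System.out.println(out.files);
    }
}
```

The sketch keeps the division of labor described above: the reducer owns the lifecycle of the Curator on its node, while the client owns the request and the serialization of the result.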