Curator Reducer and Curator Client UML Diagram

s3cur3 edited this page Aug 5, 2012 · 2 revisions

UML Diagram: Relationship between CuratorReducer and HadoopCuratorClient

Discussion of the Classes

These are the classes directly responsible for sending documents to the Curator for annotation. The workflow, such as it is, goes like this:

1. Set up the inputs to the MapReduce job and start the job running (done by the classes described on the page Infrastructure UML Diagram).
2. Hadoop works its behind-the-scenes MapReduce magic (assigning jobs to the nodes in the cluster closest to the data, etc.).
3. Each node involved in the Reduce operation (which, for large jobs on an empty cluster, should be all nodes in the cluster) creates a CuratorReducer and calls its reduce() function.
4. The CuratorReducer launches the Curator and annotator(s) as necessary on the node.
5. The CuratorReducer creates a HadoopCuratorClient and asks it to annotate a single document.
   1. The HadoopCuratorClient requests the annotation from the Curator running on the node.
   2. The HadoopCuratorClient serializes the (newly annotated) record it gets back from the Curator to the output directory that was specified when the user launched the HadoopInterface.
6. The CuratorReducer indicates to Hadoop that this node has finished its annotation.
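
The reduce-side part of the workflow above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: every type here (`Curator`, `OutputDirectory`, `ClientSketch`, `ReducerSketch`) is a hypothetical stand-in, the "Curator" is a lambda rather than a real process, the output directory is an in-memory map, and no Hadoop classes are used so that the example runs on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of the reduce-side workflow. All names are hypothetical stand-ins;
 * none of this is the real CuratorReducer, HadoopCuratorClient, or Hadoop API.
 */
public class CuratorWorkflowSketch {

    /** Stand-in for the Curator running locally on a node. */
    interface Curator {
        String annotate(String document);
    }

    /** Stand-in for the output directory chosen when the job was launched. */
    static class OutputDirectory {
        final Map<String, String> files = new LinkedHashMap<>();

        void write(String name, String contents) {
            files.put(name, contents);
        }
    }

    /** Plays the role of HadoopCuratorClient: annotate one document, then serialize it. */
    static class ClientSketch {
        private final Curator curator;
        private final OutputDirectory output;

        ClientSketch(Curator curator, OutputDirectory output) {
            this.curator = curator;
            this.output = output;
        }

        void annotateSingleDoc(String docId, String text) {
            String annotated = curator.annotate(text); // request the annotation
            output.write(docId, annotated);            // serialize the annotated record
        }
    }

    /** Plays the role of CuratorReducer. */
    static class ReducerSketch {
        private final OutputDirectory output;
        private Curator curator; // launched only when first needed
        int launches = 0;        // counts how often the stand-in Curator was started

        ReducerSketch(OutputDirectory output) {
            this.output = output;
        }

        /** Starts the stand-in Curator on this "node" if it is not already running. */
        private Curator launchCuratorIfNeeded() {
            if (curator == null) {
                launches++;
                curator = doc -> "[annotated] " + doc; // assumption: trivial fake annotator
            }
            return curator;
        }

        /** Annotates one document; the return value stands in for reporting completion. */
        boolean reduce(String docId, String text) {
            ClientSketch client = new ClientSketch(launchCuratorIfNeeded(), output);
            client.annotateSingleDoc(docId, text);
            return true;
        }
    }

    public static void main(String[] args) {
        OutputDirectory out = new OutputDirectory();
        ReducerSketch reducer = new ReducerSketch(out);
        reducer.reduce("doc-1", "Hello world.");
        reducer.reduce("doc-2", "Second document.");
        System.out.println(out.files);
        System.out.println("Curator launches: " + reducer.launches);
    }
}
```

In this sketch the reducer starts its Curator stand-in lazily and reuses it across calls, which is one way to read "as necessary" in the step above; the real classes may handle startup differently.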