This repository was archived by the owner on Feb 1, 2022. It is now read-only.
Curator Modifications
s3cur3 edited this page Jun 27, 2012
- Create Master mode with the following responsibilities:
  - Sets up document collection in Hadoop Distributed File System (HDFS).
  - Launches local-mode Curators and associated annotation tools on all Hadoop nodes.
  - Sends batch job to Hadoop cluster (i.e., starts HadoopInterface.java with the proper parameters).
  - Waits for error messages from the annotation tools, and logs them in a user-actionable way.
- Create local mode with the following responsibilities:
  - Interfaces with exactly one annotation tool, as specified by the Master Curator.
  - Assumes all dependencies for all documents are present in HDFS, and skips those documents which do not meet the requirements.
  - Logs errors from the annotation tools in a user-actionable way.
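The local-mode rule of skipping documents that lack their prerequisites could look roughly like the sketch below. It is only an illustration: the class `DependencyFilter`, the method `runnableDocuments`, and the idea of representing each document's existing annotations as a set of names are all assumptions, not part of the Curator code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper; none of these names come from the Curator code base. */
final class DependencyFilter {
    private DependencyFilter() {}

    /**
     * Returns the IDs of documents whose existing annotations cover every
     * annotation the tool requires. Documents that fall short are skipped
     * and reported on stderr instead of aborting the run.
     */
    static List<String> runnableDocuments(Map<String, Set<String>> annotationsByDoc,
                                          Set<String> required) {
        List<String> runnable = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : annotationsByDoc.entrySet()) {
            if (entry.getValue().containsAll(required)) {
                runnable.add(entry.getKey());
            } else {
                // User-actionable message: says which document and what it needs.
                System.err.println("Skipping " + entry.getKey()
                        + ": tool requires annotations " + required
                        + " but document has " + entry.getValue());
            }
        }
        return runnable;
    }
}
```

Logging the required and present annotation sets side by side is one way to keep the skip message actionable, since the user can see which earlier tool still has to be run.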
Here's what the Master Curator needs to do, along with thoughts on how to do it:
- Launch
  - Decide what tool will be run on all documents
  - Launch the local Curator with that annotation tool on all Hadoop nodes
  - Wait for confirmation from those nodes that their tools are up and running
- Figure out what documents will be sent to Hadoop
- Transfer those documents, with their prerequisite annotations
- Send job to Hadoop
- Wait for the job to finish
- Copy data out from HDFS
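For the "wait for confirmation from those nodes" step, one possible in-process shape is a countdown that the Master Curator blocks on until every node has reported in. This is a sketch under assumptions: `NodeReadiness`, `confirm`, and `awaitAll` are hypothetical names, and how a confirmation actually travels from a Hadoop node to the Master (RPC, socket, file in HDFS) is left open here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; tracks how many nodes have not yet confirmed. */
final class NodeReadiness {
    private final CountDownLatch pending;

    NodeReadiness(int nodeCount) {
        this.pending = new CountDownLatch(nodeCount);
    }

    /**
     * Records one confirmation. Assumes each node confirms exactly once;
     * duplicate confirmations would make the count reach zero too early.
     */
    void confirm() {
        pending.countDown();
    }

    /**
     * Blocks until every node has confirmed or the timeout elapses.
     * Returns true only if all nodes confirmed in time.
     */
    boolean awaitAll(long timeout, TimeUnit unit) {
        try {
            return pending.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
    }
}
```

A timeout matters here because a node whose tool never comes up would otherwise stall the Master indefinitely; on a `false` result the Master can log which launch failed before sending anything to Hadoop.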
Here's what each local Curator (running on each Hadoop node) needs to do, along with thoughts on how to do it:
- Launch
  - Wait for the required tool to finish launching, then give the OK to the Master Curator
- (MapReduce job launches outside of the local Curator)
- Wait for input from a local map() operation
- When input is received and there is no lock on the input directory:
  - Lock the input directory
  - Prepare the job to send to the tool
  - Send the job to the annotation tool
  - Write the output to the local disk
  - Unlock
- (MapReduce job will handle transfer of the output back to the Master Curator)
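The lock, process, unlock cycle above might be sketched with a marker file in the input directory. Everything named here is an assumption for illustration: the class `InputDirectoryLock`, the `.lock` file name, and the choice of a lock file at all (the notes do not say how the lock is implemented).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical lock-file helper for a local Curator's input directory. */
final class InputDirectoryLock {
    private static final String LOCK_NAME = ".lock"; // assumed marker file name

    private InputDirectoryLock() {}

    /** Tries to take the lock; returns false if another holder already has it. */
    static boolean tryLock(Path inputDir) {
        try {
            // createFile checks for existence and creates in one atomic step.
            Files.createFile(inputDir.resolve(LOCK_NAME));
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Releases the lock; harmless if the lock file is already gone. */
    static void unlock(Path inputDir) {
        try {
            Files.deleteIfExists(inputDir.resolve(LOCK_NAME));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Runs the job (prepare, send to tool, write output) while holding the
     * lock. Returns false without running it if the directory is locked.
     */
    static boolean runLocked(Path inputDir, Runnable job) {
        if (!tryLock(inputDir)) {
            return false;
        }
        try {
            job.run();
            return true;
        } finally {
            unlock(inputDir); // unlock even if the annotation tool fails
        }
    }
}
```

Releasing the lock in a `finally` block keeps a crashed annotation job from leaving the directory locked for later map() inputs, although a lock file would still be left behind if the whole process is killed.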