Curator Modifications

Bird's-Eye view of these modifications

Create a Master mode with the following responsibilities:
- Sets up the document collection in the Hadoop Distributed File System (HDFS).
- Launches local-mode Curators and their associated annotation tools on all Hadoop nodes.
- Sends a batch job to the Hadoop cluster (i.e., starts HadoopInterface.java with the proper parameters).
- Waits for error messages from the annotation tools and logs them in a user-actionable way.
Create a local mode with the following responsibilities:
- Interfaces with exactly one annotation tool, as specified by the Master Curator.
- Assumes all dependencies for all documents are present in HDFS, and skips any document whose dependencies are missing.
- Logs errors from the annotation tools in a user-actionable way.
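A minimal sketch of how the two roles above might be selected at startup, assuming a single entry point that takes a mode argument (`master` or `local`). The `CuratorMode` enum and the argument values are hypothetical and not part of the existing Curator code.

```java
import java.util.Locale;

// Hypothetical: the two Curator roles, chosen by a command-line argument.
enum CuratorMode {
    MASTER, // stages documents in HDFS, launches local Curators, submits the job
    LOCAL;  // drives exactly one annotation tool on a single Hadoop node

    // Parses the (assumed) mode argument and rejects anything unexpected.
    static CuratorMode fromArg(String arg) {
        switch (arg.trim().toLowerCase(Locale.ROOT)) {
            case "master": return MASTER;
            case "local":  return LOCAL;
            default:
                throw new IllegalArgumentException(
                        "Unknown Curator mode '" + arg + "' (expected 'master' or 'local')");
        }
    }
}
```

Failing fast on an unknown mode keeps a typo on one node from silently starting the wrong role.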

Master Curator Mode for Hadoop

Here's what the Master Curator needs to do, along with thoughts on how to do it:

Launch
- Specify that configuration comes from curator.hadoop.master.properties (for example)
Decide what tool will be run on all documents
- Where is this specified?
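One possible answer to "Where is this specified?" is the Master's own properties file. This sketch assumes two hypothetical keys, `annotation.tool` and `hdfs.input.dir` (neither exists in Curator today), and an assumed default staging directory.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

// Sketch: Master Curator settings read from a file such as
// curator.hadoop.master.properties. Key names are hypothetical.
final class MasterConfig {
    final String annotationTool; // the single tool every node will run
    final String hdfsInputDir;   // where documents are staged in HDFS

    private MasterConfig(String annotationTool, String hdfsInputDir) {
        this.annotationTool = annotationTool;
        this.hdfsInputDir = hdfsInputDir;
    }

    static MasterConfig load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        String tool = props.getProperty("annotation.tool");
        if (tool == null || tool.isBlank()) {
            throw new IllegalArgumentException("annotation.tool must be set");
        }
        // Assumed default if the staging directory is not configured.
        String inputDir = props.getProperty("hdfs.input.dir", "/curator/input");
        return new MasterConfig(tool.trim(), inputDir.trim());
    }
}
```

Usage: `MasterConfig.load(new FileReader("curator.hadoop.master.properties"))`.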
Launch the local Curator with that annotation tool on all Hadoop nodes
- Run shell script that "knows" the location of all Hadoop nodes?
- Defer work on this (probably) until we actually have access to a Hadoop cluster
Wait for confirmation from those nodes that their tools are up and running
- Pass message over network?
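If confirmation is passed over the network, a tiny line-based TCP protocol would be enough. This sketch assumes a made-up message format (`READY <node>`) and that the Master knows how many nodes to expect; `ReadyProtocol` and its methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

// Hypothetical readiness protocol: one line, "READY <node>", per TCP connection.
final class ReadyProtocol {
    private static final String PREFIX = "READY ";

    // Local Curator side: tell the Master this node's tool is up.
    static void sendReady(String masterHost, int masterPort, String node) throws IOException {
        try (Socket s = new Socket(masterHost, masterPort);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println(PREFIX + node);
        }
    }

    // Master side: collect node names until `expected` have reported,
    // or until a wait for the next message exceeds `timeoutMillis`.
    static Set<String> awaitReady(ServerSocket server, int expected, int timeoutMillis) throws IOException {
        Set<String> ready = new HashSet<>();
        server.setSoTimeout(timeoutMillis); // bounds each accept()
        while (ready.size() < expected) {
            try (Socket s = server.accept()) {
                s.setSoTimeout(timeoutMillis); // bounds the read as well
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                String line = in.readLine();
                if (line != null && line.startsWith(PREFIX)) {
                    ready.add(line.substring(PREFIX.length()).trim());
                }
            } catch (SocketTimeoutException e) {
                break; // caller compares the result with the expected node list
            }
        }
        return ready;
    }
}
```

The timeout keeps the Master from hanging forever, and comparing the returned set with the expected node list tells the user exactly which nodes never reported in.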
Figure out (parse?) what documents and annotations will be sent to Hadoop
- Where does this input come from?
Transfer those documents, with their prerequisite annotations
- Initiate scp (or equivalent) transfer to Hadoop master (namenode?)
Send job to Hadoop
- Pass message over network to job tracker?
Wait for the job to finish
- How do we know when it finishes?
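One possible answer, assuming the job writes its results through Hadoop's standard `FileOutputCommitter`: by default that committer writes an empty `_SUCCESS` file into the job's output directory when the job succeeds. The sketch polls for that marker on a locally visible path; against real HDFS the check would go through Hadoop's `FileSystem.exists(...)` instead. If the Master submits the job through the Java API, `Job.waitForCompletion(true)` blocks until the job ends, which may make polling unnecessary. `JobWaiter` is a hypothetical name.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BooleanSupplier;

final class JobWaiter {
    // Polls `done` every `intervalMillis` until it returns true or
    // `timeoutMillis` elapses; returns whether the condition was met.
    static boolean waitFor(BooleanSupplier done, long timeoutMillis, long intervalMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (done.getAsBoolean()) return true;
            if (System.nanoTime() - deadline >= 0) return false;
            Thread.sleep(intervalMillis);
        }
    }

    // True once the (default) _SUCCESS marker appears in the output directory.
    static BooleanSupplier successMarker(Path outputDir) {
        return () -> Files.exists(outputDir.resolve("_SUCCESS"));
    }
}
```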
Copy data out from HDFS
- Initiate scp or equivalent transfer from Hadoop master
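As an alternative to scp, the transfer and submission steps could shell out to the standard `hadoop` command-line tool: `hadoop fs -put` copies local files into HDFS, `hadoop jar <jar> <mainClass> args...` runs a job, and `hadoop fs -get` copies files back out. The jar name, main-class name, and job arguments here are assumptions, since the parameters HadoopInterface expects are not specified; `HadoopCommands` is a hypothetical helper.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class HadoopCommands {
    // Stage documents (with their prerequisite annotations) into HDFS.
    static List<String> put(String localDir, String hdfsDir) {
        return List.of("hadoop", "fs", "-put", localDir, hdfsDir);
    }

    // Submit the batch job; jar path, class name, and args are assumed.
    static List<String> submit(String jarPath, String mainClass, List<String> jobArgs) {
        List<String> cmd = new ArrayList<>(List.of("hadoop", "jar", jarPath, mainClass));
        cmd.addAll(jobArgs);
        return cmd;
    }

    // Copy results back out of HDFS.
    static List<String> get(String hdfsDir, String localDir) {
        return List.of("hadoop", "fs", "-get", hdfsDir, localDir);
    }

    // Runs a command with inherited stdout/stderr so errors reach the user,
    // and fails loudly on a non-zero exit code.
    static void run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("Command failed (exit " + exit + "): " + String.join(" ", cmd));
        }
    }
}
```

For example, `HadoopCommands.run(HadoopCommands.put("/data/docs", "/curator/input"))` would stage a local directory (both paths hypothetical).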

Local/Slave Curator Mode for Hadoop

Here's what each local Curator (running on each Hadoop node) needs to do, along with thoughts on how to do it:

Launch
- Gets launched by the Master Curator
- Bundled with the launch command is a note about which annotation tool we should launch (?)
- Launch that tool the same way it is normally launched when run locally
- Scaffolding (during design): we can launch this by hand if we want to work on this code before moving on to the MC
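A sketch of launching the tool as a child process of the local Curator, assuming the tool can be started from a command line (the command itself is left to the caller because it differs per tool). Appending the tool's merged stdout and stderr to a per-tool log file is one way to keep its errors available for user-actionable logging. `ToolLauncher` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical launcher: starts an annotation tool and appends everything
// it prints (stdout and stderr) to a log file.
final class ToolLauncher {
    static Process launch(List<String> command, Path logFile) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        pb.redirectOutput(ProcessBuilder.Redirect.appendTo(logFile.toFile()));
        return pb.start();
    }
}
```

The returned `Process` can be checked with `isAlive()` while the local Curator waits for the tool to become ready.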
Wait for required tool to finish launching, then give the OK to Master Curator
- How do we know it's ready?
- How do we pass a message back to the MC?
- Scaffolding (during design): we can skip giving the OK until we're ready to code the MC
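For "How do we know it's ready?", one cheap heuristic, assuming the tool is a network service (for example a Thrift server) listening on a known port, is to retry a TCP connection until it succeeds. An open port does not prove the tool has finished loading its models, so an application-level ping would be stronger. `ToolProbe` and its timings are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical readiness probe: the tool counts as "up" once its port
// accepts TCP connections.
final class ToolProbe {
    static boolean waitUntilListening(String host, int port, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), 500); // 500 ms per attempt
                return true;
            } catch (IOException notYetListening) {
                Thread.sleep(200); // tool still starting; try again shortly
            }
        }
        return false;
    }
}
```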
(MapReduce job launches outside of local Curator)
- Is launched by the JobTracker after receiving a message from the MC
- Scaffolding (during design): Simulate job submission to a locally-running version of Hadoop
Wait for input from a local map() operation
- map() will copy the data to be processed to the user directory (~/document_hash_here/)
- map() will add a .lock file to that directory while it is still writing to it
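The map() side of this hand-off could look roughly like the following, assuming the layout described above (`<home>/<document_hash>/`, guarded by `.lock` while being written). `InputWriter` and the file names passed in are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Hypothetical map()-side helper: writes one document's input into
// <home>/<docHash>/ while holding a .lock file.
final class InputWriter {
    static final String LOCK = ".lock";

    static Path writeInput(Path home, String docHash, Map<String, String> files) throws IOException {
        Path dir = Files.createDirectories(home.resolve(docHash));
        // createFile is atomic: it throws FileAlreadyExistsException if the
        // directory is already locked.
        Path lock = Files.createFile(dir.resolve(LOCK));
        for (Map.Entry<String, String> f : files.entrySet()) {
            Files.writeString(dir.resolve(f.getKey()), f.getValue(), StandardCharsets.UTF_8);
        }
        // Unlock only after every file is written. If a write throws, the lock
        // stays in place so a half-written directory is never picked up.
        Files.delete(lock);
        return dir;
    }
}
```

The directory briefly exists without a lock between `createDirectories` and `createFile`, so the reader should also require the input file to be present before acting on a directory.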
When input is received and there is no lock on the input directory:
1. Lock the input directory (i.e., create a .lock file)
2. Prepare the job to send to the tool
  - Build an edu.illinois.cs.cogcomp.thrift.curator.Record structure?
3. Send the job to the annotation tool
  - How does this work? Is it similar to client.provide() in CuratorDemo.java?
4. Write the output to the local disk
  - Once we know how to send a job to the tool, this should be easy.
5. Unlock (i.e., delete the .lock file)
(MapReduce job will handle transfer of the output back to the Master Curator once the .lock file is gone)
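Steps 1 to 5 could be sketched as a single polling pass over the home directory. The file names (`input.txt`, `output.txt`) and the `Annotator` interface are hypothetical stand-ins; the real call in step 3 would go to the annotation tool, whose client API is still an open question above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical local-Curator pass implementing steps 1-5.
final class LocalCurator {
    // Stand-in for the real call to the annotation tool.
    interface Annotator {
        String annotate(String text) throws IOException;
    }

    static final String LOCK = ".lock";
    static final String INPUT = "input.txt";   // assumed input file name
    static final String OUTPUT = "output.txt"; // assumed output file name

    // Scans each <home>/<docHash>/ once; returns how many were annotated.
    static int pollOnce(Path home, Annotator tool) throws IOException {
        int processed = 0;
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(home, p -> Files.isDirectory(p))) {
            for (Path dir : dirs) {
                // Skip directories whose input is not there yet or that are already done.
                if (!Files.exists(dir.resolve(INPUT)) || Files.exists(dir.resolve(OUTPUT))) {
                    continue;
                }
                Path lock = dir.resolve(LOCK);
                try {
                    Files.createFile(lock); // 1. lock (fails if map() is still writing)
                } catch (FileAlreadyExistsException stillLocked) {
                    continue;
                }
                try {
                    String text = Files.readString(dir.resolve(INPUT), StandardCharsets.UTF_8); // 2. prepare
                    String result = tool.annotate(text);                                       // 3. send to tool
                    Files.writeString(dir.resolve(OUTPUT), result, StandardCharsets.UTF_8);     // 4. write output
                    processed++;
                } catch (IOException e) {
                    // Name the document so the user can find and retry it.
                    System.err.println("Annotation failed for document " + dir.getFileName() + ": " + e.getMessage());
                } finally {
                    Files.delete(lock); // 5. unlock
                }
            }
        }
        return processed;
    }
}
```

Because `Files.createFile` is atomic, the check "no lock, then lock" cannot race with map(). A failed document is unlocked without output, so the next pass retries it.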

An actually useful to-do list for making this happen

Figure out how to launch a locally-running Curator with a single annotation tool (probably from the command line)
Figure out how to send a job (programmatically) to the annotation tool
Figure out where to modify the local-mode Curator code to check for the input directory (described under "Wait for input from a local map() operation" above)
Figure out how output is returned (programmatically) from annotation tools
