Project Overview
Here's how the project looks from 30,000 feet:
The JobHandler (which handles all the shell scripts in the front end) analyzes the text files that the user passed in as input and determines the dependencies that need to be satisfied in order to get the annotation requested by the user. Then, for each annotation that is required:
- The JobHandler calls the shell script to copy input to Hadoop. This results in:
  - the "master" (i.e., locally running) CuratorClient creating serialized Records from the user's input (these are the input files after preliminary processing), and
  - the same shell script (`copy_input_to_hadoop.sh`) sending those serialized Records to Hadoop. (A sketch of this HDFS round trip appears after the list.)
- The JobHandler then calls `launch_hadoop_job.sh` and has the Hadoop Job Handler start running our HadoopInterface program on each of the nodes in the cluster.
- After a bit of Hadoop back-end wizardry, each node in the cluster reaches the Reduce phase. There, it launches a Curator and the required annotator on that node, and launches the HadoopCuratorClient to interface with them. (A sketch of this step also follows the list.)
- The input Records are annotated using the required annotator.
- The newly annotated Records are stored in the Hadoop Distributed File System (HDFS) as serialized Records.
- After all Reduce phases finish, the JobHandler runs the `copy_output_from_hadoop.sh` script and copies the output back to the local disk. Once those serialized Records finish copying, it de-serializes them and has the local ("master") Curator store the updates in its database cache.
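To make the orchestration loop concrete, here is a minimal sketch of how a driver like the JobHandler could chain the three shell scripts for each required annotation. The class name, script arguments, and the sample annotation list are all hypothetical illustrations, not the project's actual JobHandler code.

```java
import java.io.IOException;

/**
 * Hypothetical sketch of the JobHandler's orchestration loop:
 * for each required annotation, copy input in, launch the job,
 * and copy the results back out. Script arguments are assumptions.
 */
public class JobHandlerSketch {

    /** Runs a shell script and fails fast on a non-zero exit code. */
    private static void runScript(String... command)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO()   // stream the script's output to our console
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException(command[0] + " exited with code " + exitCode);
        }
    }

    public static void main(String[] args) throws Exception {
        String inputDir = args[0];                       // user's raw text files
        String[] requiredAnnotations = {"TOKEN", "POS"}; // assumed dependency order

        for (String annotation : requiredAnnotations) {
            // 1. Serialize the input Records locally and push them to HDFS.
            runScript("./copy_input_to_hadoop.sh", inputDir);

            // 2. Ask the Hadoop Job Handler to run HadoopInterface on the cluster.
            runScript("./launch_hadoop_job.sh", annotation);

            // 3. Pull the annotated, serialized Records back to local disk.
            runScript("./copy_output_from_hadoop.sh", inputDir);
        }
    }
}
```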
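Under the hood, the two copy steps amount to moving serialized Record files between local disk and HDFS. The sketch below uses the standard Hadoop FileSystem API; the directory paths are made up, and the real `copy_input_to_hadoop.sh` and `copy_output_from_hadoop.sh` scripts may accomplish the same thing with `hadoop fs` commands instead.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Sketch of the HDFS round trip performed by the copy scripts.
 * The directory names are assumptions, not the project's actual layout.
 */
public class RecordTransferSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // picks up core-site.xml, etc.
        FileSystem hdfs = FileSystem.get(conf);

        // copy_input_to_hadoop.sh: push locally serialized Records into HDFS.
        hdfs.copyFromLocalFile(new Path("serialized_input/"),
                               new Path("/user/curator/job_input/"));

        // ... the Hadoop job runs and writes annotated Records to HDFS ...

        // copy_output_from_hadoop.sh: pull the annotated Records back out.
        hdfs.copyToLocalFile(new Path("/user/curator/job_output/"),
                             new Path("serialized_output/"));
    }
}
```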
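The Reduce phase can be pictured as a reducer that takes a document's serialized Record, hands it to the node-local annotator through a Curator client, and emits the annotated Record so Hadoop writes it back to HDFS. Everything in this sketch — the `AnnotationClient` interface, the use of `Text` as the serialized form, and the method names — is a hypothetical stand-in for the real HadoopCuratorClient and HadoopInterface code; it only illustrates the shape of the step.

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * Hypothetical reducer illustrating the Reduce phase described above:
 * each serialized Record is annotated via a client that talks to the
 * Curator/annotator pair running on this node, then written back out.
 * Types and client calls are assumptions, not the project's real API.
 */
public class AnnotationReducerSketch
        extends Reducer<Text, Text, Text, Text> {

    /** Stand-in for the HadoopCuratorClient; the real interface may differ. */
    public interface AnnotationClient {
        String annotate(String serializedRecord) throws IOException;
    }

    private AnnotationClient client;

    @Override
    protected void setup(Context context) {
        // In the real system, this is where the node-local Curator and
        // annotator would be launched and a client connected to them.
        client = serializedRecord -> serializedRecord + "\t<annotated>";
    }

    @Override
    protected void reduce(Text docId, Iterable<Text> serializedRecords,
                          Context context)
            throws IOException, InterruptedException {
        for (Text record : serializedRecords) {
            // Annotate the Record and emit it; Hadoop persists the output to HDFS.
            String annotated = client.annotate(record.toString());
            context.write(docId, new Text(annotated));
        }
    }
}
```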