This repository was archived by the owner on Feb 1, 2022. It is now read-only.

Handling Dependencies Automatically

s3cur3 edited this page Jul 3, 2012 · 2 revisions

Suppose a user requests SRL output for 10,000 documents. Consulting the Dependency Tree for Annotation Tools, we see that SRL requires the following tools, in order:

  1. Tokenizer
  2. POS
  3. Chunker
  4. Charniak parser
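The run order above can be derived automatically from the dependency tree rather than hard-coded. Below is a minimal sketch of that resolution step, assuming a hand-written dependency map (the tool names and the `run_order` helper are illustrative, not part of the actual codebase):

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical encoding of the Dependency Tree for Annotation Tools:
# each tool maps to the tools it directly requires.
DEPENDENCIES = {
    "tokenizer": [],
    "pos": ["tokenizer"],
    "chunker": ["pos"],
    "charniak": ["chunker"],
    "srl": ["charniak"],
}

def run_order(target):
    """Return every tool `target` needs (including itself), in execution order."""
    # Collect the transitive dependencies of the target.
    needed, stack = set(), [target]
    while stack:
        tool = stack.pop()
        if tool not in needed:
            needed.add(tool)
            stack.extend(DEPENDENCIES[tool])
    # Topologically sort just the needed subgraph.
    ts = TopologicalSorter({t: DEPENDENCIES[t] for t in needed})
    return list(ts.static_order())

print(run_order("srl"))
# → ['tokenizer', 'pos', 'chunker', 'charniak', 'srl']
```

Because the SRL chain is linear, the sort is deterministic; for tools with multiple prerequisites, any valid topological order works.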

Our tool needs to do the following:

  • Figure out which of the tools above to run, and in which order (trivial, since the dependency tree fixes the order)
  • Ensure that we do not copy the tool results out of Hadoop until after SRL runs
    • Ideally, we signal the controller (a script outside the cluster) when the run through each tool is finished, so it knows the cluster is ready for the next job and the next tool.
    • Between runs, we shut down the running tools, but we do not delete their output directories.
    • Ideally, we ensure that all dependencies for a given document stay in the same HDFS block, so they remain available for easy access later.
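The controller loop implied by these requirements could be sketched as follows. This is an assumption-laden illustration: the `hadoop jar curator.jar` invocation is a placeholder for whatever actually launches each tool's job, and `run_job` is injectable purely so the sequencing logic is testable.

```python
import subprocess

def run_pipeline(tools, in_dir, out_base, run_job=None):
    """Run each tool's Hadoop job in sequence, chaining output to input.

    run_job(tool, src, dst) launches one job and blocks until it exits;
    the default is an illustrative hadoop CLI call, not the project's
    actual invocation.
    """
    if run_job is None:
        run_job = lambda tool, src, dst: subprocess.run(
            ["hadoop", "jar", "curator.jar", tool, src, dst], check=True)
    prev, outputs = in_dir, []
    for tool in tools:
        dst = f"{out_base}/{tool}"
        run_job(tool, prev, dst)  # blocks: returning signals the controller
                                  # that the cluster is ready for the next tool
        # Intermediate directories are deliberately NOT deleted between runs.
        outputs.append(dst)
        prev = dst
    # Only after the last tool (SRL) would results be copied out of HDFS.
    return outputs
```

Each job's exit doubles as the "ready for the next tool" signal to the controller; the intermediate HDFS directories are left in place until the final SRL stage completes.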