This repository was archived by the owner on Feb 1, 2022. It is now read-only.
Suppose a user requests SRL output for 10,000 documents. Consulting the Dependency Tree for Annotation Tools, we see that SRL requires the following tools, in order:
Tokenizer
POS
Chunker
Charniak parser
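The lookup described above can be sketched as a small dependency resolver. This is only an illustration: the `DEPENDENCIES` table and the `tools_to_run` helper are hypothetical names, and the edges assume, purely for the example, that each tool depends on the one listed before it. The real Dependency Tree for Annotation Tools may be shaped differently. The sketch uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency table: each tool maps to the tools it needs directly.
# ASSUMPTION: a simple chain, for illustration only; the real tree may differ.
DEPENDENCIES = {
    "tokenizer": [],
    "pos": ["tokenizer"],
    "chunker": ["pos"],
    "charniak": ["chunker"],
    "srl": ["charniak"],
}


def tools_to_run(target, deps=DEPENDENCIES):
    """Return the target and its transitive dependencies, prerequisites first."""
    needed = {}
    stack = [target]
    while stack:
        tool = stack.pop()
        if tool in needed:
            continue  # already collected via another path
        needed[tool] = deps[tool]
        stack.extend(deps[tool])
    # TopologicalSorter treats each dict value as that node's predecessors,
    # so static_order() yields every prerequisite before the tool that needs it.
    return list(TopologicalSorter(needed).static_order())


order = tools_to_run("srl")
```

Under the chain assumption, `order` lists the tokenizer first and SRL last. Requesting an intermediate annotation, for example `tools_to_run("chunker")`, would return only the tools that annotation needs.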
Our tool needs to do the following:
Figure out which of the tools listed above to run, and in which order (trivial)
Ensure that we do not copy the tool results out of Hadoop until after SRL runs
Ideally, we signal the controller script outside the cluster when the run through a tool finishes, so that it can start the next job with the next tool.
Between runs, we shut down the running tools, but we do not delete the directories.
Ideally, we ensure that all dependencies for a given document stay in the same HDFS block, so that they are easy to access later.
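The requirements above suggest a controller loop along the following lines. This is a minimal sketch, not the project's actual controller: `run_pipeline`, `run_tool`, and `copy_out` are hypothetical names, and the two callbacks stand in for whatever actually starts a tool, submits the Hadoop job, shuts the tool down, and copies files out of HDFS.

```python
def run_pipeline(tools, run_tool, copy_out):
    """Run each tool as its own job, then copy results out once at the end.

    tools    -- tool names in dependency order, requested annotation last
    run_tool -- hypothetical callback: starts the tool, runs the job over the
                documents in HDFS, shuts the tool down (leaving its
                directories in place), and returns when the job is done
    copy_out -- hypothetical callback: copies the final output out of Hadoop
    """
    finished = []
    for tool in tools:
        run_tool(tool)          # blocks until this tool's job completes
        finished.append(tool)   # returning is the "ready for next tool" signal
    # Intermediate results stay in HDFS; only now is anything copied out.
    copy_out(tools[-1])
    return finished


# Usage with stub callbacks that just record what would happen.
events = []
run_pipeline(
    ["tokenizer", "pos", "chunker", "charniak", "srl"],
    run_tool=lambda tool: events.append(("run", tool)),
    copy_out=lambda tool: events.append(("copy_out", tool)),
)
```

In this sketch the return of `run_tool` doubles as the completion signal to the controller. A real cluster would more likely need polling or a callback from the job itself, which is left open here.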