Information Extraction in Karma
This page presents the design to support information extraction in Karma. Suppose the user loads a worksheet that contains a column with text data, such as the biographies of artists.
###Invoking Information Extraction
To extract entities, the user invokes the Information Extraction command. The command creates a JSON document containing the input data for the Information Extraction service.
The JSON document consists of an array of objects.
Each object has a `rowHash` attribute, a Karma-generated hash id for the worksheet row containing the text, and a `text` attribute, which contains the text on which to run extraction:

    [
      {
        "rowHash": "5f0266c4c326",
        "text": "... Berninghaus attended the Saint Louis School of Fine Arts at night. ..."
      },
      {
        "rowHash": "c326b9a1ef9e",
        "text": "Paris was where the 20th century was. ... attributed to Gertrude Stein prove apocryphal ..."
      },
      {
        "rowHash": "1ef9e39cb78c3",
        "text": "The daughter of a furniture manufacturer, Anni Albers (Fleischmann) was born in Berlin. ..."
      }
    ]

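As an illustration only, the request document could be assembled along these lines. The helper names `row_hash` and `build_request` are hypothetical, and the SHA-1 based hash is an assumption standing in for whatever id Karma actually generates for a worksheet row.

```python
import hashlib
import json


def row_hash(row_id: str) -> str:
    """Hypothetical stand-in for Karma's row hash: 12 hex chars of a SHA-1 digest."""
    return hashlib.sha1(row_id.encode("utf-8")).hexdigest()[:12]


def build_request(rows: dict) -> str:
    """Serialize a {row id: text} mapping into the array-of-objects document."""
    payload = [
        {"rowHash": row_hash(row_id), "text": text}
        for row_id, text in rows.items()
    ]
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    # Row ids here are made up; only the shape of the output matters.
    print(build_request({
        "row-1": "... Berninghaus attended the Saint Louis School of Fine Arts at night. ...",
        "row-2": "Paris was where the 20th century was. ...",
    }))
```

Using `json.dumps` rather than string concatenation ensures that quotes and other special characters inside the biography text are escaped correctly.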
###Information Extraction Service
The Information Extraction Service is a REST service that accepts POST requests to perform information extraction. The body of the POST request is a JSON document such as the one listed above.
TBD: service arguments to control what it extracts and other aspects of its behavior.
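A minimal sketch of a client for such a service, using only the Python standard library. The endpoint URL and the `Content-Type` header are assumptions, since this design does not yet specify them, and `make_post` and `extract` are hypothetical helper names.

```python
import json
import urllib.request

# Hypothetical endpoint: the real service URL is not given in this design.
SERVICE_URL = "http://localhost:8080/extract"


def make_post(rows: list, url: str = SERVICE_URL) -> urllib.request.Request:
    """Build a POST request whose body is the JSON array of row objects."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed media type
        method="POST",
    )


def extract(rows: list, url: str = SERVICE_URL) -> list:
    """Send the request and decode the JSON reply (needs a running service)."""
    with urllib.request.urlopen(make_post(rows, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect or log the request without a running service.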
####Synchronous Behavior
In synchronous mode, the service performs information extraction on the data POSTed to it and returns a new JSON document as shown below.
The document is an array containing one object for each of the objects POSTed to the service. The `rowHash` is the key that relates the results to the input. The `extractions` object contains an attribute for each type of entity extracted. For each type of entity, there is an array containing all the extractions, each with the extracted text (`extraction`) and a `score`:

    [
      {
        "rowHash": "5f0266c4c326",
        "extractions": {
          "people": [
            { "extraction": "Berninghaus", "score": 1.0 },
            { "extraction": "Rober Florez", "score": 0.9 }
          ],
          "places": [
            { "extraction": "Saint Louis School of Fine Arts", "score": 1.0 }
          ],
          "dates": [
            { "extraction": "1873", "score": 1.0 }
          ]
        }
      },
      {
        "rowHash": "c326b9a1ef9e",
        "extractions": {
          "people": [
            { "extraction": "Robert Stein", "score": 1.0 }
          ],
          "places": [
            { "extraction": "Paris", "score": 1.0 },
            { "extraction": "Stockholm", "score": 1.0 }
          ],
          "dates": [
            { "extraction": "20th century", "score": 0.7 },
            { "extraction": "1921", "score": 1.0 }
          ]
        }
      },
      {
        "rowHash": "1ef9e39cb78c3",
        "extractions": {
          "people": [
            { "extraction": "Anni Albers", "score": 1.0 },
            { "extraction": "Fleischmann", "score": 0.9 },
            { "extraction": "Rosenthal", "score": 0.8 }
          ],
          "places": [
            { "extraction": "New York", "score": 1.0 }
          ],
          "dates": [
            { "extraction": "1932", "score": 1.0 }
          ]
        }
      }
    ]

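The sketch below shows one way a client might relate such a response back to worksheet rows through `rowHash`. The function names and the `min_score` threshold are hypothetical; the design does not say how Karma filters or stores extractions.

```python
def index_by_row(results: list) -> dict:
    """Map each rowHash to its extractions object for direct lookup."""
    return {item["rowHash"]: item["extractions"] for item in results}


def flatten(results: list, min_score: float = 0.0) -> list:
    """Return (rowHash, entity type, extraction, score) tuples.

    min_score is a hypothetical filter for dropping low-confidence extractions.
    """
    rows = []
    for item in results:
        for entity_type, found in item["extractions"].items():
            for hit in found:
                if hit["score"] >= min_score:
                    rows.append(
                        (item["rowHash"], entity_type, hit["extraction"], hit["score"])
                    )
    return rows


if __name__ == "__main__":
    # One result object copied from the sample response.
    sample = [{
        "rowHash": "1ef9e39cb78c3",
        "extractions": {
            "people": [
                {"extraction": "Anni Albers", "score": 1.0},
                {"extraction": "Fleischmann", "score": 0.9},
                {"extraction": "Rosenthal", "score": 0.8},
            ],
            "places": [{"extraction": "New York", "score": 1.0}],
            "dates": [{"extraction": "1932", "score": 1.0}],
        },
    }]
    for row in flatten(sample, min_score=0.9):
        print(row)
```

The flat tuples are convenient when each extraction should become its own worksheet cell or row, while `index_by_row` suits lookups for a single worksheet row.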
####Asynchronous Behavior
In asynchronous mode, to be implemented in a later phase, the service returns immediately and schedules an extraction process that records the extractions in a database, where Karma can retrieve them later.
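Since this mode is not yet designed in detail, the following is only a rough sketch of the pattern described above: accept the request, return at once, and let a background worker write results to a database. Everything in it is an assumption, including the SQLite storage, the table layout, the job id, and the placeholder `run_extraction` function.

```python
import json
import sqlite3
import uuid
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing

_pool = ThreadPoolExecutor(max_workers=1)  # hypothetical background worker


def _connect(db_path: str) -> sqlite3.Connection:
    """Open the database and make sure the (assumed) results table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extractions "
        "(job_id TEXT, row_hash TEXT, result TEXT)"
    )
    return conn


def run_extraction(rows: list) -> list:
    """Placeholder extractor: returns an empty extractions object per row."""
    return [{"rowHash": r["rowHash"], "extractions": {}} for r in rows]


def _process(db_path: str, job_id: str, rows: list) -> None:
    """Run extraction and record one database row per worksheet row."""
    results = run_extraction(rows)
    with closing(_connect(db_path)) as conn, conn:
        conn.executemany(
            "INSERT INTO extractions VALUES (?, ?, ?)",
            [(job_id, r["rowHash"], json.dumps(r["extractions"])) for r in results],
        )


def submit(rows: list, db_path: str):
    """Schedule extraction and return immediately with a job id and a future."""
    job_id = uuid.uuid4().hex
    return job_id, _pool.submit(_process, db_path, job_id, rows)


def fetch(db_path: str, job_id: str) -> dict:
    """Read back whatever has been recorded so far for a job."""
    with closing(_connect(db_path)) as conn:
        cursor = conn.execute(
            "SELECT row_hash, result FROM extractions WHERE job_id = ?", (job_id,)
        )
        return {row_hash: json.loads(result) for row_hash, result in cursor}
```

In this sketch `fetch` returns an empty mapping until the worker has committed its results, so a caller would poll it or wait on the returned future.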
###Receiving the Results in Karma