Description
Named Entity Recognition (NER) with spaCy works well when the language is fixed for a given run, since we can specify a single language-specific model. Most interesting applications, however, are multilingual.
A "multilingual NER" step would add two complications over the existing NER step:
1: Take the story language as input and download an appropriate model for each language. (We'd probably want to define the model used for each language manually, and use the multi-language wiki model, `xx_ent_wiki_sm`, as a fallback.) Then apply the appropriate model to each story. This is relatively simple.
2: Normalize the NER entity-type slugs. Some models use three-character slugs like "PER", others use full words like "PERSON", and some models have entity types that are not present in other models at all. This is more open-ended: it requires surveying the full range of variations before defining exactly what the normalization should look like.
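Step 1 could look roughly like the following minimal sketch. The per-language mapping in `LANGUAGE_MODELS` is an assumption for illustration (the package names are real spaCy pipelines, but which one to use per language is still to be decided), and `model_name_for` and `ModelCache` are hypothetical helpers. Each pipeline is loaded at most once and reused across stories. The packages still need to be installed beforehand, e.g. with `python -m spacy download xx_ent_wiki_sm`.

```python
from typing import Callable, Dict, Optional

# Hypothetical, manually curated mapping from ISO 639-1 code to spaCy pipeline.
# The package names exist; the per-language choice is an assumption.
LANGUAGE_MODELS: Dict[str, str] = {
    "en": "en_core_web_sm",
    "de": "de_core_news_sm",
    "fr": "fr_core_news_sm",
    "es": "es_core_news_sm",
}

# Multi-language NER pipeline used when no language-specific model is listed.
FALLBACK_MODEL = "xx_ent_wiki_sm"


def model_name_for(language: Optional[str]) -> str:
    """Return the pipeline name for a story language, or the fallback model."""
    if not language:
        return FALLBACK_MODEL
    return LANGUAGE_MODELS.get(language.lower(), FALLBACK_MODEL)


class ModelCache:
    """Loads each pipeline at most once and reuses it for later stories."""

    def __init__(self, loader: Optional[Callable[[str], object]] = None):
        if loader is None:
            # Imported lazily so the mapping logic works without spaCy installed.
            import spacy
            loader = spacy.load
        self._loader = loader
        self._models: Dict[str, object] = {}

    def nlp_for(self, language: Optional[str]):
        """Return the (cached) pipeline appropriate for this story's language."""
        name = model_name_for(language)
        if name not in self._models:
            self._models[name] = self._loader(name)
        return self._models[name]
```

Usage would then be something like `doc = cache.nlp_for(story_language)(story_text)` followed by `[(ent.text, ent.label_) for ent in doc.ents]`. Injecting the loader keeps the selection logic testable without downloading any models.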
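For step 2, one possible starting point is a lookup table onto a shared schema, with unknown labels passed through unchanged and collected so the survey of label variations can be driven by real data. The target schema (`PERSON`, `LOCATION`, `ORGANIZATION`, `MISC`) and the specific mappings in `LABEL_MAP` are assumptions, not a settled design; `normalize_label` and `normalize_entities` are hypothetical helpers. The input is a list of `(text, label)` pairs such as `[(ent.text, ent.label_) for ent in doc.ents]`.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical mapping onto a small shared schema. WikiNER-style models
# (e.g. xx_ent_wiki_sm) emit PER/LOC/ORG/MISC; OntoNotes-style models
# (e.g. en_core_web_sm) emit PERSON, GPE, FAC, NORP, DATE, MONEY, etc.
LABEL_MAP: Dict[str, str] = {
    "PER": "PERSON",
    "PERSON": "PERSON",
    "LOC": "LOCATION",
    "GPE": "LOCATION",
    "FAC": "LOCATION",
    "ORG": "ORGANIZATION",
    "NORP": "MISC",
    "MISC": "MISC",
}


def normalize_label(label: str) -> str:
    """Map a model-specific label onto the shared schema; keep unknown labels as-is."""
    key = label.upper()
    return LABEL_MAP.get(key, key)


def normalize_entities(
    entities: Iterable[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str]], Set[str]]:
    """Normalize (text, label) pairs and report labels the table does not cover yet."""
    normalized: List[Tuple[str, str]] = []
    unmapped: Set[str] = set()
    for text, label in entities:
        if label.upper() not in LABEL_MAP:
            unmapped.add(label.upper())
        normalized.append((text, normalize_label(label)))
    return normalized, unmapped
```

Passing unknown labels through (rather than forcing them into `MISC`) keeps model-specific types like `DATE` or `MONEY` visible until a decision is made about how to handle them.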