NER Language Improvements  #23

@pgulley

Named Entity Recognition with spaCy works great when the language is defined for a given run, since we can specify a single language-specific model. Most interesting applications are multi-lingual, however.

A "multi-lingual NER" step would involve two complications over the existing NER model.
1: Take the story language as input, and download an appropriate model for each language. (We'd probably want to manually define the models used for each language, and use the wikimodel as a fallback.) Then just use the appropriate model for each story. This is relatively simple.
2: Normalize the NER object-type slugs. Some models use three-character slugs like "PER", some use expanded full words like "PERSON", and some models have object types not present in other models. This is a little more open-ended, and requires some investigation of the full range of variations before defining exactly what normalization would look like.
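
As a rough illustration of both steps, here is a minimal sketch, not a proposed implementation. The language-to-model table, the helper names, and the alias table are all hypothetical. It assumes that "the wikimodel" means spaCy's multi-language `xx_ent_wiki_sm` pipeline, and that the listed models are already installed (for example via `python -m spacy download <name>`), so it does no downloading itself.

```python
from functools import lru_cache

# Hypothetical, hand-maintained mapping from story language code to a spaCy
# pipeline. The package names are real spaCy models; which ones this project
# would actually pick is an assumption.
MODEL_BY_LANGUAGE = {
    "en": "en_core_web_sm",
    "es": "es_core_news_sm",
    "fr": "fr_core_news_sm",
    "de": "de_core_news_sm",
}

# Assumption: "the wikimodel" refers to spaCy's multi-language NER pipeline.
FALLBACK_MODEL = "xx_ent_wiki_sm"

# Hypothetical alias table. Only the PER/PERSON case from the issue is covered;
# the full table is exactly what still needs investigating.
LABEL_ALIASES = {
    "PER": "PERSON",
}


def model_name_for(language):
    """Return the model configured for a language, or the fallback."""
    return MODEL_BY_LANGUAGE.get((language or "").lower(), FALLBACK_MODEL)


@lru_cache(maxsize=None)
def load_model(model_name):
    """Load each spaCy pipeline at most once per process."""
    import spacy  # imported lazily so the lookup helpers work without spaCy

    return spacy.load(model_name)


def normalize_label(label):
    """Map a model-specific entity label onto a shared slug.

    Labels with no alias pass through unchanged (upper-cased).
    """
    label = label.upper()
    return LABEL_ALIASES.get(label, label)


def extract_entities(text, language):
    """Run the language-appropriate model and return (text, label) pairs."""
    nlp = load_model(model_name_for(language))
    return [(ent.text, normalize_label(ent.label_)) for ent in nlp(text).ents]
```

Passing unknown labels through unchanged keeps model-specific types such as "MISC" or "GPE" visible in the output, which may be useful while the range of label sets is still being surveyed.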
