
Lemmatization improvements #1

@lefnire

I have CleanText.keywords_fast() and CleanText.keywords_accurate(), which use LemmInflect and Stanford NLP (via spacy-stanza), respectively. I'm not confident in either setup and suspect I could be using both tools more effectively. In particular, both methods contain a crap-ton of custom regex, which I assume could be handled more elegantly and robustly inside the spaCy pipeline via nlp.pipe().
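One way to replace some of that regex is to filter on spaCy's token attributes while streaming texts through nlp.pipe(). This is a minimal sketch, assuming spaCy v3 is installed; it uses a blank English pipeline so it runs without a model download. keywords() is a hypothetical helper, not part of CleanText, and with a trained or spacy-stanza pipeline you would likely read tok.lemma_ instead of tok.lower_.

```python
import spacy

# Assumption: spaCy v3. spacy.blank("en") needs no model download; it gives
# tokenization plus lexical flags such as is_alpha and is_stop.
nlp = spacy.blank("en")


def keywords(texts, batch_size=64):
    """Hypothetical token-attribute filter standing in for hand-written regex.

    Yields one list of keyword strings per input text.
    """
    for doc in nlp.pipe(texts, batch_size=batch_size):
        yield [
            # A blank pipeline has no lemmatizer, so lemma_ would be empty here;
            # with a full pipeline, swap lower_ for lemma_.
            tok.lower_
            for tok in doc
            # is_alpha drops punctuation, numbers, URLs and whitespace tokens;
            # is_stop drops words in spaCy's English stop list.
            if tok.is_alpha and not tok.is_stop
        ]


for kws in keywords(["The cats were running to the store!"]):
    print(kws)
```

Because punctuation, numbers, and URLs come out as tokens whose is_alpha is False, these two checks cover much of what hand-written patterns usually strip; anything domain-specific could still go into a spaCy Matcher or a custom pipeline component rather than a standalone regex.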

I'll use this thread to collect ideas as I think of them.

  • Do I need to run pip install -U spacy-lookups-data for lemmatization?
  • Does the current use of LemmInflect bypass the above, so that package isn't needed? Which produces better lemmas? (Do some testing between spacy-lookups-data, LemmInflect, and Stanza.)
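For the "which produces better lemmas" question, a small harness that scores each backend against a hand-labelled gold set would make the comparison repeatable. This is a sketch under stated assumptions: the gold triples and the two toy backends are hypothetical placeholders. Real backends would be wrapped to the same (word, upos) -> lemma shape; for example, LemmInflect's getLemma(word, upos) returns a tuple of candidate lemmas, so a wrapper would take the first one.

```python
from typing import Callable, Dict, List, Tuple

# Any lemmatizer wrapped as (word, universal POS tag) -> lemma.
Lemmatizer = Callable[[str, str], str]
Miss = Tuple[str, str, str]  # (word, expected lemma, lemma actually returned)


def compare_lemmatizers(
    gold: List[Tuple[str, str, str]],
    backends: Dict[str, Lemmatizer],
) -> Dict[str, Tuple[float, List[Miss]]]:
    """Score each backend on (word, upos, expected_lemma) triples.

    Returns, per backend name, its accuracy and the list of misses.
    """
    results = {}
    for name, lemmatize in backends.items():
        misses = []
        for word, upos, expected in gold:
            got = lemmatize(word, upos)
            if got != expected:
                misses.append((word, expected, got))
        accuracy = 1 - len(misses) / len(gold) if gold else 0.0
        results[name] = (accuracy, misses)
    return results


# Hypothetical gold set; a real one would come from the project's own text.
GOLD = [
    ("running", "VERB", "run"),
    ("cats", "NOUN", "cat"),
    ("better", "ADJ", "good"),
]


def strip_s(word: str, upos: str) -> str:
    """Toy baseline: drop a trailing 's'."""
    return word[:-1] if word.endswith("s") else word


def identity(word: str, upos: str) -> str:
    """Toy baseline: return the word unchanged."""
    return word


results = compare_lemmatizers(GOLD, {"strip_s": strip_s, "identity": identity})
for name, (accuracy, misses) in results.items():
    print(f"{name}: {accuracy:.2f}", misses)
```

Scoring isolated words favours dictionary lookups, since Stanza's lemmas depend on sentence context; running each backend over whole sentences and aligning tokens would be a fairer test for the spacy-stanza path.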
