I started an NLP project where I needed high-accuracy sentence segmentation, and therefore decided to use Stanza.
I was thrilled to find this library, since spaCy is quite intuitive. However, I found that the sentence segmentation only gets carried into spaCy under certain conditions.
Baseline:
The baseline test is to use the Stanza model alone to see if the sentence segmentation works.
This is the simplest pipeline I could use: I only turned on the tokenize processor.
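For reference, a minimal sketch of what such a baseline could look like. The sample text and the helper names (`build_stanza_pipeline`, `stanza_sentences`, `run_baseline`) are made up for illustration; only `stanza.download`, `stanza.Pipeline`, `doc.sentences`, and `sentence.text` come from Stanza itself.

```python
# Hypothetical sample text, not taken from the original report.
SAMPLE_TEXT = "This is the first sentence. Here is a second one."


def build_stanza_pipeline(lang="en"):
    """Build a Stanza pipeline with only the tokenize processor enabled."""
    # Imported lazily so the pure helper below can be used without Stanza installed.
    import stanza

    stanza.download(lang)  # fetches the default models on first use
    return stanza.Pipeline(lang=lang, processors="tokenize")


def stanza_sentences(doc):
    """Return the text of each sentence in a Stanza Document."""
    return [sentence.text for sentence in doc.sentences]


def run_baseline(text=SAMPLE_TEXT):
    """Segment `text` with Stanza alone and return the sentence strings."""
    nlp = build_stanza_pipeline()
    return stanza_sentences(nlp(text))
```

Calling `run_baseline()` should return one string per sentence that Stanza detected, which gives a reference point for the wrapped pipelines.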
Test with spacy-stanza:
I then tried the same thing, but this time added the spacy-stanza wrapper.
With this setup, the text was not actually split into sentences.
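A rough sketch of the wrapped test, assuming a recent spacy-stanza release that provides `spacy_stanza.load_pipeline` (older releases exposed a `StanzaLanguage` wrapper instead, so the entry point may differ from what was used here). The helper names are hypothetical. Whether the sentence boundaries carry over likely depends on the installed versions, so no expected output is shown.

```python
def build_spacy_stanza_pipeline(lang="en", processors="tokenize"):
    """Wrap a Stanza pipeline in spaCy (assumes spacy-stanza >= 1.0)."""
    # Imported lazily so the pure helper below can be used on its own.
    import spacy_stanza

    return spacy_stanza.load_pipeline(lang, processors=processors)


def spacy_sentences(doc):
    """Return sentence texts from a spaCy Doc, or None if no boundaries are set."""
    try:
        return [sent.text for sent in doc.sents]
    except ValueError:
        # spaCy raises ValueError when sentence boundaries were never assigned.
        return None


def run_wrapped(text, processors="tokenize"):
    """Segment `text` through the spaCy wrapper and return the sentence strings."""
    nlp = build_spacy_stanza_pipeline(processors=processors)
    return spacy_sentences(nlp(text))
```

Comparing the result of `run_wrapped(text)` against the plain Stanza output for the same text makes it easy to see whether the segmentation survived the hand-off to spaCy.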
Test with spacy-stanza with more processors on Stanza:
It seems that the depparse processor is necessary for the sentence boundaries to reach spaCy, but this is rather confusing since the vanilla Stanza pipeline does not require it for sentence segmentation.
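To check this more systematically, something like the following could count the sentences spaCy sees for each processor configuration. This is only a sketch: it assumes `spacy_stanza.load_pipeline` is available, the function names are hypothetical, and the second processor string includes `pos` and `lemma` because Stanza's depparse depends on them (some languages or versions may also need `mwt`).

```python
# Processor configurations to compare; the second one adds depparse and its prerequisites.
PROCESSOR_SETS = ["tokenize", "tokenize,pos,lemma,depparse"]


def count_spacy_sentences(text, processors, lang="en"):
    """Count the sentences spaCy exposes for a given Stanza processor string."""
    # Imported lazily so report() below can be used without the libraries installed.
    import spacy_stanza

    nlp = spacy_stanza.load_pipeline(lang, processors=processors)
    return len(list(nlp(text).sents))


def compare_processor_sets(text, processor_sets=PROCESSOR_SETS):
    """Map each processor string to the number of sentences found via spaCy."""
    return {p: count_spacy_sentences(text, p) for p in processor_sets}


def report(counts):
    """Format a {processors: sentence_count} mapping, one line per configuration."""
    return "\n".join(f"{p}: {n} sentence(s)" for p, n in counts.items())
```

Running `print(report(compare_processor_sets(text)))` on a multi-sentence text would show whether the count changes once depparse is enabled.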