Stanza's sentencizer only works when `processors = 'tokenize,pos,lemma,depparse'`

Hi all,

I started an NLP project where I needed high-accuracy sentence segmentation, and therefore decided to use Stanza.

I was thrilled to find this library, since spaCy is quite intuitive. However, I found that Stanza's sentence segmentation only gets carried over into spaCy under certain conditions.

**Baseline:**

The baseline test uses the Stanza model alone to check that sentence segmentation works.

This is the simplest model that I could use: I simply turned on the `tokenize` processor.

<img width="559" alt="Screenshot 2021-02-03 at 18 57 31" src="https://user-images.githubusercontent.com/64047828/106788943-e8db4380-6651-11eb-9421-33caa2997e96.png">
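In case the screenshot is hard to read, here is a minimal sketch of what the baseline roughly looks like. The helper names (`build_stanza_pipeline`, `stanza_sentences`) are hypothetical and only for illustration, and the sketch assumes the English models have already been downloaded; it is not a copy of the exact code in the screenshot.

```python
def build_stanza_pipeline(processors="tokenize", lang="en"):
    """Build a Stanza pipeline restricted to the given processors."""
    # Imported lazily so the helper below can be used without Stanza installed.
    import stanza

    # Assumes the models were fetched once beforehand with stanza.download("en").
    return stanza.Pipeline(lang=lang, processors=processors)


def stanza_sentences(nlp, text):
    """Return the text of each sentence that Stanza found in `text`."""
    doc = nlp(text)
    # A Stanza Document exposes its segmentation as `doc.sentences`.
    return [sentence.text for sentence in doc.sentences]
```

Calling `stanza_sentences(build_stanza_pipeline(), "This is one sentence. This is another one.")` should return one string per sentence if segmentation works with `tokenize` alone.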

**Test with spacy-stanza:**

I then tried the same thing, but this time added the spacy-stanza wrapper.

<img width="525" alt="Screenshot 2021-02-03 at 18 58 00" src="https://user-images.githubusercontent.com/64047828/106788941-e842ad00-6651-11eb-82f8-fb033c2bb138.png">

As shown above, the text was not actually split into sentences.

**Test with spacy-stanza with more processors on Stanza:**

<img width="571" alt="Screenshot 2021-02-03 at 18 56 23" src="https://user-images.githubusercontent.com/64047828/106788950-ea0c7080-6651-11eb-812a-469efdb0e632.png">

It seems that the `depparse` processor is necessary, but this is rather confusing, since the vanilla Stanza model does not require it to segment sentences.
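For anyone who wants to reproduce the comparison without the screenshots, the sketch below counts the sentences spaCy reports for each processor setting. It assumes spacy-stanza v1.x, where `spacy_stanza.load_pipeline` forwards extra keyword arguments such as `processors` to `stanza.Pipeline`; older releases use a different entry point. The helper names are hypothetical, and depending on the spaCy version, `doc.sents` may raise an error instead of returning a single span when no sentence boundaries are set.

```python
def build_spacy_stanza_pipeline(processors, lang="en"):
    """Load a spaCy pipeline backed by Stanza with the given processors."""
    # Imported lazily so the helpers below can be used without spacy-stanza installed.
    import spacy_stanza

    # Assumption: spacy-stanza v1.x, which passes `processors` on to stanza.Pipeline.
    return spacy_stanza.load_pipeline(lang, processors=processors)


def count_spacy_sentences(nlp, text):
    """Count the sentence spans spaCy exposes for `text`."""
    doc = nlp(text)
    return sum(1 for _ in doc.sents)


def compare_processor_sets(build, text, processor_sets):
    """Map each processor string to the number of sentences spaCy reports."""
    return {
        processors: count_spacy_sentences(build(processors), text)
        for processors in processor_sets
    }
```

A call such as `compare_processor_sets(build_spacy_stanza_pipeline, text, ["tokenize", "tokenize,pos,lemma,depparse"])` makes it easy to see which settings carry the sentence boundaries over into spaCy.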

Any help would be appreciated :)

