Hello,
Parsing the doc object, i.e. iterating over the sentences and the tokens in them, takes too long. Is that expected?
    snlp = stanfordnlp.Pipeline(processors='tokenize,pos', models_dir=model_dir)
    nlp = StanfordNLPLanguage(snlp)
    for line in lines:
        doc = nlp.pipe([line])
The above code takes a few milliseconds (apart from initialisation) to run over 500 sentences,
    snlp = stanfordnlp.Pipeline(processors='tokenize,pos', models_dir=model_dir)
    nlp = StanfordNLPLanguage(snlp)
    for line in lines:
        doc = nlp.pipe([line])
        token_details = []
        for sents in doc:
            for tok in sents:
                token_details.append([tok.text, tok.lemma_, tok.pos_])
while this takes almost a minute (apart from initialisation) to run over 500 sentences.
P.S.: I have put nlp.pipe() inside the for loop intentionally, to get all tokens for one input sentence even if it gets segmented.
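One thing that may explain the gap: in spaCy, `nlp.pipe()` returns a lazy generator, so a loop that only calls it and never iterates the result does not run the models at all. The sketch below illustrates that effect with a hypothetical stand-in, `slow_pipe`, and an artificial delay; it is not the real StanfordNLP pipeline, and the numbers say nothing about its actual speed.

```python
import time


def slow_pipe(texts, delay=0.01):
    """Hypothetical stand-in for a lazy pipe(): yields one result per text."""
    for text in texts:
        time.sleep(delay)      # simulated model inference (assumed cost)
        yield text.split()     # simulated "doc": a list of tokens


lines = ["a b c", "d e"] * 5   # 10 made-up input lines

# Loop 1: only creates generator objects, so no simulated inference runs.
start = time.perf_counter()
for line in lines:
    doc = slow_pipe([line])
lazy_seconds = time.perf_counter() - start

# Loop 2: iterating the generator is what triggers the work.
start = time.perf_counter()
token_details = []
for line in lines:
    for doc in slow_pipe([line]):
        for tok in doc:
            token_details.append(tok)
consumed_seconds = time.perf_counter() - start

print(f"not consumed: {lazy_seconds:.4f}s, consumed: {consumed_seconds:.4f}s")
```

If that is what is happening here, the first snippet measures almost nothing, and the minute in the second snippet is the real processing time for 500 sentences. When handling one text at a time, calling `nlp(line)` returns the `Doc` directly, which makes the comparison easier to read.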