Why are tokens with an underscore or hyphen ignored by the YakeKeywordExtraction() annotator? #9022
Unanswered
a-kliuieva asked this question in Q&A
I have a Spark DataFrame `input_df`:
I want to extract keywords for each `id` using the `YakeKeywordExtraction()` annotator. For this I use the following pipeline:
Results obtained:
It is obvious that the predominant tokens `solar_system` and `milky_way` are ignored (the situation is similar if a hyphen or a space is used instead of an underscore). But why does this happen, and how can I deal with it? Thanks a lot for any advice!