How to use multiple regex patterns with Normalizer in Spark NLP? #2599
Unanswered
SameekshaS
asked this question in Q&A
I am working with a PySpark DataFrame. I need to perform TF-IDF, and for that I am running the prior steps of tokenizing, normalization, etc. using Spark NLP.
I have a df that looks like this after applying the tokenizer:
The next step is to apply the Normalizer. I want to set multiple cleanup patterns. So far

cleanup = ["[^A-Za-z]"]

fulfils the first condition, but I don't understand how to add the second one. I tried this:
Help would be much appreciated!
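For the multiple-pattern part, the Normalizer's setCleanupPatterns takes a list of regexes, each of which is applied to every token. Below is a minimal sketch of how a second pattern could be added; the second regex "^[A-Za-z]{1,2}$" and its length threshold are illustrative assumptions (the original post does not show the second condition), chosen to match the reply's reading that short tokens should be filtered out:

```python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, Normalizer

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# setCleanupPatterns accepts a list, so several regexes can be combined:
# "[^A-Za-z]" strips non-letter characters from each token, and
# "^[A-Za-z]{1,2}$" (an assumed second condition) blanks out tokens that
# are only one or two letters long, which effectively removes them.
normalizer = Normalizer() \
    .setInputCols(["token"]) \
    .setOutputCol("normalized") \
    .setCleanupPatterns(["[^A-Za-z]", "^[A-Za-z]{1,2}$"])

pipeline = Pipeline(stages=[documentAssembler, tokenizer, normalizer])
```

From there, Spark NLP's Finisher can convert the normalized annotations back into a plain string-array column, which feeds directly into Spark ML's HashingTF and IDF for the TF-IDF step.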
Replies: 1 comment · 4 replies

The Spark NLP Tokenizer has minLength and maxLength parameters; you can set minLength to filter out tokens shorter than a certain length.
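A minimal sketch of that suggestion, assuming the tokens come from a document column as in the question (the threshold of 3 is illustrative):

```python
from sparknlp.annotator import Tokenizer

# minLength/maxLength filter tokens by character length at tokenization
# time, so setMinLength(3) drops one- and two-letter tokens before they
# ever reach the Normalizer.
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token") \
    .setMinLength(3)
```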