spark nlp ner XlmRoBertaForTokenClassification performance improvement #13475
LucaPifferettiPrivate
started this conversation in
General
Hi, so I would go like this:
This webinar covers exactly this topic: https://www.johnsnowlabs.com/watch-webinar-speed-optimization-benchmarks-in-spark-nlp-3-making-the-most-of-modern-hardware/
Hi everyone!
I'm using an NER model, XlmRoBertaForTokenClassification, to find person names in a column of messages.
The problem is that the model is really slow: it takes 35 minutes to process 100K messages.
I have this configuration:
spark driver cores = 2
spark driver memory = 48 GB
spark executors = 8
spark executor cores = 8
spark executor memory = 32 GB
Looking at the Spark UI, I found that during a stage involving the NER model there is a single task that takes 30 minutes. To improve performance I would need to use all the executors, but this seems to be a problem related to the model.
Has anyone had the same problem?
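A single 30-minute task usually means the stage is running on one partition, so only one of the available cores is doing the work. The sketch below is a rough illustration, not a confirmed fix for this thread: it works out how many task slots the configuration above provides and a partition count to aim for. The helper names and the factor of 2 tasks per core are assumptions (the Spark tuning guide suggests 2 to 3 tasks per CPU core), and `df` and `pipeline` in the trailing comments are hypothetical names for the messages DataFrame and the Spark NLP pipeline.

```python
def total_task_slots(num_executors: int, cores_per_executor: int) -> int:
    """Number of tasks the cluster can run at the same time."""
    return num_executors * cores_per_executor


def suggested_partitions(num_executors: int,
                         cores_per_executor: int,
                         tasks_per_core: int = 2) -> int:
    """Partition count to aim for; tasks_per_core=2 is an assumed rule of thumb."""
    return total_task_slots(num_executors, cores_per_executor) * tasks_per_core


# Values taken from the configuration in the post: 8 executors with 8 cores each.
slots = total_task_slots(8, 8)        # 64 concurrent tasks
target = suggested_partitions(8, 8)   # 128 partitions

print(f"task slots: {slots}, suggested partitions: {target}")

# With PySpark, the idea would be applied roughly like this
# (df and pipeline are hypothetical names, not taken from the post):
#
#   print(df.rdd.getNumPartitions())   # 1 here would explain the single long task
#   df = df.repartition(target)        # spread the rows across all executors
#   result = pipeline.fit(df).transform(df)
```

If `getNumPartitions()` already reports many partitions, the bottleneck is somewhere else and repartitioning will not help.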