Spark NLP 5.2.3: ONNX support for XLM-RoBERTa Token and Sequence Classifications, and Question Answering task, AWS SDK optimizations, New notebooks, Over 400 new state-of-the-art Transformer Models in ONNX, and bug fixes! #14142
maziyarpanahi announced in Announcement
📢 Overview
Spark NLP 5.2.3 🚀 comes with an array of exciting features and optimizations. We're thrilled to announce support for ONNX Runtime in the `XLMRoBertaForTokenClassification`, `XLMRoBertaForSequenceClassification`, and `XLMRoBertaForQuestionAnswering` annotators. This release also refines the use of the AWS SDK in Spark NLP, shifting from `aws-java-sdk-bundle` to `aws-java-sdk-s3`, which cuts the library size by roughly 320MB and speeds up startup by 20%. It also brings new notebooks for importing external models from Hugging Face, 400+ new LLM models, and more!
We're pleased to announce that our Models Hub now boasts 36,000+ free and truly open-source models & pipelines 🎉. Our deepest gratitude goes out to our community for their invaluable feedback, feature suggestions, and contributions.
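As a rough illustration of where one of these annotators sits in a pipeline, here is a minimal Python sketch. It is not taken from the release itself. The Python class name `XlmRoBertaForTokenClassification` (capitalized differently from the name in this announcement), the column names, and the helper `build_ner_pipeline` are assumptions. Whether a given pretrained model actually runs on ONNX Runtime depends on how that model was exported.

```python
def build_ner_pipeline(model_name=None, lang="en"):
    """Assemble a token-classification pipeline (needs spark-nlp and pyspark)."""
    # Imports are kept local so this module loads even without Spark installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification

    # Turn the raw "text" column into document annotations, then tokens.
    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")

    # With no name given, pretrained() falls back to the annotator's default model.
    if model_name is None:
        classifier = XlmRoBertaForTokenClassification.pretrained()
    else:
        classifier = XlmRoBertaForTokenClassification.pretrained(model_name, lang)
    classifier = classifier.setInputCols(["document", "token"]).setOutputCol("ner")

    return Pipeline(stages=[document, tokenizer, classifier])
```

With a session from `sparknlp.start()` and a DataFrame that has a `text` column, the pipeline would be used as `build_ner_pipeline().fit(df).transform(df)`.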
🔥 New Features & Enhancements
- ONNX Runtime support in the `XLMRoBertaForTokenClassification` annotator
- ONNX Runtime support in the `XLMRoBertaForSequenceClassification` annotator
- ONNX Runtime support in the `XLMRoBertaForQuestionAnswering` annotator
- AWS SDK usage moved from the `aws-java-sdk-bundle` to the `aws-java-sdk-s3` dependency. This change has resulted in a 318MB reduction in the library's overall size and has improved the Spark NLP startup time by 20%. For instance, using `sparknlp.start()` in Google Colab is now 14 to 20 seconds faster. Special thanks to @c3-avidmych for requesting this feature.
- Notebooks to import `DeBertaForQuestionAnswering`, `DeBertaForSequenceClassification`, and `DeBertaForTokenClassification` models from HuggingFace
- `DocumentTokenSplitter` notebook
- `INSTRUCTOR` Embeddings
- `RoBertaForTokenClassification` notebook
- `RoBertaForSequenceClassification` notebook
- `OpenAICompletion` notebook with new `gpt-3.5-turbo-instruct` model
🐛 Bug Fixes
- Fix for `BGEEmbeddings` not downloading in Python
ℹ️ Known Issues
- ONNX models crash when they are used in Colab's T4 GPU runtime #14109
📓 New Notebooks
📖 Documentation
❤️ Community support
Installation
Python
```
# PyPI
pip install spark-nlp==5.2.3
```
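After installing, it can be useful to confirm that the environment really picked up 5.2.3 before starting a session. The following is a small sketch using only the Python standard library. The helper names `installed_version` and `matches_release` are hypothetical and not part of Spark NLP.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="spark-nlp"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_release(expected="5.2.3", package="spark-nlp"):
    """True only when `package` is installed at exactly the expected version."""
    return installed_version(package) == expected
```

If `matches_release()` returns `False`, the environment likely holds an older or missing install, and rerunning the `pip install` command above is the first thing to try.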
Spark Packages
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x (Scala 2.12):
GPU
Apple Silicon (M1 & M2)
AArch64
Maven
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x:
spark-nlp-gpu:
spark-nlp-silicon:
spark-nlp-aarch64:
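As a hedged sketch, the artifact names listed above can be turned into `group:artifact:version` coordinates of the kind accepted by `spark-shell --packages` or the `spark.jars.packages` setting. The group ID `com.johnsnowlabs.nlp` and the `_2.12` Scala suffix are assumptions based on how Spark NLP is usually published, not something stated in this announcement. The helper `maven_coordinate` and the flavor keys are hypothetical.

```python
GROUP_ID = "com.johnsnowlabs.nlp"  # assumption: not stated in this announcement
SCALA_SUFFIX = "_2.12"             # assumption: matches the "Scala 2.12" note above

# Flavor keys are illustrative; artifact names come from the list above.
ARTIFACTS = {
    "cpu": "spark-nlp",
    "gpu": "spark-nlp-gpu",
    "silicon": "spark-nlp-silicon",
    "aarch64": "spark-nlp-aarch64",
}


def maven_coordinate(flavor="cpu", version="5.2.3"):
    """Build a group:artifact:version string for the chosen flavor."""
    artifact = ARTIFACTS[flavor]  # raises KeyError for an unknown flavor
    return f"{GROUP_ID}:{artifact}{SCALA_SUFFIX}:{version}"
```

For example, `maven_coordinate("gpu")` yields a string that could be passed to `--packages` when launching `spark-shell` or `pyspark`.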
FAT JARs
CPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x/3.5.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-5.2.3.jar
GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x/3.5.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-5.2.3.jar
M1 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x/3.5.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-silicon-assembly-5.2.3.jar
AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x/3.5.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-5.2.3.jar
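The four FAT JAR links above share one naming pattern, which the sketch below captures so a script can pick the right JAR for its platform. The helper `fat_jar_url` and the flavor keys are hypothetical. Assuming the same pattern holds for versions other than 5.2.3 is a guess, not something this announcement confirms.

```python
BASE_URL = "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars"

# Flavor keys are illustrative; the prefixes are read off the URLs above.
JAR_PREFIX = {
    "cpu": "spark-nlp",
    "gpu": "spark-nlp-gpu",
    "silicon": "spark-nlp-silicon",
    "aarch64": "spark-nlp-aarch64",
}


def fat_jar_url(flavor="cpu", version="5.2.3"):
    """Return the FAT JAR download URL for the given flavor and version."""
    prefix = JAR_PREFIX[flavor]  # raises KeyError for an unknown flavor
    return f"{BASE_URL}/{prefix}-assembly-{version}.jar"
```

The resulting URL can be downloaded once and then handed to Spark through its `--jars` option or the `spark.jars` setting.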
What's Changed
New Contributors
Full Changelog: 5.2.2...5.2.3