Persian Swear Dataset: a dataset of inappropriate and offensive Persian words that you can use in production to filter unwanted text.
A list of Romanian NLP Datasets
AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African Languages (https://afrisenti-semeval.github.io/)
A metadata-enriched dataset of German parliamentary debates covering 74 years of plenary protocols.
Get a pragmatic assessment of how understandable a German text is.
Dataset for web-scale information extraction.
Persian Slang Words (dataset)
This repo is the dataset for the paper "A New Dataset and Methodology for Malicious URL Classification"
Persian SMS dataset
Repository for the LREC-COLING 2024 Paper: Persona-Based Corpus in the Diabetes Mellitus Domain – Applying a Human-Centered Approach to a Low-Resource Context
Dataset with annotation of Russian-language poems
Persian News Dataset
Parallel Literary Corpora: Fiction and Poetry Translations
A novel Romanian-language dataset for offensive message detection, with comments from a local Romanian news website (Stiri de Cluj) manually annotated into five classes
RO-Offense: A Novel Romanian Dataset for Offensive Language in Online Comments