RTP-LX

Dataset for the paper RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?, by de Wynter et al.

NOTE: This repo is actively updated!

WARNING: This repository contains and discusses content that is offensive or upsetting. All materials are intended to support research that improves toxicity-detection methods. The included examples of toxicity do not represent how the authors or sponsors feel about any identity groups. This corpus was made by a multi-national, multi-cultural team of various faiths, beliefs, and origins. Please note that toxicity is dynamic and evolves with societal perceptions; these labels may therefore change.

What is RTP-LX?

RTP-LX is a multilingual set of 1k+ toxic prompts and passages per locale, designed for toxicity evaluation. It was manually translated from a subset of the original RTP (RealToxicityPrompts) dataset and annotated by native speakers. It also includes:

  • Coverage of 38 languages (with more files than languages, because graphically distinct variants such as ZH-Hans vs. ZH-Hant and DE-DE vs. DE-CH are kept in separate files)
  • Manually designed prompts that are considered hard to translate into English and that could be considered offensive in the language's geolocale
  • Dialect-specific indications in the translations where applicable (e.g., Levantine Arabic, Brazilian Portuguese)

Languages covered

RTP-LX currently covers 38 languages:

  • Arabic (Egyptian, Levantine, Saudi)
  • BCMS (Bosnian, Croatian, Montenegrin, Serbian)
  • Bulgarian*
  • Catalan*
  • Chinese (Standard; Simplified and Traditional scripts)
  • Czech
  • Danish
  • Dutch
  • English
  • Estonian*
  • Finnish
  • French (France)
  • German (Standard; Germany and Switzerland*)
  • Greek
  • Hebrew
  • Hindi
  • Hungarian
  • Indonesian
  • Italian
  • Japanese
  • Korean
  • Latvian*
  • Lithuanian*
  • Norwegian (Bokmål)
  • Polish
  • Portuguese (Brazil, Portugal)
  • Romanian*
  • Russian (Russia, Ukraine)
  • Slovak*
  • Slovenian*
  • Spanish (Spain)
  • Swahili
  • Swedish
  • Thai
  • Turkish
  • Ukrainian
  • Vietnamese*
  • Welsh*

See Structure, below, for the languages marked with an asterisk (*).

Harm Categories

RTP-LX is annotated in the following categories:

  • Bias
  • Insult
  • Identity Attack
  • Microaggression
  • Violence
  • Self-harm
  • Sexual content
  • Overall toxicity

Structure

RTP-LX has two main components: prompts (human-transcreated, human-annotated) and completions (synthetically generated, human-annotated).

  • Prompts are meant to measure the effectiveness of your guardrails in multilingual scenarios, as well as automated-annotation capabilities.
  • Completions, on the other hand, are substantially more toxic and are designed for ablation analysis of the harm categories.
  • BenignCompletions are human-written, benign completions -- perfect as preferred responses for DPO (Direct Preference Optimization)!
  • PromptAnnotations and CompletionsAnnotations contain the aggregated (majority-vote) scores from the annotators; see the loading sketch after this list.
  • The languages marked with an asterisk (*) do not contain Completions or the culturally-specific prompts (for budgetary reasons).
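
To make the layout concrete, below is a minimal loading sketch in Python. The file and field names (e.g., RTP_LX_EN.json, "Toxicity") are illustrative assumptions rather than a documented schema; inspect the extracted archives for the actual names and keys.

import json

# NOTE: the paths and keys below are hypothetical placeholders;
# check the extracted files for the real names and schema.
def load_records(path):
    """Read a JSON-lines file into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

prompts = load_records("RTP_LX_EN.json")                 # hypothetical path
annotations = load_records("PromptAnnotations_EN.json")  # hypothetical path

# Example: select records whose aggregated (majority-vote) toxicity
# score meets a threshold; the "Toxicity" key is an assumption.
toxic = [a for a in annotations if a.get("Toxicity", 0) >= 2]
print(f"{len(toxic)} of {len(annotations)} records rated toxic")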

Uncompressing

To deter crawlers, we have zipped and password-protected the entries. The password is the name of the repo in lowercase, followed by "-entries", followed by "-4/8/24". So if the repo were "ASDF-BLAH", the password would be asdf-blah-entries-4/8/24.
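
For convenience, here is a minimal extraction sketch in Python that builds the password from the rule above. The archive name is a hypothetical placeholder; also note that Python's zipfile module only supports the legacy ZipCrypto scheme, so if the archive uses AES encryption, use a tool such as 7-Zip instead.

import zipfile

# Build the password per the rule above: repo name in lowercase,
# then "-entries", then "-4/8/24".
repo = "RTP-LX"
password = f"{repo.lower()}-entries-4/8/24"

# Archive name is a placeholder; substitute the actual file name.
with zipfile.ZipFile("RTP_LX.zip") as zf:
    zf.extractall(pwd=password.encode("utf-8"))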

Updates

  • (December '24): The paper for RTP-LX got accepted to AAAI! We will post the camera-ready (CR) version soon.
  • (August '24): V1.5 released! Added 11 new languages: BG, CA, ET, HE, LV, LT, RO, SK, SL, VI, CY and one dialect (DE-CH)
  • (May '24): Benign set released, scoring updated to what we described in the paper.
  • (Apr '24): Paper released!
  • (Mar '24): V1.0 released! Passages annotated. This is the first full release of RTP-LX. We do have updates coming, so stay tuned.
  • (Jan '24): V0.3 released! Added SW and BCMS. Compressed the data into a password-protected archive. Passages to come soon.
  • (Dec '23): V0.2 released! Added 19 more languages, and included PT (pt) prompts. Note that BCMS/Swahili are projected for a later date.
  • (Sep '23): V0.1 released! Prompts for ES, FR, DE, IT, JA, PT (br), ZH (simplified), AR and CS.

Citation

If you use our work, please consider citing our paper. The canonical BibTeX is here, but the entry below is fixed to be less unwieldy:

@article{rtplx,
    title={RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?},
    volume={39},
    url={https://ojs.aaai.org/index.php/AAAI/article/view/35011},
    DOI={10.1609/aaai.v39i27.35011},
    number={27},
    journal={Proceedings of the AAAI Conference on Artificial Intelligence},
    author={de Wynter, Adrian and Watts, Ishaan and Wongsangaroonsri, Tua and Zhang, Minghui and Farra, Noura and Altıntoprak, Nektar Ege and Baur, Lena and Claudet, Samantha and Gajdušek, Pavel and Gu, Qilong and Kaminska, Anna and Kaminski, Tomasz and Kuo, Ruby and Kyuba, Akiko and Lee, Jongho and Mathur, Kartik and Merok, Petter and Milovanović, Ivana and Paananen, Nani and Paananen, Vesa-Matti and Pavlenko, Anna and Vidal, Bruno Pereira and Strika, Luciano Ivan and Tsao, Yueh and Turcato, Davide and Vakhno, Oleksandr and Velcsov, Judit and Vickers, Anna and Visser, Stéphanie F. and Widarmanto, Herdyan and Zaikin, Andrey and Chen, Si-Qing},
    year={2025},
    month={Apr.},
    pages={27940-27950}
}

along with the original RTP paper:

@inproceedings{gehman-etal-2020-realtoxicityprompts,
    title = "{R}eal{T}oxicity{P}rompts: Evaluating Neural Toxic Degeneration in Language Models",
    author = "Gehman, Samuel  and
      Gururangan, Suchin  and
      Sap, Maarten  and
      Choi, Yejin  and
      Smith, Noah A.",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.301",
    doi = "10.18653/v1/2020.findings-emnlp.301",
    pages = "3356--3369",
}

Some components for Hebrew, Danish, Korean and Brazilian Portuguese come from the Offensive Hebrew Corpus, DKHate, BEEP! and ToLD-BR corpora, respectively. Please consider citing their work as well:

@inproceedings{hamad-etal-2023-offensive,
  title = {Offensive {H}ebrew Corpus and Detection using {BERT}},
  author = {Nagham Hamad and Mustafa Jarrar and Mohammed Khalilia and Nadim Nashif},
  booktitle = {The 20th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA)},
  year = {2023},
  publisher = {IEEE},
  address = {Egypt}
}

@inproceedings{sigurbergsson-derczynski-2020-offensive,
    title = "Offensive Language and Hate Speech Detection for {D}anish",
    author = "Sigurbergsson, Gudbjartur Ingi  and
      Derczynski, Leon",
    booktitle = "Proceedings of the Twelfth Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2020.lrec-1.430",
    pages = "3498--3508",
    language = "English",
    ISBN = "979-10-95546-34-4",
}

@inproceedings{moon-etal-2020-beep,
    title = "{BEEP}! {K}orean Corpus of Online News Comments for Toxic Speech Detection",
    author = "Moon, Jihyung  and
      Cho, Won Ik  and
      Lee, Junbum",
    booktitle = "Proceedings of the Eighth International Workshop on Natural Language Processing for Social Media",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.socialnlp-1.4",
    pages = "25--31",
}

@inproceedings{ToLDBR,
  author = {Jo\~{a}o A. Leite and Diego F. Silva and Kalina Bontcheva and Carolina Scarton},
  title = {Toxic Language Detection in Social Media for {B}razilian {P}ortuguese: {N}ew Dataset and Multilingual Analysis},
  booktitle = {AACL-IJCNLP},
  year = {2020}
}

Contributing

See here.
