GitHub - uai-ufmg/tolde-br

ToLDE-Br is a dataset derived from the ToLD-Br database. This version contains 1000 sentences selected from ToLD-Br, comprising:

500 sentences classified as toxic by more than two evaluators.
500 sentences considered non-toxic.

In the set of toxic texts, a labeling process was carried out to identify which specific words (spans) contributed to their classification as toxic.

Annotators: The database was labeled by 11 volunteers.
Consistency: Each comment was examined by three different people.
Instructions: The annotators were told that the texts contained toxicity, but not which toxicity subclasses each text had previously been assigned. They were instructed to identify the words that caused the text to be perceived as offensive and as containing hate speech.

The annotation process was inspired by the work of HateXplain.
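
Since each comment was examined by three annotators, their word-level marks have to be combined somehow before use. This README does not say how ToLDE-Br merges them, so the following is only a minimal sketch assuming a simple majority vote over token indices. The function name `majority_vote_spans`, the `min_votes` parameter, the input layout, and the example sentence are all hypothetical and do not reflect the dataset's actual file format.

```python
from collections import Counter


def majority_vote_spans(tokens, annotations, min_votes=2):
    """Return the tokens marked as toxic by at least `min_votes` annotators.

    tokens: list of words in the sentence.
    annotations: one set of token indices per annotator (hypothetical layout).
    """
    # Count how many annotators marked each token index.
    votes = Counter(i for marked in annotations for i in set(marked))
    return [tok for i, tok in enumerate(tokens) if votes[i] >= min_votes]


# Hypothetical sentence and annotations, not taken from the dataset.
tokens = ["you", "are", "a", "complete", "idiot"]
annotations = [{4}, {3, 4}, {4}]  # three annotators, as described above

print(majority_vote_spans(tokens, annotations))  # → ['idiot']
```

Lowering `min_votes` to 1 would keep every word marked by any annotator, while raising it to 3 would keep only words on which all three agree.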

The ToLDE-Br dataset was developed as part of the master's thesis "O que torna uma frase tóxica? Uma análise crítica de modelos especialistas em detecção de toxicidade" ("What makes a sentence toxic? A critical analysis of expert models in toxicity detection"), authored by Gabriel Melo and Flávio Figueiredo. The thesis was defended in March 2025 at the Programa de Pós-Graduação em Ciência da Computação (PPGCC) of the Universidade Federal de Minas Gerais (UFMG).
