
CurupiraIA: A Brazilian Portuguese hate speech detection model using BERT fine-tuning, inspired by folklore guardianship principles to protect digital communities from toxic content.


codeonthespectrum/Curupira.ia


CurupiraIA: Hate Speech Detection ML Model 🇧🇷🛡️

Python PyTorch Transformers

Status DOI

Project Description 🔍

A Portuguese-language hate speech detection model built on the BERT architecture. The project is still in development. Key aspects:

  • Fine-tuning neuralmind/bert-base-portuguese-cased
  • Binary classification (hate speech detection)
  • Focus on Brazilian social media text patterns
  • Experimental phase with HateBR dataset

Current Features ✅

  • Data preprocessing pipeline
  • BERT model fine-tuning setup
  • Basic evaluation metrics
  • Hugging Face integration
  • Experiment tracking (W&B)
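
The README does not say which evaluation metrics are computed. As a minimal sketch, assuming the usual precision, recall, F1, and accuracy for the positive class (label 1 = hate), they could be derived from predictions like this. The function name `binary_metrics` is hypothetical and not taken from the repository.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1 (for label 1) and overall accuracy."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hate, predicted hate
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal, predicted hate
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # hate, predicted normal
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = correct / len(pairs) if pairs else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

In a Hugging Face setup, a function like this would typically be wrapped in a `compute_metrics` callback after taking the argmax of the logits.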

Installation ⚙️

Prerequisites

  • Python 3.10+
  • CUDA-enabled GPU (recommended)

git clone https://github.com/yourusername/curupira-ml.git
cd curupira-ml
pip install -r requirements.txt

Dataset 📊

HateBR Dataset

from datasets import load_dataset

dataset = load_dataset("ruanchaves/hatebr")

  • Source: Hugging Face Datasets
  • Size: 7,000 annotated comments
  • Features:
      • Text: raw comment text
      • Label: binary classification (0 = normal, 1 = hate)
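
Because the text field holds raw comments, some normalization is usually applied before tokenization. The README does not list the preprocessing steps, so the following is only a sketch under assumptions: the placeholder tokens `[URL]` and `[USER]`, the repeated-character rule, and the function name `normalize_comment` are all illustrative. Case is preserved because the base checkpoint is a cased model.

```python
import re

# Illustrative patterns; the project's actual preprocessing rules are not documented.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # the same character three or more times in a row
WHITESPACE_RE = re.compile(r"\s+")


def normalize_comment(text: str) -> str:
    """Lightly normalize a social media comment without changing its case."""
    text = URL_RE.sub("[URL]", text)          # mask links
    text = MENTION_RE.sub("[USER]", text)     # mask user handles
    text = REPEAT_RE.sub(r"\1\1", text)       # "kkkkk" -> "kk", "olhaaaa" -> "olhaa"
    return WHITESPACE_RE.sub(" ", text).strip()  # collapse runs of whitespace
```

With the `datasets` library, a function like this could be applied through `dataset.map`, using whichever column holds the comment text.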

Model Architecture 🧠

Base Configuration

from transformers import AutoModelForSequenceClassification

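model_name = "neuralmind/bert-base-portuguese-cased"  # base checkpoint named in the project description
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)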

Fine-Tuning Phase 🎯

Current Status:

[2025] Model in active fine-tuning stage

We're currently optimizing the BERT model through:

  • Progressive learning rate decay
  • Class imbalance mitigation techniques
  • Early stopping implementation
  • Validation metric monitoring (F1-score focus)
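
The README does not give implementation details for the techniques listed above, so the following is a framework-free sketch of how three of them might look. It assumes linear decay with optional warmup for the learning rate, inverse-frequency class weights for imbalance, and patience-based early stopping on validation F1. All names (`linear_decay_lr`, `inverse_frequency_weights`, `EarlyStopping`) are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def linear_decay_lr(step: int, total_steps: int, base_lr: float,
                    warmup_steps: int = 0) -> float:
    """Linear warmup to base_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def inverse_frequency_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


class EarlyStopping:
    """Signal a stop after `patience` evaluations without improvement."""

    def __init__(self, patience: int = 2, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_evals = 0

    def step(self, metric: float) -> bool:
        """Record a validation score (higher is better); True means stop."""
        if metric > self.best + self.min_delta:
            self.best = metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

The class weights would usually be passed to a weighted cross-entropy loss, and `EarlyStopping.step` would be called once per validation pass with the F1 score.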

Training 🔥

Training Command

trainer.train()
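
The `trainer` object is not defined anywhere in this README. One plausible way to build it, assuming the Hugging Face `Trainer` API, is sketched below. The hyperparameter values and the output directory are illustrative placeholders, not the project's actual settings, and `build_trainer` is a hypothetical helper. The datasets passed in are assumed to be already tokenized.

```python
# Illustrative hyperparameters (assumptions, not the project's real configuration).
HYPERPARAMS = {
    "model_name": "neuralmind/bert-base-portuguese-cased",
    "num_labels": 2,
    "output_dir": "./curupira-checkpoints",
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "weight_decay": 0.01,
    "lr_scheduler_type": "linear",  # linear learning rate decay
    "report_to": "wandb",           # experiment tracking with W&B
}


def build_trainer(train_dataset, eval_dataset, hp=HYPERPARAMS):
    """Create a Trainer for binary sequence classification."""
    # Imported here so the configuration can be inspected without loading
    # PyTorch and Transformers.
    from transformers import (
        AutoModelForSequenceClassification,
        Trainer,
        TrainingArguments,
    )

    model = AutoModelForSequenceClassification.from_pretrained(
        hp["model_name"], num_labels=hp["num_labels"]
    )
    args = TrainingArguments(
        output_dir=hp["output_dir"],
        learning_rate=hp["learning_rate"],
        num_train_epochs=hp["num_train_epochs"],
        per_device_train_batch_size=hp["per_device_train_batch_size"],
        weight_decay=hp["weight_decay"],
        lr_scheduler_type=hp["lr_scheduler_type"],
        report_to=hp["report_to"],
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )
```

With tokenized training and validation splits in hand, `trainer = build_trainer(train_split, val_split)` would produce the object on which `trainer.train()` is called.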

Contributors :octocat:


Kim Gomes

License

Copyright ©️ 2025 - Curupira AI
