- Project Description
- Current Features
- Installation
- Dataset
- Model Architecture
- Training
- Contributors
- License
A Portuguese-language hate speech detection model based on the BERT architecture, currently in development. Key aspects:
- Fine-tuning neuralmind/bert-base-portuguese-cased
- Binary classification (hate speech detection)
- Focus on Brazilian social media text patterns
- Experiments currently use the HateBR dataset
- Data preprocessing pipeline
- BERT model fine-tuning setup
- Basic evaluation metrics
- Hugging Face integration
- Experiment tracking (W&B)
- Python 3.10+
- CUDA-enabled GPU (recommended)
git clone https://github.com/yourusername/curupira-ml.git
cd curupira-ml
pip install -r requirements.txt
from datasets import load_dataset
dataset = load_dataset("ruanchaves/hatebr")
- Source: Hugging Face Datasets
- 7,000 annotated comments
- Features:
  - Text: Raw comment text
  - Label: Binary classification (0=normal, 1=hate)
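Because the labels are binary, it can be useful to check the class balance before training. The following is a minimal sketch, assuming the labels have already been extracted into a plain Python list. The helper name `label_distribution` and the sample values are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def label_distribution(labels):
    """Return {label: (count, share)} for a list of 0/1 labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: (n, n / total) for label, n in sorted(counts.items())}


# Toy labels for illustration only; these are not real HateBR rows.
sample_labels = [0, 1, 0, 0, 1, 0, 1, 0]
for label, (n, share) in label_distribution(sample_labels).items():
    print(f"label={label}: {n} comments ({share:.1%})")
```

A strongly skewed distribution would be a signal to apply the class imbalance techniques listed under training.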
from transformers import AutoModelForSequenceClassification
model_name = "neuralmind/bert-base-portuguese-cased"
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
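With `num_labels=2`, the classification head produces two logits per comment. The sketch below shows one way to turn such a pair of logits into a label, assuming index 0 is "normal" and index 1 is "hate" as in the dataset description. The `classify` helper, the `threshold` parameter, and the example logits are hypothetical and use plain Python rather than model outputs.

```python
import math

# Label mapping taken from the dataset description (0=normal, 1=hate).
LABELS = {0: "normal", 1: "hate"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.5):
    """Return (label_name, hate_probability) for one comment's two logits."""
    p_hate = softmax(logits)[1]
    label = 1 if p_hate >= threshold else 0
    return LABELS[label], p_hate


# Made-up logits for illustration; a real run would take them from the model.
name, p_hate = classify([2.0, -1.0])
print(name, round(p_hate, 3))
```

Raising `threshold` above 0.5 trades recall for precision on the hate class, which may matter when false positives are costly.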
Current Status:
[2025] Model in active fine-tuning stage
- Progressive learning rate decay
- Class imbalance mitigation techniques
- Early stopping implementation
- Validation metric monitoring (F1-score focus)
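The class imbalance and F1 items above can be illustrated in plain Python. This is a sketch under stated assumptions, not the project's actual implementation: the weighting formula is the common "balanced" heuristic `n_samples / (n_classes * class_count)`, and the function names and sample data are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels, num_classes=2):
    """Weight each class by n_samples / (num_classes * class_count)."""
    counts = Counter(labels)
    n = len(labels)
    return [
        n / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def binary_f1(y_true, y_pred, positive=1):
    """F1 score for the positive class (here: 1=hate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data for illustration only.
train_labels = [0, 0, 0, 1]
print(balanced_class_weights(train_labels))
print(binary_f1([1, 0, 1, 1], [1, 0, 0, 1]))
```

The resulting weights could, for example, be passed to a weighted cross-entropy loss so that errors on the rarer class count more.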
# Assumes `trainer` is an already configured transformers.Trainer instance
trainer.train()
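Learning rate decay and early stopping from the list above can be sketched independently of any training framework. The example assumes a linear decay schedule and a patience-based stopping rule on validation F1; the README does not specify either, so both are illustrative choices. `linear_decay_lr`, `EarlyStopping`, the `base_lr` default, and the F1 history are hypothetical.

```python
def linear_decay_lr(step, total_steps, base_lr=2e-5):
    """Learning rate that falls linearly from base_lr to 0 over total_steps."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


class EarlyStopping:
    """Stop when the monitored metric has not improved for `patience` epochs."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_epochs = 0

    def step(self, metric):
        """Record one epoch's metric; return True when training should stop."""
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical per-epoch validation F1 scores, for illustration only.
val_f1_history = [0.71, 0.78, 0.80, 0.79, 0.80, 0.77]
stopper = EarlyStopping(patience=2)
for epoch, f1 in enumerate(val_f1_history, start=1):
    lr = linear_decay_lr(epoch - 1, total_steps=len(val_f1_history))
    print(f"epoch {epoch}: lr={lr:.2e}, val F1={f1:.2f}")
    if stopper.step(f1):
        print(f"stopping after epoch {epoch}; best F1 {stopper.best:.2f}")
        break
```

Monitoring F1 rather than accuracy keeps the stopping decision sensitive to the minority class.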
- Kim Gomes
Copyright ©️ 2025 - Curupira AI