A fast and accurate Python library for detecting toxic and harmful content in text using state-of-the-art transformer models.
- ⚡ Lightning Fast: Built on PyTorch Lightning for efficient training and inference
- 🧠 Transformer-Powered: Leverages BERT, DistilBERT, and other state-of-the-art models
- 🎯 Multi-Label Detection: Simultaneously detects `toxic`, `severe_toxic`, `obscene`, `threat`, `insult`, and `identity_hate`
- 🚀 Production Ready: Optimized inference pipeline with model compilation
- 🔧 Flexible Configuration: YAML-based configuration system for easy experimentation
- 📊 Rich Monitoring: Built-in progress tracking, logging, and model checkpointing
Ruffle provides both high-level prediction APIs for quick content moderation and comprehensive training tools for custom model development.
Install Ruffle using uv (recommended):
uv add ruffle
Or using pip:
pip install ruffle
For training custom models or contributing to development:
# Clone the repository
git clone https://github.com/zuzo-sh/ruffle.git
cd ruffle
# Install with development dependencies using uv
uv sync
# Or using pip
pip install -e ".[dev]"
Get started with toxicity detection in just a few lines:
from ruffle import Ruffle
# Load a pre-trained model
ruffle = Ruffle(model_name="bert-tiny")
# Detect toxicity in text
result = ruffle.predict("This is a sample comment to analyze")
print(result)
# Process multiple texts efficiently
texts = [
    "This is a normal comment",
    "Another piece of text to check",
    "Batch processing is supported",
]
results = ruffle.predict(texts)
Or run Ruffle directly from the command line:
# Classify single text
ruffle --texts "Hello world" --threshold 0.7
# Classify multiple texts
ruffle --texts '["Text 1", "Text 2"]' --model_name bert-tiny
# Use custom checkpoint
ruffle "Sample text" --checkpoint_path model.ckpt
See the documentation for comprehensive guides and API reference.
Ruffle detects six categories of toxicity based on the Jigsaw Toxic Comment Classification dataset:
| Category | Description |
|---|---|
| `toxic` | General toxicity and harmful content |
| `severe_toxic` | Severely toxic content requiring immediate action |
| `obscene` | Obscene language and explicit content |
| `threat` | Threatening language and intimidation |
| `insult` | Insulting and demeaning content |
| `identity_hate` | Identity-based hate speech and discrimination |
Example output:
{
    "This is offensive content": {
        "toxic": 0.89,
        "severe_toxic": 0.23,
        "obscene": 0.67,
        "threat": 0.12,
        "insult": 0.78,
        "identity_hate": 0.34
    }
}
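Each category score is an independent probability, so one comment can trigger several labels at once. Here is a minimal sketch of turning the output above into moderation flags; the 0.7 cutoff and the `predictions` variable are illustrative choices, not part of Ruffle's API:

# Flag every category whose score exceeds a chosen threshold.
# `predictions` follows the output format shown above.
THRESHOLD = 0.7  # illustrative cutoff, not a Ruffle default

predictions = {
    "This is offensive content": {
        "toxic": 0.89,
        "severe_toxic": 0.23,
        "obscene": 0.67,
        "threat": 0.12,
        "insult": 0.78,
        "identity_hate": 0.34,
    }
}

for text, scores in predictions.items():
    flagged = [label for label, score in scores.items() if score >= THRESHOLD]
    print(text, "->", flagged or "clean")
# This is offensive content -> ['toxic', 'insult']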
1. Download the Jigsaw Toxic Comment Classification dataset and unzip the CSV files into `./data/jigsaw-toxic-comment-classification-challenge/`.

2. Generate your desired training configuration file:

   trainer <model_name> --print_config > config.yaml

   You must specify a `model_name` that is available on https://huggingface.co/models, e.g. `distilbert/distilbert-base-uncased` or `google-bert/bert-base-uncased`. Run the help command (`uv run -m ruffle trainer --help`) to list the available training options such as `batch_size`, `val_size`, `max_epochs`, `lr`, and more.

3. Run the training script with your configuration file:

   trainer --config config.yaml

4. Visualize the logged metrics with TensorBoard:

   tensorboard --logdir lightning_logs
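Once training completes, you can load the resulting checkpoint for inference. Below is a minimal sketch, assuming the `Ruffle` constructor accepts a `checkpoint_path` argument mirroring the CLI's `--checkpoint_path` flag; check the API reference for the exact signature:

from ruffle import Ruffle

# Sketch only: checkpoint_path is assumed to mirror the CLI's
# --checkpoint_path flag. PyTorch Lightning writes checkpoints
# under lightning_logs/<version>/checkpoints/ by default.
ruffle = Ruffle(checkpoint_path="model.ckpt")
print(ruffle.predict("Check a comment with the fine-tuned model"))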
- Python 3.13+
- CUDA-compatible GPU (recommended for training)
# Clone repository
git clone https://github.com/zuzo-sh/ruffle.git
cd ruffle
# Install with development dependencies
uv sync
- Getting Started Guide - Basic usage and installation
- Training Guide - Custom model training and fine-tuning
- API Reference - Complete API documentation
- Configuration Guide - YAML configuration options
- Production Deployment - Performance optimization and scaling
This project uses:
- Ruff for formatting and linting (configuration in `pyproject.toml`)
- ty for type checking
- pytest for testing
- Conventional Commits for commit messages
See our Contributing Guide for detailed guidelines.
This project is licensed under the MIT License. See the LICENSE file for details.
- Hugging Face Transformers - For the transformer model implementations
- PyTorch Lightning - For the training framework and utilities
- Jigsaw/Conversation AI - For the toxicity classification dataset
*Benchmarks run on a MacBook Pro (M1 Pro).*
Ruffle - Professional toxicity detection for safer digital spaces.