This repository contains a complete pipeline for fine-tuning Google's BigBird transformer for NSFW (Not Safe For Work) text classification and deploying it as a live API on Hugging Face Spaces using Gradio.

rusiru-erandaka/Bigbird_huggingface_deploy

🚫 NSFW Text Detection using BigBird + Hugging Face Deployment

This repository contains a complete pipeline for fine-tuning Google's BigBird transformer for NSFW (Not Safe For Work) text classification and deploying it as a live API on Hugging Face Spaces using Gradio. The model targets inappropriate textual content in long-form documents, which makes it useful for child protection systems, content moderation, and safe browsing applications.


📌 Features

  • ✅ Fine-tuned google/bigbird-roberta-base model for binary NSFW text classification (Safe vs. NSFW).
  • ✅ Supports long text inputs (up to 4096 tokens).
  • ✅ Lightweight inference API with Gradio.
  • ✅ Hosted on Hugging Face Spaces for public access.
  • ✅ Google Colab training notebook included.
  • ✅ Easily customizable for other text classification tasks.

🧠 Model Overview

  • Architecture: BigBird-RoBERTa-base
  • Task: Binary classification (Safe = 0, NSFW = 1)
  • Input: Raw text paragraphs
  • Output: 0 (Safe), 1 (Not Safe)

The model is trained on a labeled dataset (NSFW1.csv) consisting of text segments with corresponding binary labels indicating safety.
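The layout of NSFW1.csv is not documented here, so the following is only a sketch of how such a file could be read and sanity-checked before training. The column names `text` and `label` and the helper `load_labeled_rows` are assumptions made for illustration, and the sample rows are made up.

```python
import csv
import io


def load_labeled_rows(csv_file, text_col="text", label_col="label"):
    """Read (text, label) pairs, rejecting labels other than 0 or 1.

    The column names are hypothetical; adjust them to the real CSV header.
    """
    rows = []
    # The header is line 1, so data rows start at line 2.
    for line_no, record in enumerate(csv.DictReader(csv_file), start=2):
        label = int(record[label_col])
        if label not in (0, 1):
            raise ValueError(f"line {line_no}: unexpected label {label}")
        rows.append((record[text_col].strip(), label))
    return rows


# Tiny in-memory stand-in for NSFW1.csv (made-up rows, assumed columns).
sample = io.StringIO("text,label\nHave a nice day,0\nExplicit content here,1\n")
rows = load_labeled_rows(sample)
print(rows)
```

With the real file, the same function could be called on `open("NSFW1.csv", newline="", encoding="utf-8")`, provided the header matches the assumed column names.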


🛠️ Installation

To run the API locally:

git clone https://github.com/rusiru-erandaka/Bigbird_huggingface_deploy.git
cd Bigbird_huggingface_deploy
pip install -r requirements.txt
python app.py

📁 Project Structure

Bigbird_huggingface_deploy/

  • Bigbird2_.ipynb # Training notebook (Google Colab)
  • app.py # Gradio API app
  • config.json # Model config
  • tokenizer_config.json # Tokenizer config
  • special_tokens_map.json # Tokenizer special tokens
  • spiece.model # SentencePiece model
  • requirements.txt # Python dependencies
  • LICENSE

🚀 Deployment on Hugging Face

  • Upload your fine-tuned model and tokenizer files to the Hugging Face Model Hub.
  • Deploy app.py on Hugging Face Spaces using Gradio as the interface.
  • Ensure that the model_id in app.py matches your model repository name, e.g., "Rerandaka/Cild_safety_bigbird".
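To make the deployment steps concrete, here is a minimal sketch of what a Gradio app for this model might look like. It is not the repository's actual app.py: the function names `make_predict_fn` and `build_demo`, the widget labels, and the output format are all assumptions, and the LABEL_1 = NSFW mapping assumes the uploaded config keeps the default label names.

```python
def make_predict_fn(classifier):
    """Wrap a text-classification callable as a Gradio-friendly function."""

    def predict(text):
        if not text or not text.strip():
            return "Please enter some text."
        # Truncate to BigBird's 4096-token limit instead of failing on long input.
        result = classifier(text, truncation=True, max_length=4096)[0]
        # Assumption: LABEL_1 means NSFW, matching the Model Overview above.
        verdict = "NSFW" if result["label"] == "LABEL_1" else "Safe"
        return f"{verdict} (score: {result['score']:.2f})"

    return predict


def build_demo(model_id="Rerandaka/Cild_safety_bigbird"):
    """Create the Gradio interface; heavy imports stay inside this function."""
    import gradio as gr
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return gr.Interface(
        fn=make_predict_fn(classifier),
        inputs=gr.Textbox(lines=10, label="Text to check"),
        outputs=gr.Textbox(label="Prediction"),
        title="NSFW Text Detection (BigBird)",
    )


# In app.py on a Space, the last line would typically be:
#   build_demo().launch()
```

Keeping the prediction logic separate from the model loading makes it possible to test `make_predict_fn` with a stub classifier, without downloading any weights.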

🔍 Example Usage

from transformers import pipeline
classifier = pipeline("text-classification", model="Rerandaka/Cild_safety_bigbird")
classifier("This is an inappropriate message involving violence.")
# Example output (LABEL_1 = NSFW; the exact score will vary):
# [{'label': 'LABEL_1', 'score': 0.98}]
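The raw pipeline output uses generic label names, so a small wrapper can make results easier to read. The sketch below assumes the model config keeps the default LABEL_0 / LABEL_1 names (Safe = 0, NSFW = 1, as in the Model Overview); `LABEL_NAMES`, `interpret`, and `classify` are hypothetical helpers, not part of this repository.

```python
# Default Hugging Face label names mapped to the README's classes
# (assumption: the uploaded config keeps LABEL_0 / LABEL_1).
LABEL_NAMES = {"LABEL_0": "Safe", "LABEL_1": "NSFW"}


def interpret(prediction):
    """Turn one pipeline result dict into a readable verdict."""
    # Unknown label names are passed through unchanged.
    name = LABEL_NAMES.get(prediction["label"], prediction["label"])
    return {"verdict": name, "score": round(float(prediction["score"]), 4)}


def classify(classifier, text, max_length=4096):
    """Run a text-classification callable, truncating to BigBird's limit."""
    results = classifier(text, truncation=True, max_length=max_length)
    return interpret(results[0])


# Usage with the real model (downloads weights on first call):
#   from transformers import pipeline
#   clf = pipeline("text-classification", model="Rerandaka/Cild_safety_bigbird")
#   print(classify(clf, "A long document to screen ..."))
```

Passing `truncation=True` and `max_length=4096` cuts off inputs longer than the model's limit, so text beyond that point is not scored.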

📊 Training Summary

  • Optimizer: AdamW
  • Epochs: 3–5
  • Evaluation Metrics: Accuracy, Precision, Recall
  • Dataset Split: 80% Train / 20% Test
  • Platform: Google Colab with GPU
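The split and the metrics listed above can be illustrated in plain Python. This is a sketch of the general technique, not the code from the training notebook: the helper names, the fixed seed, and the choice of NSFW (1) as the positive class are assumptions.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall with NSFW (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

For a moderation model, recall on the NSFW class shows how much unsafe text slips through, while precision shows how often safe text is wrongly flagged.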

📎 Dependencies

transformers
torch
gradio

📜 License

This repository is licensed under the MIT License.


🙋‍♂️ Acknowledgements

  • Hugging Face 🤗 for Transformers and Spaces
  • Google for BigBird-RoBERTa
  • Gradio for interactive UI
  • Rusiru Erandaka for fine-tuning and deployment
