This repository contains a complete pipeline for fine-tuning Google's BigBird transformer for NSFW (Not Safe For Work) text classification and deploying it as a live API on Hugging Face Spaces using Gradio. The solution is optimized for identifying inappropriate textual content in long-form documents — useful for child protection systems, content moderation, and safe browsing applications.
- ✅ Fine-tuned `google/bigbird-roberta-base` model for binary NSFW text classification.
- ✅ Supports long text inputs (up to 4096 tokens).
- ✅ Lightweight inference API with Gradio.
- ✅ Hosted on Hugging Face Spaces for public access.
- ✅ Google Colab training notebook included.
- ✅ Easily customizable for other text classification tasks.
- Architecture: BigBird-RoBERTa-base
- Task: Binary classification (Safe = 0, NSFW = 1)
- Input: Raw text paragraphs
- Output: 0 (Safe), 1 (Not Safe)
The model is trained on a labeled dataset (`NSFW1.csv`) consisting of text segments with corresponding binary labels indicating safety.
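The snippet below is a minimal sketch of how such a dataset might be inspected before training; the column names `text` and `label` are assumptions and may differ in the actual `NSFW1.csv`.

```python
import pandas as pd

# Load the labeled dataset; the column names "text" and "label" are an
# assumption about NSFW1.csv and may differ in the actual file.
df = pd.read_csv("NSFW1.csv")
print(df.shape)
print(df["label"].value_counts())  # 0 = Safe, 1 = NSFW
```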
To run the API locally:
```bash
git clone https://github.com/rusiru-erandaka/Bigbird_huggingface_deploy.git
cd Bigbird_huggingface_deploy
pip install -r requirements.txt
python app.py
```
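Once `app.py` is running, the local Gradio app can also be queried programmatically. The sketch below assumes Gradio's defaults (port 7860 and a single interface exposed at `/predict`); adjust it if `app.py` configures these differently.

```python
from gradio_client import Client

# Connect to the locally running Gradio app. The port and endpoint name
# assume Gradio defaults (a single gr.Interface exposed at /predict).
client = Client("http://127.0.0.1:7860/")
result = client.predict("Some text to check for NSFW content.", api_name="/predict")
print(result)
```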
Bigbird_huggingface_deploy/
- Bigbird2_.ipynb # Training notebook (Google Colab)
- app.py # Gradio API app
- config.json               # Model config
- tokenizer_config.json     # Tokenizer config
- special_tokens_map.json   # Tokenizer special tokens
- spiece.model # SentencePiece model
- requirements.txt # Python dependencies
- LICENSE
- Upload your fine-tuned model and tokenizer files to the Hugging Face Model Hub.
- Deploy `app.py` on Hugging Face Spaces using Gradio as the interface.
- Ensure that the `model_id` in `app.py` matches your model repository name, e.g., `"Rerandaka/Cild_safety_bigbird"`.
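For reference, the sketch below shows one way `app.py` could wrap the model in a Gradio interface; it is an illustrative approximation, not the exact code shipped in this repository.

```python
import gradio as gr
from transformers import pipeline

# Hub repository holding the fine-tuned model; replace with your own repo name.
model_id = "Rerandaka/Cild_safety_bigbird"
classifier = pipeline("text-classification", model=model_id)

def classify(text):
    # Return the predicted label and score (LABEL_0 = Safe, LABEL_1 = NSFW).
    result = classifier(text, truncation=True, max_length=4096)[0]
    return {result["label"]: result["score"]}

demo = gr.Interface(
    fn=classify,
    inputs=gr.Textbox(lines=8, label="Text to classify"),
    outputs=gr.Label(label="Prediction"),
    title="BigBird NSFW Text Classifier",
)

if __name__ == "__main__":
    demo.launch()
```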
```python
from transformers import pipeline

classifier = pipeline("text-classification", model="Rerandaka/Cild_safety_bigbird")
classifier("This is an inappropriate message involving violence.")
# Output: [{'label': 'LABEL_1', 'score': 0.98}]
```
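The pipeline returns generic label names. A small helper like the one below (written here for illustration, assuming `LABEL_0` = Safe and `LABEL_1` = NSFW as described above) can translate the output into readable results.

```python
from transformers import pipeline

# Illustrative mapping from the pipeline's generic labels to readable names
# (assumes LABEL_0 = Safe and LABEL_1 = NSFW, as described above).
LABEL_NAMES = {"LABEL_0": "Safe", "LABEL_1": "NSFW"}

classifier = pipeline("text-classification", model="Rerandaka/Cild_safety_bigbird")
results = classifier("This is an inappropriate message involving violence.")
print([(LABEL_NAMES.get(r["label"], r["label"]), r["score"]) for r in results])
# e.g. [('NSFW', 0.98)]
```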
- Optimizer: AdamW
- Epochs: 3–5
- Evaluation Metrics: Accuracy, Precision, Recall
- Dataset Split: 80% Train / 20% Test
- Platform: Google Colab with GPU
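The following is a condensed, illustrative sketch of a fine-tuning run consistent with these settings. The `text`/`label` column names, batch size, and learning rate are placeholders; the training notebook `Bigbird2_.ipynb` is the authoritative reference.

```python
import numpy as np
import pandas as pd
from datasets import Dataset
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

# Assumed columns: "text" and "label" (0 = Safe, 1 = NSFW).
df = pd.read_csv("NSFW1.csv")
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)  # 80/20 split

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")

def tokenize(batch):
    # BigBird's sparse attention handles sequences up to 4096 tokens.
    return tokenizer(batch["text"], truncation=True, max_length=4096)

train_ds = Dataset.from_pandas(train_df, preserve_index=False).map(tokenize, batched=True)
test_ds = Dataset.from_pandas(test_df, preserve_index=False).map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "google/bigbird-roberta-base", num_labels=2
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision_score(labels, preds),
        "recall": recall_score(labels, preds),
    }

args = TrainingArguments(
    output_dir="bigbird-nsfw",
    num_train_epochs=3,             # 3-5 epochs were used
    per_device_train_batch_size=2,  # small batches; 4096-token inputs are memory-heavy
    learning_rate=2e-5,             # illustrative value, not taken from the notebook
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    data_collator=DataCollatorWithPadding(tokenizer),  # Trainer's default optimizer is AdamW
    compute_metrics=compute_metrics,
)

trainer.train()
print(trainer.evaluate())
```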
- transformers
- torch
- gradio
This repository is licensed under the MIT License.
- Hugging Face 🤗 for Transformers and Spaces
- Google for BigBird-RoBERTa
- Gradio for interactive UI
- Rusiru Erandaka for fine-tuning and deployment