This project is designed to classify YouTube comments as toxic or non-toxic using BERT (Bidirectional Encoder Representations from Transformers). By fine-tuning a pre-trained BERT model, we leverage state-of-the-art NLP capabilities to identify harmful content in online conversations.
The model is trained to identify toxicity, a capability that is crucial for creating safer and more respectful online platforms. The project can be extended to cover different toxicity levels or integrated into moderation tools.
- Dataset
- Installation
- Data Preprocessing
- Model Architecture
- Training
- Evaluation
- Results
- Usage
- Future Improvements
- License
The dataset consists of YouTube comments with corresponding labels indicating whether each comment is toxic (`1`) or non-toxic (`0`). This binary classification problem is aimed at improving content moderation in online discussions.
- Columns:
  - `comment_id`: Unique identifier for each comment.
  - `content`: Text of the comment.
  - `label`: `1` for toxic comments, `0` for non-toxic comments.
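For illustration, the dataset can be loaded and inspected with pandas; the file name below is a placeholder and may differ from the actual data file.

```python
import pandas as pd

# Hypothetical file name; adjust to the actual dataset location.
df = pd.read_csv("youtube_comments.csv")

# Expect the columns described above: comment_id, content, label.
print(df[["comment_id", "content", "label"]].head())
print(df["label"].value_counts())  # quick check of class balance
```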
To get started with this project, follow these instructions:
- Python 3.7+
- PyTorch 1.6+
- Hugging Face Transformers library
- CUDA-enabled GPU (optional but recommended)
- Clone the repository:

```bash
git clone https://github.com/your-repository/youtube-toxic-comment-classification.git
cd youtube-toxic-comment-classification
```
The raw comments are preprocessed before being fed into the BERT model. This includes:
- Removing URLs, special characters, and extra spaces.
- Converting all text to lowercase.
- Tokenizing with the BERT tokenizer, which converts the text into token IDs compatible with BERT (a sketch follows this list).
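A minimal sketch of these preprocessing steps, assuming the Hugging Face `bert-base-uncased` tokenizer; the exact checkpoint and maximum sequence length used in the project are assumptions:

```python
import re
from transformers import BertTokenizer

def clean_text(text: str) -> str:
    """Remove URLs and special characters, lowercase, and collapse extra spaces."""
    text = re.sub(r"http\S+|www\.\S+", " ", text)      # strip URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())   # lowercase, drop special characters
    return re.sub(r"\s+", " ", text).strip()           # collapse extra whitespace

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

encoded = tokenizer(
    clean_text("Check this out!!! https://example.com"),
    padding="max_length",
    truncation=True,
    max_length=128,          # assumed maximum sequence length
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # torch.Size([1, 128])
```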
This project uses BERT (Bidirectional Encoder Representations from Transformers) for the NLP task of toxic comment detection. BERT is a transformer-based model that understands the context of words in sentences, making it highly effective for text classification tasks.
- Tokenizer: Converts sentences into token IDs.
- BERT for Sequence Classification: Pre-trained BERT model fine-tuned for binary classification.
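The snippet below sketches how the model might be set up and fine-tuned; the checkpoint name, learning rate, and batch contents are illustrative assumptions rather than the project's exact configuration:

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # binary classification: non-toxic (0) vs. toxic (1)
)

# One fine-tuning step on a toy batch to show the mechanics.
batch = tokenizer(
    ["thanks for the helpful video", "nobody wants to hear your garbage"],
    padding=True,
    truncation=True,
    max_length=128,
    return_tensors="pt",
)
labels = torch.tensor([0, 1])
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

outputs = model(**batch, labels=labels)  # forward pass returns logits and cross-entropy loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```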
After fine-tuning BERT, we achieved the following performance metrics:
1. Accuracy: ~90%
2. F1 Score: ~0.85
3. Precision: ~0.88
4. Recall: ~0.83
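As a quick sanity check, the reported F1 score is consistent with the precision and recall above, since F1 is their harmonic mean:

```python
# F1 is the harmonic mean of precision and recall.
precision, recall = 0.88, 0.83
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.85, matching the reported F1 score
```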
1. Multi-label Classification: Extend the model to classify different types of toxicity (e.g., hate speech, threats).
2. Data Augmentation: Generate synthetic examples to address class imbalance.
3. Model Optimization: Experiment with lighter models such as DistilBERT for faster inference.
4. Multi-language Support: Expand to detect toxicity in other languages.
This project is licensed under the MIT License. See the LICENSE file for more details.