mbahadirk/Offensive-Text-Classification

Offensive Text Classification

Project Overview

The primary goal of this project is to develop a user-friendly desktop application capable of detecting toxic content on social media platforms, particularly Twitter and YouTube.

In this project, we combined trained deep learning models with an interactive interface, allowing users to:

  • Manually enter sentences for classification.
  • Input Twitter or YouTube links to automatically fetch and analyze the content for toxicity.
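As a rough illustration of the two input modes above, the sketch below decides whether an input string should be treated as plain text or as a Twitter or YouTube link. The function name `classify_input_kind` and the host sets are assumptions made for this example; the application's actual routing logic is not shown in this README.

```python
from urllib.parse import urlparse

# Hypothetical host lists for illustration only; the real application
# may recognize links differently.
TWITTER_HOSTS = {"twitter.com", "x.com"}
YOUTUBE_HOSTS = {"youtube.com", "youtu.be"}


def classify_input_kind(user_input: str) -> str:
    """Return 'twitter', 'youtube', or 'text' for a raw input string."""
    parsed = urlparse(user_input.strip())

    # Anything that is not an http(s) URL with a host is treated as a sentence.
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return "text"

    host = parsed.hostname.lower()
    # Drop one common subdomain prefix so "www.youtube.com" matches "youtube.com".
    for prefix in ("www.", "m.", "mobile."):
        if host.startswith(prefix):
            host = host[len(prefix):]
            break

    if host in TWITTER_HOSTS:
        return "twitter"
    if host in YOUTUBE_HOSTS:
        return "youtube"
    # Links to other sites fall back to plain-text handling in this sketch.
    return "text"
```

In this sketch, a result of `'text'` would go straight to the classifier, while `'twitter'` or `'youtube'` would first trigger a content fetch; the fetching step is left out because it depends on API access that this README does not describe.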

Project Objectives

With this application, we aim to:

  • Automatically filter toxic comments on social platforms.
  • Analyze user reactions containing offensive language.
  • Perform accurate classifications on Turkish-language datasets using models specifically trained for the local context.

Technologies Used

  • Python
  • PyTorch
  • FastText
  • BERT (Turkish language models)
  • Natural Language Processing (NLP)
  • Tkinter (for GUI development)

Datasets

We used the following publicly available Turkish datasets for training and evaluation:


Pretrained Models

We fine-tuned the following BERT model for Turkish:


Developed Models

Model Name                | Model Link | Features
FastText Classifier       | Drive      | Lightweight, fast predictions
BERT Fine-tuned Model     | Drive      | High accuracy, contextual analysis
BERT LSTM Model           | Drive      | High accuracy, contextual analysis
Extended BERT LSTM Model  | Drive      | Transfer-learned from the BERT LSTM model

Application Features

  • ✅ Manual text classification
  • ✅ Toxicity detection from Twitter or YouTube links
  • ✅ Visual feedback on prediction results
  • ✅ Turkish language support
  • ✅ Easy-to-use desktop interface

Model Performance

Accuracy Comparison

[Figure: Model Comparisons DNN.png]

Confusion Matrix

[Figure: confussion matrix comparison.png]
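For readers unfamiliar with how accuracy relates to a confusion matrix, here is a minimal sketch for the binary (toxic vs. non-toxic) case. The labels in the example are made up purely for illustration and are not results from this project; the function names are likewise assumptions.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    counts = Counter(tp=0, fp=0, tn=0, fn=0)
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def accuracy(counts):
    """Accuracy is the share of correct predictions: (TP + TN) / total."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


# Illustrative labels only (1 = toxic, 0 = not toxic); not project data.
example_true = [1, 0, 1, 1, 0, 0, 1, 0]
example_pred = [1, 0, 0, 1, 0, 1, 1, 0]
example_counts = confusion_counts(example_true, example_pred)
example_accuracy = accuracy(example_counts)  # 6 correct out of 8 -> 0.75
```

Two models with the same accuracy can still differ in how their errors split between false positives and false negatives, which is why the confusion matrix is reported alongside the accuracy comparison.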


How to Run the Project

  1. Clone the repository:

    git clone https://github.com/mbahadirk/Offensive-Text-Classification
  2. Navigate to the project directory:

    cd Offensive-Text-Classification
  3. Install the required dependencies:

    pip install -r requirements.txt
  4. Download the model(s) you want to use with the UI from the Developed Models table. If some models are missing, the application shows alerts, but it still runs with the models you have.

  5. Download and save the required tokenizer:

    python -c "from transformers import AutoTokenizer; tokenizer = AutoTokenizer.from_pretrained('dbmdz/bert-base-turkish-uncased'); tokenizer.save_pretrained('./models/embeddings/bert-turkish-tokenizer')"
    
  6. Run the application:

    python UI_ELEMENTS/main_app.py
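Before launching the application, it can help to confirm that the tokenizer from step 5 was actually saved. The sketch below checks the target directory used in that command for a `tokenizer_config.json` file. The helper name `tokenizer_is_saved` is hypothetical, and relying on that one file as the marker is an assumption about what `save_pretrained` writes, not a check performed by the application itself.

```python
from pathlib import Path

# Directory taken from the tokenizer command in step 5.
TOKENIZER_DIR = Path("./models/embeddings/bert-turkish-tokenizer")


def tokenizer_is_saved(directory: Path = TOKENIZER_DIR) -> bool:
    """Return True if the directory holds a saved tokenizer config.

    Assumption: a successful save leaves a tokenizer_config.json file
    in the target directory.
    """
    return (Path(directory) / "tokenizer_config.json").is_file()
```

Calling `tokenizer_is_saved()` from the project root returns `False` when step 5 has not been run or was run from a different working directory, since the path in the command is relative.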
