Offensive Text Classification

Go to Sections

Project Overview
Datasets
Pretrained Models
Developed Models

Project Overview

The primary goal of this project is to develop a user-friendly desktop application capable of detecting toxic content on social media platforms, particularly Twitter and YouTube.

In this project, we combined trained deep learning models with an interactive interface, allowing users to:

Manually enter sentences for classification.
Input Twitter or YouTube links to automatically fetch and analyze the content for toxicity.
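
The two input modes above imply that the application has to decide whether an input is plain text or a link. The following is a minimal sketch of one way to do that routing, not the project's actual code: the function name `detect_input_kind` and the host lists are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical host lists; the project's real matching rules are not documented here.
TWITTER_HOSTS = {"twitter.com", "mobile.twitter.com", "x.com"}
YOUTUBE_HOSTS = {"youtube.com", "m.youtube.com", "youtu.be"}


def detect_input_kind(user_input: str) -> str:
    """Return 'twitter', 'youtube', or 'text' for a raw user input."""
    candidate = user_input.strip()
    try:
        parsed = urlparse(candidate)
        host = parsed.hostname
    except ValueError:
        # Malformed URLs (for example an unclosed IPv6 bracket) are treated as text.
        return "text"
    if parsed.scheme not in ("http", "https") or not host:
        return "text"
    host = host.lower()
    if host.startswith("www."):
        host = host[4:]
    if host in TWITTER_HOSTS:
        return "twitter"
    if host in YOUTUBE_HOSTS:
        return "youtube"
    # Links to other sites fall back to plain text in this sketch.
    return "text"
```

Anything classified as `text` would go straight to the classifier, while the two link kinds would first need their content fetched.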

Project Objectives

With this application, we aim to:

Automatically filter toxic comments on social platforms.
Analyze user reactions containing offensive language.
Perform accurate classifications on Turkish-language datasets using models specifically trained for the local context.

Technologies Used

Python
PyTorch
FastText
BERT (Turkish language models)
Natural Language Processing (NLP)
Tkinter (for GUI development)

Datasets

We used the following publicly available Turkish datasets for training and evaluation:

Turkish Offensive Language Dataset
The dataset is split into three files of annotated tweets: train.csv (42,398), test.csv (8,851), and valid.csv (1,756).
Turkish Sentiment Analysis Dataset
The dataset contains 492,782 labeled sentences, 10% of which were used for testing. Class distribution: Positive 54%, Neutral 35%, Negative 12%.
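
A 10% hold-out like the one described for the sentiment dataset can be sketched as follows. This is an illustration under assumptions: the README does not say whether the split was random or stratified, and the helper names `split_train_test` and `label_distribution`, the seed, and the `(text, label)` row layout are all hypothetical.

```python
import random
from collections import Counter


def split_train_test(rows, test_fraction=0.10, seed=42):
    """Shuffle rows reproducibly and hold out test_fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def label_distribution(rows):
    """Return the share of each label, assuming rows are (text, label) pairs."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}
```

Comparing `label_distribution` on the training and test parts is a quick way to check that a random split kept the class balance roughly intact.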

Pretrained Models

We fine-tuned the following BERT model for Turkish:

dbmdz/bert-base-turkish-uncased
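
As a rough illustration of how this checkpoint can be loaded for classification with the Hugging Face `transformers` library, consider the sketch below. It is not the project's training or inference code: the label names and their order, the `max_length` value, and the helper names are assumptions, and the classification head created here is newly initialized, so its predictions are meaningless until the model has been fine-tuned.

```python
import math

MODEL_NAME = "dbmdz/bert-base-turkish-uncased"
LABELS = ("not_offensive", "offensive")  # assumed label order, for illustration only


def softmax(logits):
    """Convert raw logits into probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_prediction(logits):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def classify(text):
    """Run one sentence through the checkpoint (requires torch and transformers)."""
    # Imported lazily so the helpers above work without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    # The classification head is randomly initialized until fine-tuned.
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=len(LABELS)
    )
    model.eval()
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return logits_to_prediction(logits)
```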

Developed Models

Model Name	Model Link	Features
FastText Classifier	Drive	Lightweight, fast predictions
BERT Fine-tuned Model	Drive	High accuracy, contextual analysis
BERT LSTM Model	Drive	High accuracy, contextual analysis
Extended BERT LSTM Model	Drive	Transfer learning from the BERT LSTM model

Application Features

✅ Manual text classification
✅ Toxicity detection from Twitter or YouTube links
✅ Visual feedback on prediction results
✅ Turkish language support
✅ Easy-to-use desktop interface

Model Performance

Accuracy Comparison

Confusion Matrix

How to Run the Project

Clone the repository:

git clone https://github.com/mbahadirk/Offensive-Text-Classification

Navigate to the project directory:
```
cd Offensive-Text-Classification
```
Install the required dependencies:
```
pip install -r requirements.txt
```
Download the model or models you want to use with the UI from the Developed Models section. If you have not downloaded every model, the application shows alerts for the missing ones, but this is harmless.
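
To see in advance which models would trigger an alert, a simple presence check can help. The sketch below is an assumption about how such a check might look; `find_missing_models` is a hypothetical helper, and the expected file names must be supplied by the caller because the README does not list them.

```python
from pathlib import Path


def find_missing_models(models_dir, expected_files):
    """Return the expected file names that are not present under models_dir."""
    root = Path(models_dir)
    return [name for name in expected_files if not (root / name).is_file()]
```

Calling it with the `models` directory and the names of the files you downloaded returns an empty list when everything is in place.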

Download and save the required tokenizer:

python -c "from transformers import AutoTokenizer; tokenizer = AutoTokenizer.from_pretrained('dbmdz/bert-base-turkish-uncased'); tokenizer.save_pretrained('./models/embeddings/bert-turkish-tokenizer')"

Run the application:
```
python UI_ELEMENTS/main_app.py
```
