The primary goal of this project is to develop a user-friendly desktop application capable of detecting toxic content on social media platforms, particularly Twitter and YouTube.
In this project, we combined trained deep learning models with an interactive interface, allowing users to:
- Manually enter sentences for classification.
- Input Twitter or YouTube links to automatically fetch and analyze the content for toxicity.
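The README does not show how a pasted link is routed. A minimal sketch, assuming standard URL shapes (`twitter.com`/`x.com/<user>/status/<id>`, `youtube.com/watch?v=<id>`, `youtu.be/<id>`), might decide which platform a link belongs to and extract the ID a fetcher would need. `detect_source` is a hypothetical helper name, not part of the project's code.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse


# Hypothetical helper (not from the project): identifies the platform of a
# pasted link and extracts the tweet or video ID a fetcher would need.
def detect_source(url: str) -> Tuple[str, Optional[str]]:
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()
    # Normalize common host prefixes such as www. and m.
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]

    if host in ("twitter.com", "x.com"):
        # Tweet URLs look like /<user>/status/<tweet_id>
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) >= 3 and parts[1] == "status":
            return "twitter", parts[2]
        return "twitter", None

    if host == "youtu.be":
        # Short links carry the video ID as the first path segment
        video_id = parsed.path.lstrip("/").split("/")[0]
        return "youtube", video_id or None

    if host == "youtube.com":
        # Standard watch URLs carry the video ID in the ?v= query parameter
        ids = parse_qs(parsed.query).get("v")
        return "youtube", ids[0] if ids else None

    return "unknown", None
```

Fetching the tweet text or video comments would then go through the respective platform's API, which is outside the scope of this sketch.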
With this application, we aim to:
- Automatically filter toxic comments on social platforms.
- Analyze user reactions containing offensive language.
- Perform accurate classifications on Turkish-language datasets using models specifically trained for the local context.
Technologies used:
- Python
- PyTorch
- FastText
- BERT (Turkish language models)
- Natural Language Processing (NLP)
- Tkinter (for GUI development)
We used the following publicly available Turkish datasets for training and evaluation:
- Turkish Offensive Language Dataset: `train.csv` contains 42,398, `test.csv` contains 8,851, and `valid.csv` contains 1,756 annotated tweets.
- Turkish Sentiment Analysis Dataset: 492,782 labeled sentences, 10% of which were used for testing. The label distribution is:

| Label | Share |
|---|---|
| Positive | 54% |
| Neutral | 35% |
| Negative | 12% |
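The exact splitting code is not given here. As a rough sketch, assuming a simple seeded random hold-out, a 90/10 split and the per-label shares in the table above could be computed as follows. `split_holdout`, `label_shares`, and the seed value are hypothetical choices, not the project's own.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


# Hypothetical helper: shuffles a copy of the data with a fixed seed and
# holds out `test_ratio` of it for testing (10% by default).
def split_holdout(items: Sequence[T], test_ratio: float = 0.1,
                  seed: int = 42) -> Tuple[List[T], List[T]]:
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    # Returns (train, test)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical helper: percentage of each label, rounded to whole numbers.
def label_shares(labels: Sequence[str]) -> Dict[str, int]:
    counts = Counter(labels)
    total = len(labels)
    return {label: round(100 * count / total) for label, count in counts.items()}
```

With 492,782 sentences and `test_ratio=0.1`, this sketch would hold out 49,278 sentences for testing.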
We trained the following models for Turkish; the BERT-based models are fine-tuned from a Turkish BERT model:

| Model Name | Model Link | Features |
|---|---|---|
| FastText Classifier | Drive | Lightweight, fast predictions |
| BERT Fine-tuned Model | Drive | High accuracy, contextual analysis |
| BERT LSTM Model | Drive | High accuracy, contextual analysis |
| Extended BERT LSTM Model | Drive | Transfer-learned from the BERT LSTM model |
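BERT-based classifiers typically output one raw score (logit) per class, and softmax turns those scores into probabilities. The following framework-free sketch shows that final step. It assumes a two-class setup with hypothetical label names, since the actual label sets of these models are not documented here.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label order; the real models may use other labels or more classes.
LABELS = ["not_offensive", "offensive"]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the largest logit before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits: Sequence[float],
                  labels: Sequence[str] = LABELS) -> Tuple[str, float]:
    # Pick the class with the highest probability and report that probability
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

In PyTorch, the same step corresponds to applying `torch.softmax` over the class dimension of the model's output logits.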
Application features:
- ✅ Manual text classification
- ✅ Toxicity detection from Twitter or YouTube links
- ✅ Visual feedback on prediction results
- ✅ Turkish language support
- ✅ Easy-to-use desktop interface
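The README does not describe how the visual feedback is produced. A minimal sketch, assuming a 0.5 decision threshold and red/green colors (both hypothetical), could map a model's offensive-class probability to a display text and a Tk color name.

```python
from typing import Tuple


# Hypothetical mapping: the threshold and colors are assumptions, not the
# project's actual UI settings.
def feedback_style(offensive_prob: float, threshold: float = 0.5) -> Tuple[str, str]:
    """Map an offensive-class probability to (display text, Tk color name)."""
    if not 0.0 <= offensive_prob <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if offensive_prob >= threshold:
        return f"Offensive ({offensive_prob:.0%})", "red"
    return f"Not offensive ({1 - offensive_prob:.0%})", "green"
```

In a Tkinter window, the result could be shown with something like `tk.Label(root, text=text, fg=color)`.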
Installation:
- Clone the repository:
  ```bash
  git clone https://github.com/mbahadirk/Offensive-Text-Classification
  ```
- Navigate to the project directory:
  ```bash
  cd Offensive-Text-Classification
  ```
- Install the required dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Download the models you want to use with the UI from the model table above. You do not need all of them: the application shows an alert for each missing model but still runs.
- Download and save the required tokenizer:
  ```bash
  python -c "from transformers import AutoTokenizer; tokenizer = AutoTokenizer.from_pretrained('dbmdz/bert-base-turkish-uncased'); tokenizer.save_pretrained('./models/embeddings/bert-turkish-tokenizer')"
  ```
- Run the application:
  ```bash
  python UI_ELEMENTS/main_app.py
  ```
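The app raises alerts for models that are not installed. A sketch of such a startup check, assuming a hypothetical file layout, could look like this. Only the tokenizer path comes from the setup command above. The model file names and the `check_models` helper are made up for illustration.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical layout: only the tokenizer path matches the setup step above;
# the model file names are assumptions and may differ in the real project.
MODEL_FILES: Dict[str, str] = {
    "FastText Classifier": "models/fasttext_model.bin",
    "BERT Fine-tuned Model": "models/bert_finetuned.pt",
    "BERT LSTM Model": "models/bert_lstm.pt",
    "Extended BERT LSTM Model": "models/extended_bert_lstm.pt",
    "Turkish BERT tokenizer": "models/embeddings/bert-turkish-tokenizer",
}


def check_models(base_dir: Path,
                 model_files: Dict[str, str] = MODEL_FILES) -> Tuple[List[str], List[str]]:
    """Split the expected files into those present under base_dir and those missing."""
    available: List[str] = []
    missing: List[str] = []
    for name, rel_path in model_files.items():
        # exists() covers both single model files and the tokenizer directory
        (available if (base_dir / rel_path).exists() else missing).append(name)
    return available, missing
```

The UI could then warn about each missing entry, for example with `tkinter.messagebox.showwarning`, and keep only the available models selectable.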