This project implements a hybrid deep learning model that classifies tweets as hate speech, offensive language, or neutral text. It combines BERT embeddings with CNN and LSTM layers to achieve high accuracy in text classification.
Hosted on: Streamlit Cloud
- Classifies tweets/text into:
  - Hate Speech
  - Offensive Language
  - Neutral
- Hybrid model architecture:
  - BERT for embeddings
  - CNN for feature extraction
  - LSTM for sequence learning
- Streamlit Web App for real-time predictions
- Color-coded probability bars for better visualization
HATE_SPEECH_DETECTION-LOCAL/
├── app/
│ ├── __pycache__/
│ ├── .env
│ ├── .venv/
│ ├── cnn_lstm_hate_speech_bert/
│ ├── hsd_model.py
│ ├── labeled_data.csv
│ ├── requirements.txt
│ └── streamlit_app.py
│
├── assets/
│ ├── hate_speech.png
│ ├── neutral.png
│ └── offensive.png
│
├── docs/
│ ├── certificate.jpeg
│ ├── Code run.mkv
│ ├── Deep Learning for Hate Speech Detection in Tweets Using LSTM ...
│ ├── final_ppt_hate_speech.pptx
│ ├── hate_speech_report_final_2023.pdf
│ └── hate_speech_report_final_2323.docx
│
├── Notebook/
│ ├── bert_lstm_cnn_test.ipynb
│ └── bert_lstm_cnn.ipynb
└── README.md
Note: the trained model directory (cnn_lstm_hate_speech_bert/) is too large for GitHub (>100 MB), so it is excluded via .gitignore. You must download or export your own SavedModel.
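Because the model directory is not shipped with the repository, a quick check before launching the app can save a confusing startup error. The sketch below only verifies that the two entries this README lists for the SavedModel (saved_model.pb and variables/) are present; the helper name `missing_savedmodel_parts` is hypothetical, and the check does not validate the model itself.

```python
from pathlib import Path


def missing_savedmodel_parts(model_dir):
    """Return the names of expected SavedModel entries missing from model_dir."""
    root = Path(model_dir)
    missing = []
    if not (root / "saved_model.pb").is_file():
        missing.append("saved_model.pb")
    if not (root / "variables").is_dir():
        missing.append("variables/")
    return missing


# Path is relative to the current working directory (an assumption about
# where the script is run from).
problems = missing_savedmodel_parts("cnn_lstm_hate_speech_bert")
if problems:
    print("Model directory incomplete, missing:", ", ".join(problems))
else:
    print("SavedModel layout looks complete.")
```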
- Clone the repo:

      git clone https://github.com/ArponRoy7/Hate_speech_detection.git
      cd Hate_speech_detection

- Create & activate a virtual environment:

      python -m venv .venv
      source .venv/bin/activate # Linux/Mac
      .venv\Scripts\activate # Windows

- Install dependencies:

      pip install -r requirements.txt

  Ensure TensorFlow and TensorFlow-Text match:

      pip install "tensorflow==2.19.0" "tensorflow-text==2.19.0"

- Place your trained model inside the project root:

      cnn_lstm_hate_speech_bert/
      ├── saved_model.pb
      └── variables/

- Start the Streamlit app:

      streamlit run streamlit_app.py
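The requirement that TensorFlow and TensorFlow-Text match can be checked from Python after installation. This is a minimal sketch using only the standard library; the helper names are hypothetical, and comparing only the major.minor part of the version is an assumption (the README itself pins identical versions, 2.19.0 for both).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def same_release(a, b):
    """True when both version strings are present and share major.minor."""
    if a is None or b is None:
        return False
    return a.split(".")[:2] == b.split(".")[:2]


tf_version = installed_version("tensorflow")
text_version = installed_version("tensorflow-text")
print("tensorflow:", tf_version, "| tensorflow-text:", text_version)
print("versions compatible:", same_release(tf_version, text_version))
```

If the last line prints `False`, reinstalling both packages with the pinned versions from the install step is the simplest fix.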
- Input: "Who tf is this faggot" → Hate Speech 🟨
- Input: "One fly ass bitch" → Offensive 🟥
- Input: "Harry Styles is so hot" → Neutral 🟩
- Davidson et al. — Hate Speech and Offensive Language
- Basile et al. — Hate and Not Hate
- Zampieri et al. — Offensive and Not Offensive
Final merged dataset: 34,896 tweets
Labels: 0=Hate Speech, 1=Offensive, 2=Neutral
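The README does not say how the three source label schemes were reconciled, so the following is only one plausible sketch of the merge. The source keys, the raw label strings, and the choice to map "not hate" and "not offensive" to Neutral are all assumptions made for illustration; only the 0/1/2 target encoding comes from this README.

```python
from collections import Counter

# Target encoding stated in the README.
HATE, OFFENSIVE, NEUTRAL = 0, 1, 2

# Hypothetical per-source label vocabularies. Mapping the negative classes of
# the two binary datasets to NEUTRAL is an assumption, not the authors' method.
SOURCE_MAPS = {
    "davidson": {"hate": HATE, "offensive": OFFENSIVE, "neither": NEUTRAL},
    "basile": {"hate": HATE, "not_hate": NEUTRAL},
    "zampieri": {"offensive": OFFENSIVE, "not_offensive": NEUTRAL},
}


def merge(rows):
    """Convert (source, text, raw_label) rows into (text, unified_label) pairs."""
    return [(text, SOURCE_MAPS[source][raw]) for source, text, raw in rows]


# Toy rows with placeholder text, not real dataset entries.
rows = [
    ("davidson", "tweet a", "hate"),
    ("basile", "tweet b", "not_hate"),
    ("zampieri", "tweet c", "offensive"),
    ("davidson", "tweet d", "neither"),
]
merged = merge(rows)
print(merged)
print(Counter(label for _, label in merged))  # class balance after merging
```

Printing the class counts after merging is a cheap way to spot imbalance before training.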
- Training & validation accuracy plotted over epochs
- Confusion matrix on test data
- The hybrid BERT + CNN + LSTM model outperforms the standalone CNN, LSTM, and Bi-LSTM models
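For readers unfamiliar with the confusion matrix mentioned above, here is a small self-contained sketch for the three-class case. The labels below are toy values chosen for illustration, not results from this project, and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, n_classes=3):
    """Count predictions: rows are true labels, columns are predicted labels."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy labels using the README's encoding: 0=Hate Speech, 1=Offensive, 2=Neutral.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred)
for row in cm:
    print(row)
print(f"accuracy: {accuracy(cm):.3f}")
```

Off-diagonal cells show which classes are confused with each other, for example hate speech predicted as offensive in row 0, column 1.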
- Mrs. Monika Singh, Assistant Professor, RCCIIT
- Ashif Mondal
- Diya Neogi
- Arpon Roy
- Bidisha Saha