📜 paper-hbert-sarcasm-detection

📌 Summary

Implementation of the paper "A Novel Hierarchical BERT Architecture for Sarcasm Detection" using Python, Pandas, NumPy, Kaggle and TensorFlow.

🧠 Overview

This project implements a Hierarchical BERT (H-BERT) architecture for sarcasm detection in text, inspired by the paper "A Novel Hierarchical BERT Architecture for Sarcasm Detection". The model combines LSTM and CNN layers on top of BERT embeddings. Unlike typical sentiment-analysis tasks, sarcasm detection often requires understanding multi-sentence context and subtle cues. The hierarchical model addresses this by processing text at both the sentence and the context level through a five-layer architecture: (1) sentence encoding, (2) context summarization with Conv1D, (3) a bi-directional LSTM for capturing temporal dependencies, (4) CNN layers for local feature extraction, and (5) a final fully connected layer for classification.

The model is trained on the "Sarcasm on Reddit" dataset from Kaggle rather than the dataset used in the original paper. Preprocessing includes data cleaning and dimensionality reduction. Tokenization uses the bert-base-uncased tokenizer from Hugging Face. Evaluation uses accuracy as the primary metric, and the entire model is implemented in Python with TensorFlow.

🎯 Use Cases

  • Detecting sarcastic comments or opinions on social media platforms

🟢 Project Status

  • Current Version: V1.0
  • Completed

📂 Repository Structure

paper-hbert-sarcasm-detection/
├── README.md
├── LICENSE
├── .gitignore
├── assets/
│   └── images/
├── datasets/
│   └── sarcasm-reddit/
│       └── sarcasm_trained_balanced.csv
└── notebooks/
    └── sarcasm-analysis.ipynb

✨ Features

  • ✅ Custom Hierarchical-BERT model class
  • ✅ Preprocessed Data
  • ✅ Sentence tokenization and padding
  • ✅ Evaluation metrics: Accuracy

🛠️ In progress:

  • Modification of the H-BERT architecture
  • Introducing precision, recall, and F1-score

🚀 Getting Started

📚 Knowledge & Skills Required

  • Python programming
  • ML/DL fundamentals, Transformers, Hugging Face Hub
  • Neural network architectures (CNN, LSTM, RNN, BERT)

💻 Software Requirements

  • IDE (VS Code), Jupyter Notebook, or Google Colab
  • Best run on Kaggle with a P100 GPU

🛡️ Tech Stack

  • Language: Python
  • NLP/ML: huggingface_hub, scikit-learn, pandas, numpy
  • Deep Learning: TensorFlow, CNN, LSTM, Transformer models (bert-base-uncased)
  • Visualization: matplotlib

🔍 Modules Breakdown

📥 (1) Data Preprocessing

  • Uses a different dataset than the one used in the paper
  • Loads the "Sarcasm on Reddit" dataset from Kaggle
  • Reduces dimensionality and cleans the data
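
A minimal pandas sketch of this step, using the CSV path from the repository structure above. The "comment" column is referenced in the tokenization step below; the "label" column name is an assumption, not confirmed by the notebook.

```python
import pandas as pd

# Load the balanced "Sarcasm on Reddit" CSV shipped under datasets/.
df = pd.read_csv("datasets/sarcasm-reddit/sarcasm_trained_balanced.csv")

# "Dimension reduction" here simply means keeping only the columns the model
# needs; "label" is an assumed name for the 0/1 sarcasm target column.
df = df[["comment", "label"]]

# Basic cleaning: drop empty comments and duplicates, strip whitespace.
df = df.dropna(subset=["comment"]).drop_duplicates(subset=["comment"])
df["comment"] = df["comment"].str.strip()

print(df.shape)
```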

🎭 (2) Tokenization

  • Uses the bert-base-uncased tokenizer from Hugging Face
  • Tokenizes the "comment" column
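
A short sketch of the tokenization step with the Hugging Face transformers library; max_length=64 is an illustrative value, not necessarily the one used in the notebook.

```python
from transformers import BertTokenizer

# bert-base-uncased tokenizer from Hugging Face.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Tokenize the "comment" column with padding and truncation.
encodings = tokenizer(
    df["comment"].tolist(),
    padding="max_length",
    truncation=True,
    max_length=64,          # assumed sequence length
    return_tensors="tf",
)

input_ids = encodings["input_ids"]            # shape: (num_samples, 64)
attention_mask = encodings["attention_mask"]
```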

🤖 (3) Hierarchical-BERT

  • The H-BERT architecture proposed in the paper and implemented here is explained briefly below. The model consists of 5 layers:

    1. (L-1) Sentence Encoding Layer
      • Encodes the input data into fixed-size vectors
    2. (L-2) Context Summarization Layer
      • Converts the individual sequence embeddings from layer 1 into a single common vector
      • Conv1D sentence-summarizer layer: the paper used Conv2D since its dataset had about 1.3B samples, but we use only about 30k, so Conv1D is sufficient
    3. (L-3) LSTM Context Encoder Layer
      • Implements a bi-directional LSTM to capture temporal dependencies from the summarized sentences of layer 2
      • A bi-directional LSTM processes the data in both the forward and backward directions, which makes capturing context much easier
    4. (L-4) CNN Layer
      • Extracts local features from the encoded context vectors of layer 3
      • Emphasizes features significant to the model and gives less attention to irrelevant ones
      • Kernel layer: we use Conv1D kernels (also called CNN filters) instead of Conv2D
      • Pooling layer: uses max pooling to extract the important features
    5. (L-5) FFN / Fully Connected Dense Layer
      • Processes the features to give a final output
      • Maps features to final predictions
  • The data flow: (L-1) output --> (L-2) output --> (L-3) output --> (L-4) output --> (L-5) output --> predictions (0 or 1)
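
The sketch below wires the five layers together in Keras, using bert-base-uncased embeddings from the transformers library; the filter counts, kernel sizes, and LSTM units are illustrative assumptions rather than the notebook's tuned values.

```python
import tensorflow as tf
from transformers import TFBertModel

def build_hbert(max_len=64):
    input_ids = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_ids")
    attention_mask = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="attention_mask")

    # (L-1) Sentence Encoding Layer: BERT token embeddings.
    bert = TFBertModel.from_pretrained("bert-base-uncased")
    embeddings = bert(input_ids, attention_mask=attention_mask).last_hidden_state

    # (L-2) Context Summarization Layer: Conv1D sentence summarizer
    # (Conv1D instead of the paper's Conv2D, given the smaller dataset).
    summary = tf.keras.layers.Conv1D(128, kernel_size=3, padding="same",
                                     activation="relu")(embeddings)

    # (L-3) LSTM Context Encoder Layer: bi-directional LSTM over the summaries.
    context = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, return_sequences=True))(summary)

    # (L-4) CNN Layer: Conv1D filters plus max pooling to keep salient features.
    features = tf.keras.layers.Conv1D(64, kernel_size=3, padding="same",
                                      activation="relu")(context)
    pooled = tf.keras.layers.GlobalMaxPooling1D()(features)

    # (L-5) Fully Connected Layer: maps pooled features to a 0/1 prediction.
    output = tf.keras.layers.Dense(1, activation="sigmoid")(pooled)

    model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=output)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_hbert()
```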

📊 Evaluation

  • Uses accuracy as the metric to evaluate model performance
  • Future work: precision, recall, F1-score
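
A small scikit-learn sketch of the current accuracy metric plus the planned precision/recall/F1, assuming y_true holds the held-out labels and y_pred the model's thresholded predictions (both are placeholder variables).

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# y_true: ground-truth 0/1 labels of the evaluation split (assumed variable).
# y_pred: thresholded model outputs, e.g. (model.predict(...) > 0.5).astype(int).
acc = accuracy_score(y_true, y_pred)

# Planned future metrics: precision, recall and F1 for the sarcastic class.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary")

print(f"accuracy={acc:.3f}  precision={precision:.3f}  recall={recall:.3f}  f1={f1:.3f}")
```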

⚙️ Installation

git clone https://github.com/pointer2Alvee/paper-hbert-sarcasm-detection.git
cd paper-hbert-sarcasm-detection

# Recommended: use a virtual environment
pip install -r requirements.txt

🖇️ requirements.txt (core packages):
transformers==4.44.0
huggingface_hub==0.24.5
pandas
numpy
tensorflow
matplotlib

💻 Running the App Locally
  1. Open the repository in VS Code or on Kaggle (recommended)
  2. Run the notebooks/sarcasm-analysis.ipynb notebook end to end
  3. Review the reported accuracy

📖 Usage

  • Open the notebooks/sarcasm-analysis.ipynb notebook in VS Code or on Kaggle and run all cells

🧪 Sample Topics Implemented

  • ✅ BERT model

  • ✅ LSTM, CNN, CONVOLUTION, POOLING

  • ⏳ Upcoming: Modification of H-BERT

🧭 Roadmap

  • Paper implementation of H-BERT

🤝 Contributing

Contributions are welcome!

  1. Fork the repo.
  2. Create a branch: git checkout -b feature/YourFeature
  3. Commit changes: git commit -m 'Add some feature'
  4. Push to branch: git push origin feature/YourFeature
  5. Open a Pull Request.

📜 License

Distributed under the MIT License. See LICENSE for more information.

🙏 Acknowledgements

  • Special thanks to the open-source community and YouTube creators for tools and resources.
