This project demonstrates a sarcasm detection model using a Long Short-Term Memory (LSTM) neural network. The model processes textual input to classify it as sarcastic or non-sarcastic. The notebook includes data preprocessing, model training, evaluation, testing, and visualization steps.
Sarcasm detection is crucial in natural language processing (NLP), especially for applications in social media sentiment analysis, chatbots, and opinion mining. Since sarcasm often conveys an opposite meaning from the literal words, it can mislead sentiment detection models. This project employs an LSTM model to learn sequential dependencies in text data for identifying sarcastic language.
The notebook uses labeled datasets containing sarcastic and non-sarcastic text samples from sources such as Reddit, including:
- DWAEF-sarc
- GEN-sarc
- HYP-sarc
- RQ-sarc
- REDDIT-sarc
The preprocessing steps include:
- Noise Removal: Removing special characters, URLs, and non-alphanumeric tokens to reduce noise.
- Tokenization and Padding: Tokenizing the text and padding sequences for uniform input length.
- Label Encoding: Converting sarcasm labels into a binary format for classification.
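The preprocessing steps above can be sketched in plain Python. The sample texts, vocabulary scheme, and `max_len` below are illustrative assumptions, not the notebook's actual values:

```python
import re

def clean_text(text):
    # Noise removal: strip URLs, then any non-alphanumeric characters
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-zA-Z0-9\s]", " ", text)
    return text.lower().split()  # simple whitespace tokenization

def pad_sequence(ids, max_len, pad_id=0):
    # Truncate or right-pad token-id sequences to a uniform length
    return ids[:max_len] + [pad_id] * max(0, max_len - len(ids))

# Build a vocabulary from the corpus (index 0 reserved for padding)
corpus = ["Oh great, another Monday!", "The meeting starts at 3pm. http://example.com"]
vocab = {}
for doc in corpus:
    for tok in clean_text(doc):
        vocab.setdefault(tok, len(vocab) + 1)

encoded = [pad_sequence([vocab[t] for t in clean_text(doc)], max_len=8) for doc in corpus]
labels = [1, 0]  # binary label encoding: 1 = sarcastic, 0 = non-sarcastic
```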
The following libraries are required to run the notebook:
```bash
pip install numpy pandas torch scikit-learn matplotlib nltk seaborn
```

The notebook proceeds through the following steps:
- Load Dataset: Import and load the dataset into a Pandas DataFrame.
- Preprocess Data: Execute data cleaning, tokenization, padding, and label encoding.
- Build Model: Define an LSTM model with an embedding layer, LSTM layer, and dense output layer for binary classification.
- Train Model: Train the model on the training dataset with cross-validation.
- Evaluate Results: Calculate performance metrics and visualize the results.
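A single training step from the workflow above can be sketched with PyTorch. The stand-in linear model, batch shapes, and learning rate are illustrative assumptions so the sketch is self-contained; the notebook trains its LSTM model in place of the stand-in:

```python
import torch
import torch.nn as nn

# Stand-in model so this sketch runs on its own; the real notebook
# substitutes its embedding + LSTM + dense model here.
model = nn.Sequential(nn.Linear(10, 1))
criterion = nn.BCEWithLogitsLoss()  # binary cross-entropy on raw logits
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

features = torch.randn(32, 10)               # one hypothetical mini-batch
targets = torch.randint(0, 2, (32,)).float()  # binary sarcasm labels

model.train()
optimizer.zero_grad()
loss = criterion(model(features).squeeze(-1), targets)
loss.backward()   # backpropagate the loss
optimizer.step()  # update model weights
```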
To run the notebook:
```bash
jupyter notebook lstm-sarcasm-detection.ipynb
```

The LSTM model architecture consists of:
- Embedding Layer: Converts tokens into dense vectors, capturing semantic meaning.
- LSTM Layer: Processes sequences, learning dependencies over time steps to detect contextual cues for sarcasm.
- Dense Layer: Outputs a binary classification for sarcastic and non-sarcastic labels.
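The three-layer architecture described above can be sketched as a PyTorch module. The dimensions (`embed_dim`, `hidden_dim`, vocabulary size) are illustrative assumptions, not the notebook's actual hyperparameters:

```python
import torch
import torch.nn as nn

class SarcasmLSTM(nn.Module):
    """Embedding -> LSTM -> dense head producing a single sarcasm logit."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)  # dense layer for binary output

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)  # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)  # final hidden state per sequence
        return self.fc(hidden[-1]).squeeze(-1)  # (batch,) raw logits

model = SarcasmLSTM(vocab_size=5000)
logits = model(torch.randint(1, 5000, (4, 20)))  # batch of 4 padded sequences
probs = torch.sigmoid(logits)                    # sarcasm probabilities
```

The single-logit head with a sigmoid is one common choice for binary classification; a two-unit softmax head would work equally well.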
The model undergoes 10-fold cross-validation to ensure reliable performance metrics. For each fold:
- The model is trained on a subset of data, and validation is performed on the remaining set.
- Metrics calculated per fold:
  - Accuracy: the fraction of all predictions that are correct.
  - Precision: the proportion of predicted-sarcastic samples that are truly sarcastic.
  - Recall: the proportion of truly sarcastic samples the model identifies.
  - F1 Score: the harmonic mean of precision and recall, balancing the two.
Averaged across the 10 folds, the model achieves:
- Accuracy: 70.97%
- Precision: 72%
- Recall: 69%
- F1 Score: 70%
These metrics are computed and averaged to assess the model's ability to generalize across different subsets.
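The 10-fold evaluation loop can be sketched with scikit-learn. The synthetic labels and the ~70%-accurate stand-in predictions below are assumptions that keep the sketch self-contained; in the notebook, the LSTM is trained inside each fold and its held-out predictions are scored:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Synthetic stand-in for real labels and trained-model predictions
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
preds = np.where(rng.random(200) < 0.7, y, 1 - y)  # ~70%-accurate stand-in

kf = KFold(n_splits=10, shuffle=True, random_state=0)
scores = {"accuracy": [], "precision": [], "recall": [], "f1": []}
for train_idx, val_idx in kf.split(y):
    # In the real notebook, the LSTM is trained on train_idx here;
    # this sketch only scores the held-out fold.
    yv, pv = y[val_idx], preds[val_idx]
    scores["accuracy"].append(accuracy_score(yv, pv))
    scores["precision"].append(precision_score(yv, pv, zero_division=0))
    scores["recall"].append(recall_score(yv, pv, zero_division=0))
    scores["f1"].append(f1_score(yv, pv, zero_division=0))

# Average each metric over the 10 folds
averaged = {name: float(np.mean(vals)) for name, vals in scores.items()}
```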