# CTX-SARC-Embed: Advanced Sarcasm Detection System


## 🎯 Overview

CTX-SARC-Embed is a state-of-the-art sarcasm detection system that uses transfer learning with RoBERTa-base to classify social media text into three categories: Sarcasm, Irony, and Regular. The system achieves 93.9% accuracy and 93.6% macro F1-score through advanced data preprocessing, model optimization, and comprehensive experiment tracking.


🔍🔍🔍 The dataset, experimental results, and the best model checkpoints are available at:

https://drive.google.com/drive/folders/1GJ0pQNgMWu3WHQq1-GHKDNse4nEpfcW1?usp=sharing

📁📁📁 The Project Report and Checklist documents are in the `Docs` folder!


## 🏆 Key Achievements

- **93.9% Test Accuracy** with balanced performance across all classes
- **Fixed Critical Data Issues**: eliminated 31.89% data leakage and corrected 23,276 mislabeled samples
- **Parameter Efficiency**: 97.7% efficiency through transfer learning (231K trainable out of 124.9M total parameters)
- **Fast Training**: 10-15 minutes of training with the frozen-backbone approach
- **Production Ready**: comprehensive MLflow tracking and deployment artifacts
- **Reproducible Pipeline**: complete experiment tracking and version control

### 🧠 Technical Details

**Base Model**: RoBERTa-base (124.6M parameters)

- **Frozen Backbone**: Transfer learning approach for efficiency
- **Trainable Classifier**: 3-layer MLP (768→256→128→3)
- **Parameter Efficiency**: 97.7% (only 231K trainable parameters)

**Classifier Architecture**:

```
Input: RoBERTa [CLS] token (768d)
├── Dropout(0.2) + Linear(768→256) + ReLU + BatchNorm
├── Dropout(0.1) + Linear(256→128) + ReLU + BatchNorm
└── Dropout(0.06) + Linear(128→3) → [Sarcasm, Irony, Regular]
```
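
A minimal PyTorch sketch of this head is shown below. The class name matches the `AdvancedSarcasmClassifier` import used in the usage example further down, but the exact layer ordering, forward signature, and return values are assumptions reconstructed from the summary above, not the repository's actual implementation:

```python
import torch.nn as nn
from transformers import AutoModel

class AdvancedSarcasmClassifier(nn.Module):
    """Frozen RoBERTa-base encoder with a trainable 3-layer MLP head (illustrative sketch)."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.backbone = AutoModel.from_pretrained("roberta-base")
        for param in self.backbone.parameters():
            param.requires_grad = False  # frozen backbone: only the head is trained
        self.classifier = nn.Sequential(
            nn.Dropout(0.2), nn.Linear(768, 256), nn.ReLU(), nn.BatchNorm1d(256),
            nn.Dropout(0.1), nn.Linear(256, 128), nn.ReLU(), nn.BatchNorm1d(128),
            nn.Dropout(0.06), nn.Linear(128, num_classes),
        )

    def forward(self, input_ids, attention_mask):
        outputs = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        cls_embedding = outputs.last_hidden_state[:, 0]  # first-token (768d) sentence representation
        return self.classifier(cls_embedding), cls_embedding
```

Counting parameters of this sketch gives roughly 231K trainable weights in the head against roughly 125M in total, consistent with the figures above:

```python
model = AdvancedSarcasmClassifier()
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{trainable:,} trainable / {total:,} total parameters")
```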


**Regularization Techniques**:

- Multi-layer dropout (0.2, 0.1, 0.06)
- Batch normalization
- Weight decay (0.01)
- Early stopping (patience: 4)

## 📊 Data Pipeline

![Data Pipeline](figure_2025-01-01/02_data_pipeline_20250601_180127.png)

### 🔧 Data Processing Workflow

1. **Raw Data Analysis** (81,408 train + 8,128 test)
2. **Critical Issue Detection & Fixing**:

   - **Data Leakage**: 31.89% → 0% ✅
   - **Label Mislabeling**: 23,276 corrections ✅
   - **Duplicates**: 16.47% removed ✅
   - **Encoding Issues**: 3,609 HTML entities fixed ✅

3. **Advanced Preprocessing**:

   - RoBERTa tokenization (max length: 128)
   - Context simulation for social media text
   - Hashtag preservation for semantic meaning
   - Special token handling ([CLS], [SEP])

4. **Final Clean Dataset** (64,657 train + 7,185 test)
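
A hedged sketch of what the fixing steps above amount to in code; the real logic lives in the repository's preprocessing scripts, and the `text` column name and pandas-based layout are assumptions:

```python
import html
import pandas as pd

def clean_splits(train_df: pd.DataFrame, test_df: pd.DataFrame):
    """Illustrative cleaning: decode HTML entities, drop duplicates, remove train/test leakage."""
    for df in (train_df, test_df):
        # Fix encoding issues such as "&amp;" or "&#39;" left over from scraping.
        df["text"] = df["text"].map(html.unescape).str.strip()

    # Remove exact duplicate texts inside the training set.
    train_df = train_df.drop_duplicates(subset="text")

    # Eliminate leakage: drop training rows whose text also appears in the test set.
    train_df = train_df[~train_df["text"].isin(set(test_df["text"]))]
    return train_df, test_df
```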

### 📈 Data Quality Metrics

| Metric                 | Original | After Cleaning | Improvement |
| ---------------------- | -------- | -------------- | ----------- |
| **Data Leakage**       | 31.89%   | 0%             | ✅ 100%     |
| **Mislabeled Samples** | 23,276   | 0              | ✅ Fixed    |
| **Duplicates**         | 16.47%   | 0%             | ✅ Removed  |
| **Encoding Issues**    | 3,609    | 0              | ✅ Fixed    |
| **Clean Rate**         | ~60%     | 99.9%          | ✅ +39.9%   |

## 📊 Training & Performance

![Training Metrics](figure_2025-01-01/03_training_metrics_20250601_180128.png)

### 🎯 Training Configuration

```yaml
Model: RoBERTa-base (frozen) + 3-layer MLP
Optimizer: AdamW
Learning Rate: 3e-5
Batch Size: 64
Epochs: 10
Scheduler: ReduceLROnPlateau
Weight Decay: 0.01
Early Stopping: Patience 4
Device: CUDA (if available)
```
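
A compact sketch of how this configuration could drive the training loop; `train_one_epoch` and `evaluate_macro_f1` are placeholder helpers, not functions from this repository:

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

def fit(model, optimizer, epochs=10, patience=4):
    """Train with LR reduction on plateau and early stopping on validation macro F1 (sketch)."""
    scheduler = ReduceLROnPlateau(optimizer, mode="max")
    best_f1, bad_epochs = 0.0, 0
    for epoch in range(epochs):
        train_one_epoch(model, optimizer)      # placeholder: one pass over the training set
        val_f1 = evaluate_macro_f1(model)      # placeholder: macro F1 on the validation split
        scheduler.step(val_f1)                 # reduce LR when validation F1 plateaus
        if val_f1 > best_f1:
            best_f1, bad_epochs = val_f1, 0
            torch.save(model.state_dict(), "best_model.pt")  # keep the best checkpoint
        else:
            bad_epochs += 1
            if bad_epochs >= patience:         # early stopping (patience 4)
                break
    return best_f1
```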

### 🏆 Performance Results


| Metric            | Score  | Percentage |
| ----------------- | ------ | ---------- |
| Test Accuracy     | 0.9389 | 93.9%      |
| Macro F1-Score    | 0.9364 | 93.6%      |
| Weighted F1-Score | 0.9388 | 93.9%      |

**Per-Class Performance**:

- Sarcasm: 93.1% F1-score
- Irony: 93.7% F1-score
- Regular: 94.0% F1-score

## 📊 Dataset Analysis


### 📈 Dataset Statistics

**Training Set Distribution**:

- Sarcasm: 33.3% (21,552 samples)
- Irony: 33.4% (21,584 samples)
- Regular: 33.3% (21,521 samples)

**Text Statistics**:

- Average Length: 67.8 characters
- Token Range: 12-142 tokens
- Language: English (social media text)
- Context: Twitter-like short messages
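
These statistics can be recomputed roughly as follows (a sketch assuming the cleaned training split is loaded as a pandas DataFrame with a `text` column; the file name is a hypothetical placeholder):

```python
import pandas as pd
from transformers import AutoTokenizer

train_df = pd.read_csv("train_clean.csv")   # hypothetical path to the cleaned training split
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

char_lengths = train_df["text"].str.len()
token_lengths = train_df["text"].map(lambda t: len(tokenizer(t)["input_ids"]))
print(f"Average length: {char_lengths.mean():.1f} characters")
print(f"Token range: {token_lengths.min()}-{token_lengths.max()} tokens")
```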

## 🔬 MLflow Experiment Tracking


### 📊 Experiment Management

**Tracked Metrics**:

- Training/validation loss and F1-score
- Learning rate per epoch
- Final test performance metrics
- Model convergence monitoring

**Logged Parameters**:

- Model architecture settings
- Training hyperparameters
- Data preprocessing configuration
- Regularization parameters

**Artifacts**:

- Best model checkpoints (`best_model_*.pt`)
- Training visualizations
- Performance reports
- Configuration files
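
These pieces map onto standard MLflow calls, roughly like the sketch below; the metric and parameter keys are illustrative rather than the exact names used in the repository:

```python
import mlflow

def log_experiment(history, checkpoint_path="best_model.pt"):
    """Log one run; `history` is a list of (train_loss, val_f1) tuples, one per epoch."""
    with mlflow.start_run(run_name="ctx-sarc-embed"):
        # Hyperparameters and architecture settings.
        mlflow.log_params({"base_model": "roberta-base", "learning_rate": 3e-5,
                           "batch_size": 64, "weight_decay": 0.01, "epochs": 10})
        # Per-epoch metrics for convergence monitoring.
        for epoch, (train_loss, val_f1) in enumerate(history):
            mlflow.log_metric("train_loss", train_loss, step=epoch)
            mlflow.log_metric("val_macro_f1", val_f1, step=epoch)
        # Artifacts: best checkpoint (plus, in the real pipeline, figures and reports).
        mlflow.log_artifact(checkpoint_path)
```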

## 🚀 Quick Start

### 1️⃣ Installation

```bash
# Clone repository
git clone <repository-url>
cd CTX-SARC-Embed

# Install dependencies
pip install -r requirements.txt
```

### 2️⃣ Training

```bash
# Run optimized training pipeline
python optimized_training_pipeline_2025-01-01.py

# Monitor with MLflow
mlflow ui
```

### 3️⃣ Analysis & Visualization

```bash
# Generate comprehensive analysis
python comprehensive_analysis_2025-01-01.py

# Create detailed visualizations
python advanced_comprehensive_visualization_2025-01-01.py
python detailed_performance_visualization_2025-01-01.py
```

### 4️⃣ Model Usage

```python
import torch
from transformers import AutoTokenizer
from your_model import AdvancedSarcasmClassifier

# Load model and tokenizer
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = AdvancedSarcasmClassifier().to(device)
model.load_state_dict(torch.load('best_model_20250601_173155.pt', map_location=device))
model.eval()  # required so dropout/batch-norm layers run in inference mode
tokenizer = AutoTokenizer.from_pretrained('roberta-base')

# Predict
text = "Oh great, another Monday morning! 🙄"
inputs = tokenizer(text, return_tensors='pt', truncation=True, padding=True, max_length=128)
with torch.no_grad():
    logits, _ = model(inputs['input_ids'].to(device), inputs['attention_mask'].to(device))
    prediction = torch.argmax(logits, dim=1)

# Output: 0=Sarcasm, 1=Irony, 2=Regular
```
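
A small follow-up to the snippet above for turning the predicted index into a readable label with a softmax confidence score (the label order follows the mapping in the final comment):

```python
labels = ["Sarcasm", "Irony", "Regular"]
probs = torch.softmax(logits, dim=1)
print(f"{labels[prediction.item()]} ({probs.max().item():.1%} confidence)")
```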

## 📊 Comprehensive Dashboard

*(Figure: Comprehensive Analysis dashboard)*
