Skip to content

The Multimodal Sarcasm Detection System detects sarcasm in multimedia content using image-caption generation and NLP. It achieved 1st place in UITC2024 by classifying sarcasm into four categories: image, text, multi sarcasm, and not sarcasm.

Notifications You must be signed in to change notification settings

xndien2004/Multimodal-Sarcasm-Detection-for-UITC2024

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Multimodal Sarcasm Detection for UITC2024

A multimodal sarcasm detection system utilizing image-caption generation and natural language processing, developed for the UITC2024 competition, where we achieved 1st place.

Static Badge Static Badge Static Badge Static Badge Static Badge

Table of Contents

📍 Overview

The Multimodal Sarcasm Detection System is designed to detect sarcasm in multimedia content using image-text pairs. It generates captions from images using a pre-trained Vintern-1B-v2 model, then processes the data through three input streams: original text, generated captions, and image features. The system integrates these inputs into a unified model that classifies sarcasm across different categories: text sarcasm, image sarcasm, multi-modal sarcasm, and no sarcasm.

This system is developed for the UITC2024 competition and aims to advance the understanding of sarcasm detection in multimodal contexts.

🎯 Features

  1. Multimodal Input Handling

    • Text-based input: Handles original text and captions generated from images.
    • Image-based input: Generates image captions for context-based analysis.
  2. Sarcasm Classification

    • Classifies content into four categories: image sarcasm, text sarcasm, multi sarcasm, and not sarcasm.
  3. Model Architecture

    • Utilizes state-of-the-art models like Vintern-1B-v2 for image captioning and transformers for text analysis.
    • Integrated ViT and Jina Embedding V3 for feature extraction, with optimization using Cross Entropy and Focal Loss.
  4. Voting Model Integration

    • Combines the predictions of four different models trained for 2-class, 3-class, and 4-class tasks to ensure accurate final predictions.

🏅 Results

Team Name F1 Precision Recall
Faster-United 0.4475 0.4403 0.4563
US1 0.4403 0.4462 0.5678
AIbou 0.4386 0.4256 0.4935
BEd 0.4328 0.4240 0.4574
MeowProfs 0.4293 0.4185 0.4511

Our team Faster-United achieved 1st place with an F1 score of 0.4475. The table above shows the top 5 teams and their corresponding F1, Precision, and Recall scores. We are proud of the results and our system's performance across various metrics in the UITC2024 competition.

🚀 Setup and Usage

  1. Clone the Repository

    git clone https://github.com/xndien2004/Multimodal-Sarcasm-Detection-for-UITC2024.git
    cd Multimodal-Sarcasm-Detection-for-UITC2024
  2. Install Dependencies Make sure Python is installed and then install the necessary dependencies:

    pip install -r requirements.txt
  3. Run trainer To run train, execute:

    bash run_trainer.sh

    This will start the application and allow you to test the sarcasm detection on your input data.

👣 Workflow

  • Data Processing: The system processes image and text data, generating captions for images and using the original text for classification.
  • Model Training: The four models (trained for 2-class, 3-class, and 4-class tasks) work together to detect sarcasm across different types of input.
  • Voting Model: The predictions of individual models are aggregated using a Voting Model to produce the final classification.

📐 App Structure

├── Multimodal-Sarcasm-Detection-for-UITC2024/
│   ├── config/
│   │   ├── config_trainer.yaml
│   ├── pic/
│   ├── src/
│   │   ├── data_processing/
│   │   ├── multimodal_classifier/
│   │   ├── pipeline_notebook/
|   |   ├── utils.py
│   ├── requirements.txt

🧑‍💻 Contributors

About

The Multimodal Sarcasm Detection System detects sarcasm in multimedia content using image-caption generation and NLP. It achieved 1st place in UITC2024 by classifying sarcasm into four categories: image, text, multi sarcasm, and not sarcasm.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published