This work introduces T4SA 2.0, a large-scale dataset and cross-modal learning method for training visual sentiment analysis models in the Twitter domain.
Our approach fine-tunes Vision Transformer (ViT) models pre-trained on ImageNet-21k, surpassing the current state of the art on external, manually annotated benchmarks.
- 🎯 Cross-modal distillation: Leverages both visual and textual information from social media
- 📊 Large-scale dataset: ~3.7M images crawled from Twitter between April 1 and June 30
- 🤖 Automated labeling: Uses distant supervision to minimize human annotation effort
- 🏆 SOTA performance: Beats current state-of-the-art on multiple benchmarks
- 🔧 Ready-to-use models: Pre-trained models available for immediate deployment
The cross-modal teacher-student technique removes the need for human annotators while enabling the creation of very large training sets, supporting future research on training robust visual models as parameter counts continue to grow.
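To illustrate the idea, here is a minimal sketch of cross-modal distillation (not the repository's training code): a text sentiment classifier scores each tweet's text, and its predicted sentiment distribution serves as a soft target for a ViT student that only sees the paired image. The model name, loss choice, and hyperparameters below are illustrative assumptions.

```python
# Minimal cross-modal distillation sketch (illustrative; not the repository's training script).
# Assumption: each batch pairs tweet images with the sentiment distribution
# (negative / neutral / positive) that a *text* classifier predicted for the tweet's text.
import torch
import torch.nn.functional as F
import timm

student = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=3)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

def distillation_step(images, teacher_probs):
    """One training step: images (B, 3, 224, 224), teacher_probs (B, 3) soft labels."""
    logits = student(images)
    # Match the student's predictive distribution to the text teacher's soft labels.
    loss = F.kl_div(F.log_softmax(logits, dim=-1), teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```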
- Python 3.7+
- CUDA-compatible GPU (recommended)
- Clone the repository

  ```bash
  git config --global http.postBuffer 1048576000
  git clone --recursive https://github.com/fabiocarrara/cross-modal-visual-sentiment-analysis.git
  cd cross-modal-visual-sentiment-analysis
  ```

- Create a virtual environment (recommended)

  ```bash
  python -m venv venv
  source venv/bin/activate
  ```

- Install Python dependencies

  ```bash
  pip install --upgrade pip
  pip install -r requirements.txt
  ```

- Install system dependencies (Linux/Ubuntu)

  ```bash
  chmod +x setup.sh
  sudo ./setup.sh
  ```
```bash
python3 scripts/test_benchmark.py -m <model_name> -b <benchmark_name>
```

Replace `<model_name>` and `<benchmark_name>` with one of the options below; an example invocation follows the lists.

Available models:

- `boosted_model` (recommended)
- `ViT_L16`, `ViT_L32`, `ViT_B16`, `ViT_B32`
- `merged_T4SA`, `bal_flat_T4SA2.0`, `bal_T4SA2.0`, `unb_T4SA2.0`
- `B-T4SA_1.0_upd_filt`, `B-T4SA_1.0_upd`, `B-T4SA_1.0`

Available benchmarks:

- `5agree`, `4agree`, `3agree`
- `FI_complete`, `emotion_ROI_test`, `twitter_testing_2`
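For example, to evaluate the recommended model on the five-annotator-agreement Twitter benchmark (assuming the dataset folders are in place as described in the project structure below):

```bash
python3 scripts/test_benchmark.py -m boosted_model -b 5agree
```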
```bash
python3 scripts/5_fold_cross.py -b <benchmark_name>
```
Reports the mean accuracy and standard deviation across the folds, and saves the predictions.
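For example, assuming the benchmark names listed above are also accepted here:

```bash
python3 scripts/5_fold_cross.py -b FI_complete
```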
Fine-tune on the FI benchmark:

```bash
python3 scripts/fine_tune_FI.py
```
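For reference, the sketch below shows what fine-tuning a ViT on the eight FI emotion classes could look like. It is an illustration, not the contents of `fine_tune_FI.py`; the use of `timm`/`torchvision`, the image size, and the hyperparameters are assumptions.

```python
# Illustrative FI fine-tuning sketch (not the repository's fine_tune_FI.py).
import timm
import torch
from torchvision import datasets, transforms

# FI images are arranged in one folder per emotion class (see project structure below).
# Input normalization is omitted here for brevity.
tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("dataset/benchmark/FI/images", transform=tf)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# Eight output classes: amusement, anger, awe, contentment, disgust, excitement, fear, sadness.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for images, targets in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()
```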
```
.
├── dataset/
│   ├── benchmark/
│   │   ├── EmotionROI/
│   │   │   └── images/
│   │   │       ├── anger/
│   │   │       ├── disgust/
│   │   │       ├── fear/
│   │   │       ├── joy/
│   │   │       ├── sadness/
│   │   │       └── surprise/
│   │   ├── FI/
│   │   │   ├── images/
│   │   │   │   ├── amusement/
│   │   │   │   ├── anger/
│   │   │   │   ├── awe/
│   │   │   │   ├── contentment/
│   │   │   │   ├── disgust/
│   │   │   │   ├── excitement/
│   │   │   │   ├── fear/
│   │   │   │   └── sadness/
│   │   │   ├── split_1/ ... split_5/
│   │   ├── Twitter Testing Dataset I/
│   │   └── Twitter Testing Dataset II/
│   ├── t4sa 1.0/
│   │   ├── dataset with new labels/
│   │   └── original dataset/
│   └── t4sa 2.0/
│       ├── bal_T4SA2.0/
│       ├── bal_flat_T4SA2.0/
│       ├── img/
│       ├── merged_T4SA/
│       └── unb_T4SA2.0/
├── models/
├── notebooks/
├── predictions/
├── scripts/
├── requirements.txt
├── setup.sh
└── README.md
```
Our models achieve state-of-the-art performance on Twitter sentiment analysis benchmarks:
Confidence filter thresholds are applied per class (Pos / Neu / Neg); the accuracy columns report results on the Twitter Testing Dataset (%) at three annotator-agreement levels.

| Model | Dataset | Pos | Neu | Neg | Architecture | 5 agree | ≥4 agree | ≥3 agree |
|-----------|---------|------|------|------|--------------|---------|----------|----------|
| Model 3.1 | A | - | - | - | ViT-B/32 | 82.2 | 78.0 | 75.5 |
| Model 3.2 | A | 0.70 | 0.70 | 0.70 | ViT-B/32 | 84.7 | 79.7 | 76.6 |
| Model 3.3 | B | 0.70 | 0.70 | 0.70 | ViT-B/32 | 82.3 | 78.7 | 75.3 |
| Model 3.4 | B | 0.90 | 0.90 | 0.70 | ViT-B/32 | 84.4 | 80.3 | 77.1 |
| Model 3.5 | A+B | 0.90 | 0.90 | 0.70 | ViT-B/32 | 86.5 | 82.6 | 78.9 |
| Model 3.6 | A+B | 0.90 | 0.90 | 0.70 | ViT-L/32 | 85.0 | 82.4 | 79.4 |
| Model 3.7 | A+B | 0.90 | 0.90 | 0.70 | ViT-B/16 | 87.0 | 83.1 | 79.4 |
| Model 3.8 | A+B | 0.90 | 0.90 | 0.70 | ViT-L/16 | 87.8 | 84.8 | 81.9 |
Best-performing model: ViT-L/16 (Model 3.8) reaches 87.8% accuracy on the 5-agreement subset of the Twitter Testing Dataset.
The T4SA 2.0 dataset will be made available soon. It contains:
- ~3.7M images from Twitter
- Cross-modal annotations using distant supervision
- Multiple confidence thresholds for different use cases
- Balanced and unbalanced versions
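To illustrate how the per-class confidence thresholds could be used when assembling a training split, the sketch below assumes a hypothetical CSV with one row per image and the text classifier's probabilities in `neg`, `neu`, and `pos` columns; the file name and column names are assumptions, not the released format.

```python
import pandas as pd

# Hypothetical label file: one row per image with the text teacher's
# predicted probability for each sentiment class (columns: img_path, neg, neu, pos).
labels = pd.read_csv("t4sa2.0_labels.csv")

# Per-class confidence thresholds (the 0.90 / 0.90 / 0.70 setting from the results table).
thresholds = {"pos": 0.90, "neu": 0.90, "neg": 0.70}

# Keep an image only if its most likely class exceeds that class's threshold.
predicted = labels[["neg", "neu", "pos"]].idxmax(axis=1)
confidence = labels[["neg", "neu", "pos"]].max(axis=1)
keep = confidence >= predicted.map(thresholds)
filtered = labels[keep]
print(f"kept {len(filtered)} / {len(labels)} images")
```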
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this work in your research, please cite:
```bibtex
@inproceedings{serra2023emotions,
  author    = {Serra, Alessio and Carrara, Fabio and Tesconi, Maurizio and Falchi, Fabrizio},
  editor    = {Kobi Gal and Ann Now{\'{e}} and Grzegorz J. Nalepa and Roy Fairstein and Roxana Radulescu},
  title     = {The Emotions of the Crowd: Learning Image Sentiment from Tweets via Cross-Modal Distillation},
  booktitle = {{ECAI} 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Krak{\'{o}}w, Poland - Including 12th Conference on Prestigious Applications of Intelligent Systems ({PAIS} 2023)},
  series    = {Frontiers in Artificial Intelligence and Applications},
  volume    = {372},
  pages     = {2089--2096},
  publisher = {{IOS} Press},
  year      = {2023},
  url       = {https://doi.org/10.3233/FAIA230503},
  doi       = {10.3233/FAIA230503},
}
```