- Project
- Data
- Persuasion Techniques
- Related Work
- Hierarchical Loss Function
- Results
- Discussion and Error Analysis
- Conclusion
- Usage
- References
Memes have evolved from entertainment to powerful instruments for online disinformation campaigns. This project aims to build robust models to detect persuasion techniques embedded within memes, enhancing the ability to mitigate the impact of such content on public opinion.
"Detecting Persuasion Techniques in Memes" is a research project aimed at identifying and analyzing various persuasion techniques used in memes. This project is part of the SemEval2024 shared task on "Multilingual Detection of Persuasion Techniques in Memes." The challenge involves understanding and classifying the persuasive elements in memes, which may include logical fallacies and emotional appeals, among others.
Memes are a potent medium in online discourse, often used in disinformation campaigns. By combining text and images, they can effectively influence public opinion. This project seeks to detect these techniques through hierarchical multilabel classification tasks, analyzing both the textual and visual components of memes.
The tasks are divided into:
- Subtask 1: Identification of persuasion techniques from the textual content alone.
- Subtask 2a: Multimodal analysis, identifying persuasion techniques from both text and image.
This repository contains the code and models developed for the SemEval 2024 competition, providing tools to tackle the challenges of detecting persuasion in multimodal content.
To access the data for this project and submit your predictions, register at the following link:
After registration, you will be able to download the data and participate in the tasks by submitting your predictions through the provided platform.
Recent research has identified 22 persuasion techniques applied across textual and visual media. This project tackles the challenge of identifying these techniques through a multi-label hierarchical classification approach. Given the evolving nature of internet memes and their heavy reliance on current trends and sarcasm, detecting such techniques requires a sophisticated methodology.
- Propaganda Detection: Traditional binary classifiers like SVM, Naive Bayes, and Random Forest show limitations in identifying complex hate speech.
- Transformer Models: Advanced methods such as DeBERTa, XLM-RoBERTa, and GPT-3 have demonstrated improved performance in multimodal setups.
- Shared Tasks: Models from competitions like SemEval highlight the importance of transformer-based and multimodal approaches for detecting propaganda.
The hierarchical loss is designed to maintain consistency across classification levels:

- Layer-Specific Loss: the sum of binary cross-entropy losses computed at each level of the hierarchy, where predictions at level $l$ are made from both the current-level and the parent-level representations. With $C_l$ labels at level $l$:

$$\mathcal{L}_{layer}^{(l)} = -\sum_{i=1}^{C_l} \left[ y_i^{(l)} \log \hat{y}_i^{(l)} + \left(1 - y_i^{(l)}\right) \log \left(1 - \hat{y}_i^{(l)}\right) \right]$$

The sum over all $L$ layers is defined as:

$$\mathcal{L}_{layer} = \sum_{l=1}^{L} \mathcal{L}_{layer}^{(l)}$$

- Dependency Loss: penalizes incorrect predictions that deviate from valid parent-child relationships, i.e. a child label scored higher than its parent. At level $l$:

$$\mathcal{L}_{dep}^{(l)} = \sum_{i=1}^{C_l} \max\left(\hat{y}_i^{(l)} - \hat{y}_{\pi(i)}^{(l-1)},\ 0\right)$$

where $\pi(i)$ denotes the parent of label $i$ at level $l-1$. The total dependency loss is then:

$$\mathcal{L}_{dep} = \sum_{l=2}^{L} \mathcal{L}_{dep}^{(l)}$$

Overall, the hierarchical loss is given by:

$$\mathcal{L} = \mathcal{L}_{layer} + \beta\, \mathcal{L}_{dep}$$

where $\beta$ weights the dependency penalty against the layer-specific loss.
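As a minimal sketch, the two loss terms described above (per-level binary cross-entropy plus a penalty when a child score exceeds its parent's) can be written in NumPy. The function signature, the parent-index mapping, and the default `beta` value are illustrative assumptions, not the project's exact implementation:

```python
import numpy as np

def hierarchical_loss(y_true, y_pred, parents, beta=0.5, eps=1e-7):
    """Sketch of a hierarchical multi-label loss.

    y_true, y_pred: lists of per-level arrays; y_pred holds sigmoid
                    probabilities in [0, 1] (assumption).
    parents:        for each level l >= 1, parents[l][i] is the index
                    of label i's parent at level l-1 (assumption).
    beta:           weight of the dependency penalty (assumption).
    """
    # Layer-specific loss: binary cross-entropy summed over every level.
    layer = 0.0
    for t, p in zip(y_true, y_pred):
        p = np.clip(p, eps, 1 - eps)  # guard against log(0)
        layer += -np.sum(t * np.log(p) + (1 - t) * np.log(1 - p))

    # Dependency loss: penalize child probability mass that exceeds
    # the probability of the child's parent at the level above.
    dep = 0.0
    for l in range(1, len(y_pred)):
        parent_probs = y_pred[l - 1][parents[l]]
        dep += np.sum(np.maximum(y_pred[l] - parent_probs, 0.0))

    return layer + beta * dep
```

Setting `beta=0` recovers a plain per-level BCE, which makes it easy to ablate the dependency term.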
The following table compares the hierarchical loss against a conventional BCE loss for multi-label classification.
Feature Extractor | Loss Function | Hierarchical F1 | Hierarchical Precision | Hierarchical Recall |
---|---|---|---|---|
mBERT | Traditional BCE | 0.518 | 0.615 | 0.448 |
mBERT | Hierarchical Loss | 0.573 ⬆️ (10.62%) | 0.586 | 0.560 ⬆️ (25%) |
OpenAI text-embedding-3-large | Traditional BCE | 0.585 | 0.659 | 0.526 |
OpenAI text-embedding-3-large | Hierarchical Loss | 0.645 ⬆️ (10.26%) | 0.656 | 0.635 ⬆️ (20.72%) |
SOTA | - | 0.752 🏆 | 0.684 | 0.836 |
Notes: ⬆️ indicates improvement over baseline | 🏆 indicates best performance | Percentages show relative improvement
Achieved state-of-the-art results in Bulgarian and Macedonian languages for SemEval 2024 Task 4 (Subtask 2a).
Language | H-F1 (Subtask 1) | H-F1 (Subtask 2a) |
---|---|---|
English | 0.66391 | 0.69666 |
Bulgarian | 0.48411 | 0.65638 |
Macedonian | 0.46615 | 0.69844 |
Arabic | 0.44478 | 0.53378 |
- Misclassification Issues: Instances where the model predicted incorrect techniques (e.g., predicting “Loaded Language” instead of “Doubt”).
- Multimodal Challenges: Handling multiple speakers within memes remains a challenge. Further improvements could be made through data cleaning and prompt engineering with large language models.
This project successfully demonstrates the potential of hierarchical classification for detecting persuasion techniques in memes. With improvements in loss functions and advanced embeddings, the model achieves state-of-the-art performance. Future work will explore more robust multimodal integration and language expansion to further enhance meme analysis.
To reproduce the results of this project, follow these steps to set up the environment and run the implementation.
- First, clone the project repository from GitHub:

```bash
git clone https://github.com/iqbal-sk/Detecting-Persuasion-Techniques-in-Memes.git
cd Detecting-Persuasion-Techniques-in-Memes
```
- It is recommended to use a virtual environment to avoid dependency conflicts. You can create one using `venv`:

```bash
python -m venv env
source env/bin/activate
```
- Once the virtual environment is activated, install the project dependencies listed in `requirements.txt`:

```bash
pip install -r requirements.txt
```
- Configure the Task and Models

  Open the configuration file (`config.j2`) and set the task variable and the corresponding models as follows:

  - `task`: choose either `subtask1` or `subtask2a`.
    - For `subtask1`, configure only the `text_model`:
      - Possible values: `"mBERT"`, `"XLNet"`, `"XLMRoBERTa"`, `"OpenAiSmall"`, `"OpenAiLarge"`
    - For `subtask2a`, configure both `text_model` and `image_model`:
      - `text_model`: `"OpenAiSmall"`, `"OpenAiLarge"`, `"mBERT"`
      - `image_model`: `"ResNet50"`, `"CLIP"`

  You can adjust the hyperparameters to better suit your experiments by modifying the hyperparameters section in the configuration file (`config.j2`).
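For illustration, a `config.j2` fragment for subtask 2a might look like the following. The key names and YAML-style layout are assumptions; only the `task`, `text_model`, and `image_model` values come from the options documented above:

```yaml
# Hypothetical config.j2 sketch -- key names are assumptions,
# only the values are taken from the documented options.
task: subtask2a
text_model: "OpenAiLarge"
image_model: "CLIP"
```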
- Gao, Dehong. "Deep Hierarchical Classification for Category Prediction in E-commerce System." Proceedings of the 3rd Workshop on e-Commerce and NLP, Association for Computational Linguistics, Seattle, WA, USA, 2020, pp. 64-68. doi:10.18653/v1/2020.ecnlp-1.10.
- Pires, Telmo, Eva Schlinger, and Dan Garrette. "How Multilingual is Multilingual BERT?" Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019, pp. 4996-5001. doi:10.18653/v1/P19-1493.
- He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." CoRR, vol. abs/1512.03385, 2015. arXiv:1512.03385.
- Rogers, Anna, Olga Kovaleva, and Anna Rumshisky. "A Primer in BERTology: What We Know About How BERT Works." Transactions of the Association for Computational Linguistics, vol. 8, MIT Press, 2020, pp. 842-866. doi:10.1162/tacl_a_00349.
- Radford, Alec, et al. "Learning Transferable Visual Models From Natural Language Supervision." CoRR, vol. abs/2103.00020, 2021. arXiv:2103.00020.
- He, Pengcheng, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. "DeBERTa: Decoding-enhanced BERT with Disentangled Attention." CoRR, vol. abs/2006.03654, 2020. arXiv:2006.03654.
- Conneau, Alexis, et al. "Unsupervised Cross-lingual Representation Learning at Scale." CoRR, vol. abs/1911.02116, 2019. arXiv:1911.02116.
- Lin, Tsung-Yi, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Dollár. "Focal Loss for Dense Object Detection." CoRR, vol. abs/1708.02002, 2017. arXiv:1708.02002.
- Wu, Tong, Qingqiu Huang, Ziwei Liu, Yu Wang, and Dahua Lin. "Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets." CoRR, vol. abs/2007.09654, 2020. arXiv:2007.09654.
- Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." CoRR, vol. abs/1810.04805, 2018. arXiv:1810.04805.
- Chung, Hyung Won, et al. "Scaling Instruction-Finetuned Language Models." 2022. arXiv:2210.11416.