A shared-task challenge: determine whether a sentence from a news article expresses the subjective view of its author or presents an objective view of the covered topic.


📌 Fine-tuning of mDeBERTaV3 & ModernBERT for Subjectivity Detection

CheckThat! Lab 2025 - Task 1

This project tackles the problem of subjectivity detection in natural language 🌐, a fundamental task for applications such as fake news detection ❌📰 and fact-checking ✅. The goal is to classify sentences as subjective (SUBJ) or objective (OBJ) across five languages: Arabic, German, English, Italian, and Bulgarian.


🔍 Approaches

We employ two primary approaches for subjectivity detection:

1. BERT-like Models (mDeBERTaV3 & ModernBERT)

  • mDeBERTaV3-base 📖
  • ModernBERT-base 🔍
  • Fine-tuned on language-specific datasets with integrated sentiment information 💬 for enhanced performance.
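The exact sentiment-integration mechanism is not spelled out here; one minimal way to feed sentiment information to a BERT-like encoder is to append the per-class scores to the sentence text so the model can attend to them. The helper below (`build_input_with_sentiment` is a hypothetical name, not from the repository) sketches this idea:

```python
# Hypothetical sketch: inject sentiment scores into the encoder input by
# appending them after the separator token. The repository's actual
# integration strategy may differ (e.g. concatenating with the pooled
# embedding instead).

def build_input_with_sentiment(sentence: str, sentiment: dict[str, float],
                               sep_token: str = "[SEP]") -> str:
    """Append per-class sentiment scores after the separator token."""
    scores = " ".join(f"{label}={score:.2f}"
                      for label, score in sorted(sentiment.items()))
    return f"{sentence} {sep_token} {scores}"

example = build_input_with_sentiment(
    "The new policy is a disaster for small businesses.",
    {"positive": 0.03, "neutral": 0.12, "negative": 0.85},
)
print(example)
```

The resulting string can then be tokenized and fine-tuned exactly like a plain sentence, which keeps the approach model-agnostic across mDeBERTaV3 and ModernBERT.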

2. Large Language Models (LLMs)

  • Llama3.2-1B 🦙
  • Evaluated on its ability to capture subjectivity from general knowledge representations.

📊 Key Findings

  • BERT-like models exhibit superior performance in capturing nuanced information compared to LLMs.
  • Incorporating sentiment information significantly improves the SUBJ F1 score for English and Italian, with smaller gains for the other languages.
  • Decision threshold calibration is essential for improving performance when handling imbalanced label distributions.
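The threshold-calibration idea above can be sketched in a few lines: instead of the default 0.5 cut-off on the model's P(SUBJ), sweep candidate thresholds on the development set and keep the one that maximises the SUBJ F1. The function names and the illustrative data below are assumptions, not taken from the repository's code:

```python
# Sketch of decision-threshold calibration on a development set.

def f1_subj(y_true, y_pred, positive="SUBJ"):
    """F1 score for the positive (SUBJ) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def calibrate_threshold(probs, y_true, steps=101):
    """Grid-search the P(SUBJ) cut-off that maximises SUBJ F1."""
    best_t, best_f1 = 0.5, -1.0
    for i in range(steps):
        t = i / (steps - 1)
        y_pred = ["SUBJ" if p >= t else "OBJ" for p in probs]
        f1 = f1_subj(y_true, y_pred)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Illustrative data: the default 0.5 threshold misses one SUBJ sentence.
probs = [0.45, 0.6, 0.3, 0.2]
gold = ["SUBJ", "SUBJ", "OBJ", "OBJ"]
best_t, best_f1 = calibrate_threshold(probs, gold)
```

With imbalanced label distributions the calibrated threshold often lands well away from 0.5, which is why this step matters for the SUBJ F1.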

📁 Project Structure

  • Data Preparation: 📂 Data augmentation using sentiment scores, tokenization, and preprocessing.
  • Model Training: 🔧 Fine-tuning mDeBERTaV3, ModernBERT, and Llama3.2-1B.
  • Evaluation: 📈 Metrics include the macro-average F1 score and the SUBJ F1 score, with a focus on decision-threshold optimization.

🏗️ Architecture Overview

An architecture diagram of the proposed system is included in the repository.


💻 Requirements

  • Python 3.x 🐍
  • PyTorch 🔥
  • Hugging Face Transformers 🤗
  • Dependencies specified in requirements.txt 📋

📦 Installation

  1. Clone the repository:

     ```shell
     git clone https://github.com/MatteoFasulo/clef2025-checkthat.git
     cd clef2025-checkthat
     ```

  2. Install dependencies:

     ```shell
     pip install -r requirements.txt
     ```

🔬 Evaluation

To evaluate the model performance on the development set for English, use:

```shell
python scorer/evaluate.py -g data/english/dev_en.tsv -p results/dev_english_predicted.tsv
```

To evaluate the sentiment-enhanced model:

```shell
python scorer/evaluate.py -g data/english/dev_en.tsv -p results/dev_english_sentiment_predicted_.tsv
```
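The official scorer in `scorer/evaluate.py` is authoritative; as a rough illustration of the macro-F1 metric it reports, the sketch below parses gold and predicted label files and averages the per-class F1 scores. The TSV column layout (`sentence_id<TAB>label`) and the helper names are assumptions for this example:

```python
# Illustrative re-implementation of the macro-average F1 metric.
# Assumed TSV layout: sentence_id<TAB>label, one row per sentence.
import csv
import io

def read_labels(tsv_text: str) -> dict[str, str]:
    """Map sentence id -> label from TSV content."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0]: row[1] for row in reader}

def macro_f1(gold: dict, pred: dict, labels=("SUBJ", "OBJ")) -> float:
    """Mean of per-class F1 scores; gold and pred share the same keys."""
    scores = []
    for label in labels:
        tp = sum(pred[k] == label and g == label for k, g in gold.items())
        fp = sum(pred[k] == label and g != label for k, g in gold.items())
        fn = sum(pred[k] != label and g == label for k, g in gold.items())
        denom = 2 * tp + fp + fn  # F1 = 2*tp / (2*tp + fp + fn)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Note how a classifier that predicts SUBJ everywhere scores well on the SUBJ class but collapses on OBJ, dragging down the macro average; this is the imbalance the threshold calibration addresses.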


Conclusion

This project highlights the effectiveness of BERT-like models for subjectivity detection and emphasizes the importance of handling linguistic variability and class imbalance. Future work will focus on enhancing LLM performance and addressing challenges identified in the error analysis.


📜 License

Licensed under the MIT License - see the LICENSE file for details.
