This project tackles the problem of subjectivity detection in natural language 🌐—a fundamental task for applications like fake news detection ❌📰 and fact-checking ✅. The goal is to classify sentences as subjective (SUBJ) or objective (OBJ) across five languages: Arabic, German, English, Italian, and Bulgarian.
We employ two primary approaches for subjectivity detection:

- mDeBERTaV3-base 📖 and ModernBERT-base 🔍
  - Fine-tuned on language-specific datasets with integrated sentiment information 💬 for enhanced performance.
- Llama3.2-1B 🦙
  - Evaluated on its ability to capture subjectivity from general knowledge representations.
- BERT-like models exhibit superior performance in capturing nuanced information compared to LLMs.
- Incorporating sentiment information significantly improves the SUBJ F1 score for English and Italian, with smaller gains for the other languages.
- Decision threshold calibration is essential for improving performance when handling imbalanced label distributions.
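The threshold calibration mentioned above can be sketched as a simple sweep over candidate cut-offs on the development set, keeping the one that maximizes macro-average F1. This is an illustrative reconstruction; the function and variable names below are ours, not the repository's.

```python
import numpy as np
from sklearn.metrics import f1_score

def calibrate_threshold(dev_probs, dev_labels, grid=np.linspace(0.1, 0.9, 81)):
    """Pick the decision threshold that maximizes macro-average F1 on the dev set.

    dev_probs  : array of model scores P(SUBJ) per sentence
    dev_labels : array of gold labels (1 = SUBJ, 0 = OBJ)
    """
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        preds = (dev_probs >= t).astype(int)
        score = f1_score(dev_labels, preds, average="macro")
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1

# Toy imbalance-style example: the default 0.5 cut-off is not optimal here.
probs = np.array([0.35, 0.40, 0.45, 0.60, 0.30, 0.20, 0.48, 0.55])
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])
t, f1 = calibrate_threshold(probs, labels)
```

When the SUBJ/OBJ distribution is skewed, the calibrated threshold typically drifts away from 0.5 toward the minority class, which is why this step matters for the imbalanced splits.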
- Data Preparation: 📂 Data augmentation using sentiment scores, tokenization, and preprocessing.
- Model Training: 🔧 Fine-tuning mDeBERTaV3, ModernBERT, and Llama3.2-1B.
- Evaluation: 📈 Evaluation metrics include macro-average F1 score and SUBJ F1 score with focus on threshold optimization.
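One simple way to realize the sentiment-augmentation step in data preparation is to prepend sentiment scores to each sentence before tokenization, so the encoder sees them alongside the text. This is only one common pattern and a sketch under our own assumptions; the project's actual injection mechanism (e.g. feature concatenation at the hidden-state level) may differ.

```python
def augment_with_sentiment(sentence: str, sentiment: dict) -> str:
    """Prepend coarse sentiment probabilities to the input text.

    `sentiment` is assumed to hold 'negative', 'neutral', and 'positive'
    probabilities, e.g. from an off-the-shelf sentiment classifier.
    """
    prefix = (
        f"[NEG={sentiment['negative']:.2f}] "
        f"[NEU={sentiment['neutral']:.2f}] "
        f"[POS={sentiment['positive']:.2f}] "
    )
    return prefix + sentence

example = augment_with_sentiment(
    "The new policy is a disaster.",
    {"negative": 0.91, "neutral": 0.07, "positive": 0.02},
)
```

The augmented string is then tokenized and fine-tuned on as usual, letting the model correlate strong sentiment with subjectivity.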
The architecture of the proposed system is illustrated in the diagram included in the repository.
- Python 3.x 🐍
- PyTorch 🔥
- Hugging Face Transformers 🤗
- Dependencies specified in `requirements.txt` 📋
- Clone the repository:

  ```bash
  git clone https://github.com/MatteoFasulo/clef2025-checkthat.git
  cd clef2025-checkthat
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
To evaluate the model performance on the development set for English, use:

```bash
python scorer/evaluate.py -g data/english/dev_en.tsv -p results/dev_english_predicted.tsv
```
To evaluate the sentiment-enhanced model:

```bash
python scorer/evaluate.py -g data/english/dev_en.tsv -p results/dev_english_sentiment_predicted_.tsv
```
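The two headline metrics the scorer reports — macro-average F1 and SUBJ F1 — can be reproduced with scikit-learn, assuming gold and predicted labels have been read from the TSV files. The label lists below are illustrative, and the official scorer may report additional statistics.

```python
from sklearn.metrics import f1_score

# Illustrative gold and predicted labels (in practice, read from the TSVs).
gold = ["SUBJ", "OBJ", "SUBJ", "OBJ", "OBJ"]
pred = ["SUBJ", "OBJ", "OBJ", "OBJ", "OBJ"]

# Macro-average F1: unweighted mean of per-class F1 scores.
macro_f1 = f1_score(gold, pred, average="macro")

# SUBJ F1: F1 computed with SUBJ as the positive class.
subj_f1 = f1_score(gold, pred, pos_label="SUBJ", average="binary")
```

Macro averaging weighs SUBJ and OBJ equally regardless of their frequency, which is why it is the primary metric under class imbalance.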
- [GitHub Repository](https://github.com/MatteoFasulo/clef2025-checkthat) 📂
- Dataset 🗃️
This project highlights the effectiveness of BERT-like models for subjectivity detection and emphasizes the importance of handling linguistic variability and class imbalance. Future work will focus on enhancing LLM performance and addressing challenges identified in the error analysis.
Licensed under the MIT License - see the LICENSE file for details.