A Comparative Study of Fine-Tuning Scenarios for Transformer Models in Indonesian Sentiment Analysis
This repository contains the code and notebooks for a research project aimed at evaluating various fine-tuning strategies on transformer models for the task of Indonesian-language sentiment analysis.
This research conducts a comparative study to evaluate three fine-tuning scenarios—Standard Fine-Tuning, Gradual Unfreezing, and Differential Learning Rates—across three model architectures: IndoBERT-base, IndoBERTweet, and RoBERTa. The experiments were performed on two datasets from different domains: app reviews (BBM Dataset) and political comments (Pemilu Dataset), using F1-Score as the primary evaluation metric. The results show that IndoBERTweet consistently emerged as the top-performing model, while the Standard Fine-Tuning strategy with an optimized learning rate proved to be superior to the other two, more complex techniques.
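For context, the F1-Score is the harmonic mean of precision and recall. A minimal computation with scikit-learn, assuming a three-class label scheme and macro averaging (neither detail is confirmed by the report):

```python
from sklearn.metrics import f1_score

# Toy labels: 0 = negative, 1 = neutral, 2 = positive (class scheme assumed).
y_true = [0, 1, 2, 1, 0, 2]
y_pred = [0, 1, 1, 1, 0, 2]

# average="macro" is an assumption; the report may use weighted or binary F1.
print(f1_score(y_true, y_pred, average="macro"))
```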
- IndoBERTweet is the Best Model: This model consistently outperformed other architectures. In its best-case scenario (Standard Fine-Tuning), IndoBERTweet achieved an F1-Score of 0.9218 on the BBM Dataset and 0.7431 on the Pemilu Dataset.
- Hyperparameter Tuning is the Best Strategy: Scenario 1, with an optimized learning rate, yielded peak performance. For instance, IndoBERTweet on the BBM Dataset reached an F1-Score of 0.9218 with this method, surpassing the results from Scenario 3 (Differential LR, 0.9159) and Scenario 2 (Gradual Unfreezing, 0.9042).
- Differential LR is Superior to Gradual Unfreezing: Among the two advanced techniques, Differential Learning Rates (S3) proved far more robust. On the RoBERTa model with the BBM Dataset, Scenario 3 achieved an F1-Score of 0.8530, whereas Scenario 2 collapsed to only 0.4890, a dramatic gap in effectiveness.
- Data Quality Has a Major Impact: Model performance varied drastically between datasets. The highest F1-Score on the BBM Dataset (0.9218) was nearly 25% higher than the highest score on the Pemilu Dataset (0.7431). This disparity was also reflected in the validation loss, where the lowest loss on the Pemilu Dataset (~0.64) was significantly higher than on the BBM Dataset (~0.27), suggesting that domain complexity, noise, and data ambiguity are major challenges in political sentiment analysis.
```
kp-penelitian/
│
├── data/
│   ├── dataset_bbm.csv        # BBM app review dataset
│   └── dataset_pemilu.csv     # General Election comments dataset
│
├── notebooks/
│   ├── 1_Skenario_Fine_Tuning.ipynb
│   ├── 2_Skenario_Gradual_Unfreezing.ipynb
│   └── 3_Skenario_Differential_LR.ipynb
│
├── .gitignore
└── README.md
```
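A quick way to inspect the two datasets after cloning; the CSV column names are not documented here, so `head()` is used to discover them rather than assuming them:

```python
import pandas as pd

# Load both datasets; column names are intentionally not assumed here.
bbm = pd.read_csv("data/dataset_bbm.csv")
pemilu = pd.read_csv("data/dataset_pemilu.csv")

print(bbm.shape, pemilu.shape)  # row/column counts
print(bbm.head())               # inspect the actual column names
```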
The experiment was conducted by comparing three models across three training scenarios:
- IndoBERT-base (`indobenchmark/indobert-base-p1`)
- IndoBERTweet (`indobenchmark/indobertweet-base-p1`)
- RoBERTa (`roberta-base` or other Indonesian variants); all three load through the same Hugging Face interface, as sketched below
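A minimal loading sketch with the `transformers` library; the `num_labels=3` class count is an assumption, not confirmed by the report:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_IDS = [
    "indobenchmark/indobert-base-p1",
    "indobenchmark/indobertweet-base-p1",
    "roberta-base",  # or an Indonesian RoBERTa variant
]

# num_labels=3 assumes negative/neutral/positive sentiment classes.
tokenizer = AutoTokenizer.from_pretrained(MODEL_IDS[1])
model = AutoModelForSequenceClassification.from_pretrained(MODEL_IDS[1], num_labels=3)
```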
- Scenario 1: Standard Fine-Tuning & LR Optimization: Training the entire model simultaneously while searching for the best learning rate via cross-validation.
- Scenario 2: Gradual Unfreezing: Training the model layer by layer, starting with the classifier head and progressively unfreezing the layers beneath it.
- Scenario 3: Differential Learning Rates: Training the entire model simultaneously but applying different learning rates to distinct layer groups. (Scenarios 2 and 3 are sketched in code after this list.)
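A minimal sketch of how Scenarios 2 and 3 differ in code, using PyTorch and `transformers`; the layer-group boundaries and learning rates below are illustrative assumptions, not the notebooks' exact settings:

```python
import torch
from transformers import AutoModelForSequenceClassification

# num_labels=3 is an assumed class count, not confirmed by the report.
model = AutoModelForSequenceClassification.from_pretrained(
    "indobenchmark/indobertweet-base-p1", num_labels=3
)
encoder_layers = model.base_model.encoder.layer  # 12 layers in a BERT-base model

# Scenario 2 (Gradual Unfreezing): freeze the encoder so only the classifier
# head trains at first, then unfreeze encoder layers top-down between stages.
for param in model.base_model.parameters():
    param.requires_grad = False

def unfreeze_top(n_layers):
    """Unfreeze the top n encoder layers; call once per training stage."""
    for layer in encoder_layers[-n_layers:]:
        for param in layer.parameters():
            param.requires_grad = True

# Scenario 3 (Differential Learning Rates): the scenarios are alternatives,
# so here every parameter stays trainable, but lower (more general) layers
# get a smaller learning rate than upper layers and the classifier head.
for param in model.parameters():
    param.requires_grad = True
optimizer = torch.optim.AdamW([
    {"params": encoder_layers[:6].parameters(), "lr": 1e-5},
    {"params": encoder_layers[6:].parameters(), "lr": 2e-5},
    {"params": model.classifier.parameters(),   "lr": 3e-5},
])
```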
To reproduce the results of this research, follow these steps:
- Python 3.8+
- `pip` and `venv` (recommended)
- A GPU with sufficient VRAM (at least 8 GB recommended) for training the models
```bash
# Clone the repository
git clone https://github.com/rifqimaruf/kp-penelitian.git
cd kp-penelitian

# Create a virtual environment
python -m venv venv

# Activate the environment (Windows)
.\venv\Scripts\activate

# Activate the environment (macOS/Linux)
source venv/bin/activate
```
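The repository tree above does not show a pinned dependency file, so the exact package list is an assumption; the notebooks presumably rely on the standard Hugging Face stack:

```bash
# Assumed dependencies: check the notebook imports for the authoritative list
pip install torch transformers scikit-learn pandas jupyter
```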
Detailed results from each scenario, including comparison tables, F1-Scores, and loss curves for each model and dataset, can be found in the final research report. In summary, the best combination found was (see the sketch below):
- Model: `IndoBERTweet`
- Strategy: Standard Fine-Tuning
- Optimal Learning Rate: `3e-05` (for both datasets)
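As a sketch, the winning configuration maps onto `transformers.TrainingArguments` like this; only the learning rate comes from the report, while the epoch count and batch size are placeholder assumptions:

```python
from transformers import TrainingArguments

# learning_rate matches the reported optimum; the other values are assumptions.
args = TrainingArguments(
    output_dir="outputs/indobertweet-standard-ft",
    learning_rate=3e-5,
    num_train_epochs=3,
    per_device_train_batch_size=16,
)
```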
According to the researcher, there are three reasons why Scenarios 2 and 3 failed to improve on the baseline:
- The baseline had already been tuned over several learning rates and validated with 3-Fold Cross-Validation (sketched below), setting a very high benchmark.
- The freezing schedule in the Gradual Unfreezing scenario was too rigid and should have been adapted to each model's specific architecture.
- The learning-rate combination in the Differential Learning Rates scenario was not yet optimized.
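For reference, the baseline's learning-rate search (first point above) can be sketched as a grid over 3-fold cross-validation. `train_and_eval_f1` is a hypothetical stand-in for the notebooks' own fine-tuning loop, and the candidate grid is an assumption:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def search_learning_rate(texts, labels, candidate_lrs=(1e-5, 2e-5, 3e-5, 5e-5)):
    """Return the candidate LR with the best mean F1 over 3 folds.

    texts and labels are expected as NumPy arrays so fold indexing works.
    """
    kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
    mean_f1 = {}
    for lr in candidate_lrs:
        fold_scores = []
        for train_idx, val_idx in kfold.split(texts, labels):
            # train_and_eval_f1 is hypothetical: fine-tune on the train fold,
            # return validation F1 on the held-out fold.
            fold_scores.append(train_and_eval_f1(
                texts[train_idx], labels[train_idx],
                texts[val_idx], labels[val_idx], lr=lr,
            ))
        mean_f1[lr] = float(np.mean(fold_scores))
    return max(mean_f1, key=mean_f1.get)
```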