This repository contains a sentiment analysis project for Indonesian-language user reviews of Mobile Legends. It combines Natural Language Processing (NLP), Machine Learning (ML), and Deep Learning (DL) techniques: two feature extraction methods (TF-IDF and FastText) and five modeling approaches (SVM, Random Forest, CNN, LSTM, and GRU). Hyperparameters are tuned with Optuna, and the results are evaluated and visualized.
This project is designed to explore various methods for performing sentiment analysis on Mobile Legends reviews. Key components include:
- Data Preparation & Feature Extraction:
  - Preprocessing and cleaning of review texts.
  - Feature extraction using TF-IDF and FastText.
  - Utilization of a colloquial Indonesian lexicon and stopword list for enhanced text processing.
- Modeling Approaches:
  - Traditional Machine Learning Models: SVM and Random Forest using scikit-learn.
  - Deep Learning Models: CNN, LSTM, and GRU built with TensorFlow/Keras.
- Hyperparameter Optimization:
  - Fine-tuning model parameters using Optuna.
- Visualization & Evaluation:
  - Visualizing training and testing accuracy comparisons using Plotly Express and Matplotlib.
  - Saving and reviewing model performance metrics and training histories.
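The preprocessing step listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: `clean_review`, `SLANG_MAP`, and `STOPWORDS` are hypothetical names, and the small in-memory samples stand in for the lexicon and stoplist CSV files, whose column layout is not documented here.

```python
import re

# Hypothetical samples standing in for the colloquial lexicon and the
# stopword list; the real CSV files are assumed to be much larger.
SLANG_MAP = {"gk": "tidak", "ga": "tidak", "bgt": "banget", "yg": "yang", "udh": "sudah"}
STOPWORDS = {"yang", "dan", "di", "ini", "itu"}


def clean_review(text, slang_map=SLANG_MAP, stopwords=STOPWORDS):
    """Lowercase, strip URLs and non-letters, normalize slang, drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)       # keep ASCII letters and whitespace only
    tokens = text.split()
    tokens = [slang_map.get(tok, tok) for tok in tokens]   # slang -> standard form
    tokens = [tok for tok in tokens if tok not in stopwords]
    return " ".join(tokens)


print(clean_review("Game yg bagus bgt!!! Tapi gk stabil di HP ini :("))
# → game bagus banget tapi tidak stabil hp
```

Slang normalization runs before stopword removal so that a colloquial form such as "yg" is first mapped to "yang" and then filtered. The sample stopword set deliberately leaves out negations like "tidak", since removing them can flip the sentiment of a review.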
.
├── README.md
├── datasets
│ ├── mlbb_reviews.csv
│ └── mlbb_reviews2.csv
├── lexicon
│ └── colloquial-indonesian-lexicon.csv
├── models
│ ├── cnn
│ │ ├── best_cnn_model.h5
│ │ └── cnn_optuna_study.pkl
│ ├── fasttext_model
│ │ ├── fasttext.kv
│ │ └── fasttext.kv.vectors.npy
│ ├── gru
│ │ ├── best_gru_model.h5
│ │ └── gru_optuna_study.pkl
│ ├── indobert_model
│ │ ├── config.json
│ │ ├── model.safetensors
│ │ ├── special_tokens_map.json
│ │ ├── tokenizer_config.json
│ │ └── vocab.txt
│ ├── lstm
│ │ ├── best_lstm_model.h5
│ │ └── lstm_optuna_study.pkl
│ ├── rf
│ │ ├── best_fasttext_rf_model.pkl
│ │ ├── best_tfidf_rf_model.pkl
│ │ ├── fasttext_rf_optuna_study.pkl
│ │ └── tfidf_rf_optuna_study.pkl
│ ├── svm
│ │ ├── best_fasttext_svm_model.pkl
│ │ ├── best_tfidf_svm_model.pkl
│ │ ├── fasttext_svm_optuna_study.pkl
│ │ └── tfidf_svm_optuna_study.pkl
│ └── tokenizer
│   └── tokenizer.pkl
├── notebooks
│ ├── sentiment_analysis.ipynb
│ └── sentiment_inference.ipynb
├── requirements.txt
├── results
├── scraper
│ └── playstore_scraper.ipynb
└── stoplist
  └── stopwordbahasa.csv
- datasets/: Contains CSV files with Mobile Legends reviews.
- lexicon/: Contains a colloquial Indonesian lexicon used for text processing.
- models/: Saved models and Optuna study objects for CNN, LSTM, GRU, SVM, and Random Forest, plus the FastText vectors, the fitted tokenizer, and the IndoBERT model files.
- notebooks/: Jupyter notebooks for experimentation and analysis.
- results/: Visualization outputs (training history plots and accuracy comparisons).
- scraper/: Notebooks for scraping additional review data (e.g., from Play Store).
- stoplist/: Contains Indonesian stopword lists.
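The `.pkl` artifacts under models/ could be reloaded along the following lines. This is only a sketch: it assumes the files were written with Python's standard `pickle` module (they may instead have been saved with joblib), and `load_pickled` is a hypothetical helper name. The demo round-trips a throwaway file so it runs without the repository.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickled(path):
    """Load a pickled artifact. Only unpickle files from a trusted source."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Round-trip demo with a temporary file standing in for a saved artifact.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_artifact.pkl"
    with demo_path.open("wb") as fh:
        pickle.dump({"C": 1.0, "kernel": "linear"}, fh)
    restored = load_pickled(demo_path)

print(restored)
# → {'C': 1.0, 'kernel': 'linear'}
```

Under that assumption, a call such as `load_pickled("models/svm/best_tfidf_svm_model.pkl")` would need the library that defines the stored object (scikit-learn or Optuna) to be installed. The `.h5` files are Keras models and would typically be loaded with `tensorflow.keras.models.load_model` instead.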
- Clone the Repository:
  git clone <repository_url>
  cd mobile-legends-sentiment-analysis
- Create a Virtual Environment (Optional):
  python -m venv venv
  source venv/bin/activate # On Linux/Mac
  venv\Scripts\activate # On Windows
- Install Dependencies:
  pip install -r requirements.txt
- Data Preparation:
  - Place your datasets (e.g., mlbb_reviews.csv and mlbb_reviews2.csv) in the datasets/ folder.
  - Ensure your dataset contains the required columns (e.g., content_clean and sentiment).
- Feature Extraction & Modeling:
  - The project extracts features using TF-IDF and FastText.
  - It then trains various models:
    - Traditional ML Models: SVM and Random Forest are trained using both TF-IDF and FastText features.
    - Deep Learning Models: CNN, LSTM, and GRU are trained with hyperparameter tuning using Optuna.
  - To run experiments, open and execute the Jupyter notebook notebooks/sentiment_analysis.ipynb.
- Scraping Additional Data:
  - Use the notebook scraper/playstore_scraper.ipynb to scrape additional reviews if needed.
- Results & Visualizations:
  - Trained models and Optuna studies are saved under the respective folders in models/.
  - Visualizations (e.g., training history, accuracy comparisons) are saved in the results/ folder.
  - The project also includes code to generate a grouped bar chart comparing training and testing accuracies using Plotly Express.
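As a rough illustration of the TF-IDF plus SVM path described above, the sketch below trains a classifier on a handful of made-up reviews and reports training and testing accuracy. It is not the notebook's code: the toy data, the "positive"/"negative" label values, the choice of `LinearSVC`, and the default hyperparameters are all assumptions, and Optuna tuning is left out. Only the column names `content_clean` and `sentiment` come from the dataset description.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up, already-cleaned reviews standing in for the content_clean column.
content_clean = [
    "game bagus seru banget",
    "grafik bagus kontrol mudah",
    "suka hero baru seru",
    "event menarik hadiah bagus",
    "lag parah tidak bisa main",
    "matchmaking buruk tim tidak seimbang",
    "sering error setelah update",
    "server buruk koneksi putus",
]
# Assumed label values for the sentiment column.
sentiment = ["positive"] * 4 + ["negative"] * 4

# Stratified split keeps both classes in the training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    content_clean, sentiment, test_size=0.25, random_state=42, stratify=sentiment
)

# TF-IDF features feed a linear SVM; the pipeline fits the vectorizer
# on the training texts only.
model = make_pipeline(TfidfVectorizer(), LinearSVC(random_state=42))
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The two numbers printed at the end are the kind of per-model training and testing accuracies that a grouped bar chart would compare. With a dataset this small they say nothing about real performance.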
This project is licensed under the MIT License. See the LICENSE file for details.