This project implements a sentiment analysis model that combines a hybrid CNN-RNN architecture with an attention mechanism. The model is trained on the IMDb movie reviews dataset and uses pre-trained GloVe embeddings as its word representations.
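To illustrate the attention step named above, here is a minimal standard-library sketch of dot-product attention pooling over per-token vectors. It is not taken from the project code: the function names, the `query` vector, and the toy `states` values are all hypothetical, and the attention formulation actually used in this repository may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Collapse per-token vectors into one vector using dot-product attention.

    hidden_states: list of equal-length float lists (one per token).
    query: float list of the same length (assumed to be a learned vector).
    """
    # Score each token by its dot product with the query.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    # Normalize the scores into weights that sum to 1.
    weights = softmax(scores)
    # The weighted sum of token vectors is the pooled representation.
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights


# Three hypothetical token states of width 2 (stand-ins for RNN outputs).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(states, query=[1.0, 1.0])
print(weights)
print(pooled)
```

In a trained model the query would be learned and the token states would come from the recurrent layer; the pooled vector would then feed the classifier.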
The dataset repository for this project is hosted on GitHub: https://github.com/T2LIPthedeveloper/50.040-NLP-Final-Project
This repository contains all the code, dataset files, and results necessary to replicate the project and analyze the performance of various models.
- Python 3.8+
- Jupyter Notebook
- Required Python libraries, installed via requirements.txt: pip install -r requirements.txt
- Clone the repository:
    git clone https://github.com/T2LIPthedeveloper/50.040-NLP-Final-Project
    cd 50.040-NLP-Final-Project
- Install required libraries:
    pip install -r requirements.txt
- Open the Python environment and run the following:
    python sentencesenseis_script.py
The results for the sentiment analysis task using various models are summarized below:
- Accuracy: 0.9472
- Precision: 0.9464
- Recall: 0.9480
- F1 Score: 0.9472
- Accuracy: 0.8476
- Precision: 0.8132
- Recall: 0.9026
- F1 Score: 0.8556
- Accuracy: 0.8320
- Precision: 0.9116
- Recall: 0.7352
- F1 Score: 0.8140
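As a quick consistency check, the F1 scores above can be recomputed from the reported precision and recall using the harmonic-mean formula F1 = 2PR / (P + R). The sketch below is an illustration rather than part of the project code. The helper name `f1_score` is hypothetical, and the three tuples are simply the metric groups in the order they are listed above.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for the three metric groups, in listed order.
reported = [
    (0.9464, 0.9480, 0.9472),
    (0.8132, 0.9026, 0.8556),
    (0.9116, 0.7352, 0.8140),
]

for precision, recall, reported_f1 in reported:
    recomputed = f1_score(precision, recall)
    # Agreement to four decimal places means the reported values are self-consistent.
    print(f"recomputed={recomputed:.4f} reported={reported_f1:.4f}")
```

All three recomputed values agree with the reported F1 scores to four decimal places.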
For detailed performance metrics, refer to the result files:
- Results CSV
- Prediction Results (Hybrid-CNN-RNN)
- Prediction Results (TextCNN)
- Prediction Results (BiRNN)
This project was developed as part of the 50.040 Natural Language Processing course at SUTD. The group members are:
- Ansh Oswal (1006265)
- Atul Parida (1006184)
- Elvern Neylman Tanny (1006203)
We would like to thank the following:
- Stanford AI Group for providing the IMDb dataset.
- SUTD and Professor XX for their guidance and resources for the 50.040 NLP course.