Customer reviews, especially on e-commerce platforms, can be lengthy and time-consuming to analyze manually. This project focuses on abstractive text summarization of customer reviews, aiming to generate concise summaries of long review texts automatically.
We leverage two approaches:
- Custom BiLSTM Encoder-Decoder Architecture: Built using TensorFlow/Keras for generating summaries.
- Pre-trained BERT Transformer (BART): Utilized through Hugging Faceβs Transformers library for comparison and enhanced performance.
The project uses the Amazon Fine Food Reviews dataset sourced from Kaggle.
- Understand the concept of text summarization.
- Perform thorough data cleaning and preprocessing.
- Implement abstractive summarization using deep learning architectures.
- Compare custom models against state-of-the-art transformers.
- Python
- TensorFlow / Keras
- Hugging Face Transformers
- Pandas, NumPy, Matplotlib, Seaborn (for data manipulation and visualization)
We have used the Amazon Fine Food Reviews dataset available on Kaggle.
This dataset contains 500,000+ food reviews including text, ratings, and summary fields.
-
Data Collection:
Downloaded using Kaggle API. -
Data Cleaning:
- Removed duplicates and NaN values.
- Converted all text to lowercase.
- Removed HTML tags, special characters, numbers, and text within parentheses.
- Added special tokens (
sostok
,eostok
) to summaries.
-
Text Preprocessing:
- Tokenization.
- Sequence padding.
- Train-validation split.
-
Model Building:
- BiLSTM Encoder: Processes input review text.
- Decoder with LSTM: Predicts the summary sequence.
- Embedding Layer: Learned word representations.
- Early Stopping: To avoid overfitting.
-
Training:
- Trained the BiLSTM model for 20 epochs with validation monitoring.
-
Inference:
- Built separate encoder and decoder models for generating summaries.
-
Transformer Summarization:
- Used BART (facebook/bart-large-cnn) from Hugging Face for benchmark summarization.
- Input: "I recently purchased this organic green tea, and I must say, I am thoroughly impressed..."
- Generated Summary (BiLSTM Model):
great tea
- Generated Summary (BART Model): "I drink this tea every morning, and it gives me a calming start to my day..."
- Vaishnav Naik
- Yashaurya Soni
- Piyush Borakhade
This project is for academic purposes only.