Infantdebil/Amazon_Review_TOP_Content_Generation

Robo Review to Marketing Affiliate Approach

Overview

This project clusters product categories and summarizes customer feedback using unsupervised learning, semantic embeddings, and a large language model. The goal is to support an affiliate marketing approach by grouping similar products, analyzing customer sentiment, and producing summaries that can back product recommendations.

The project performs three main tasks:

  1. Sentiment Analysis of product reviews using VADER to classify reviews as positive, neutral, or negative.
  2. Product Category Clustering with K-Means to group products into 4 categories (Tablets, Smart Home Devices, E-Readers, Others).
  3. Summarization of customer feedback and cluster-based recommendations using Cohere's large language model (LLM).

Key Features

  • Sentiment Analysis:
    • Used VADER to classify customer reviews into positive, neutral, or negative sentiments.
  • Clustering:
    • Applied K-Means on embeddings generated by Google’s Universal Sentence Encoder (USE) to cluster product categories into 4 groups:
      • Tablets
      • Smart Home Devices
      • E-Readers
      • Others
  • Summarization:
    • Leveraged Cohere's LLM to generate summaries and actionable recommendations for each product category.

Technologies Used

  • Libraries:
    • pandas: For data manipulation.
    • scikit-learn: For K-Means clustering.
    • TensorFlow & TensorFlow Hub: For Universal Sentence Encoder embeddings.
    • Matplotlib: For data visualization.
    • Cohere API: For LLM-based summarization.
    • VADER: For sentiment analysis.
  • Pretrained Models:
    • Google’s Universal Sentence Encoder (USE): To generate semantic embeddings for clustering.
    • Cohere's LLM: For generating summaries.

Dataset

  • Source: Kaggle
  • Description:
    • Approximately 34,000 product entries with columns like:
      • name: Product names.
      • categories: Product categories.
      • reviews.text: Customer reviews.

Workflow

1. Sentiment Analysis

  • Preprocessed the reviews.text column by:
    • Cleaning the text (removing special characters, stopwords).
    • Feeding the cleaned text into VADER.
  • Output: A new column sentiment indicating positive, neutral, or negative reviews.
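
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: it assumes the `vaderSentiment` package, a simple regex-based cleaner with a tiny placeholder stopword set, and the commonly used compound-score cutoffs of ±0.05. The helper names `clean_text`, `label_from_compound`, and `classify_review` are hypothetical.

```python
import re

# Illustrative stopword subset (an assumption). Negations such as "not" are
# deliberately kept because VADER uses them to flip polarity.
STOPWORDS = {"a", "an", "the", "of", "to", "and"}


def clean_text(text):
    """Drop special characters and stopwords, then collapse whitespace."""
    # Keep letters, digits, whitespace, and basic punctuation.
    text = re.sub(r"[^A-Za-z0-9\s.,!?']", " ", text)
    words = [w for w in text.split() if w.lower() not in STOPWORDS]
    return " ".join(words)


def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def classify_review(text, analyzer=None):
    """Clean one review and classify it with VADER."""
    if analyzer is None:
        # Imported lazily so the helpers above work without the package.
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
        analyzer = SentimentIntensityAnalyzer()
    scores = analyzer.polarity_scores(clean_text(text))
    return label_from_compound(scores["compound"])
```

With a pandas DataFrame `df`, the `sentiment` column could then be filled with something like `df["sentiment"] = df["reviews.text"].astype(str).apply(classify_review, analyzer=analyzer)`, creating a single `SentimentIntensityAnalyzer` up front and reusing it instead of constructing one per review.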

2. Product Category Clustering

  • Preprocessed name and categories by combining them into a single column (name_and_category).
  • Generated semantic embeddings using Universal Sentence Encoder (USE).
  • Applied K-Means clustering to group products into 4 clusters:
    1. Tablets
    2. Smart Home Devices
    3. E-Readers
    4. Others
  • Validated the clusters using PCA visualization and word frequency analysis.
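
A minimal sketch of this pipeline is shown below, assuming version 4 of the Universal Sentence Encoder on TensorFlow Hub and scikit-learn's `KMeans` and `PCA`. The README does not state the model version, random seed, or other hyperparameters, so those values are assumptions, and the function names are hypothetical.

```python
def combine_name_and_category(name, categories):
    """Join product name and categories into one text field."""
    # Skip None and NaN (NaN is the only value not equal to itself).
    parts = [str(p).strip() for p in (name, categories)
             if p is not None and p == p]
    return " ".join(p for p in parts if p)


def embed_texts(texts):
    """Embed texts with USE; returns an array of shape (n_texts, 512)."""
    import tensorflow_hub as hub
    # Model URL is an assumption; the README only names the encoder.
    model = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
    return model(list(texts)).numpy()


def cluster_embeddings(embeddings, n_clusters=4, seed=42):
    """Assign each embedding to one of n_clusters K-Means clusters."""
    from sklearn.cluster import KMeans
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return kmeans.fit_predict(embeddings)


def project_2d(embeddings, seed=42):
    """Reduce embeddings to two dimensions with PCA for plotting."""
    from sklearn.decomposition import PCA
    return PCA(n_components=2, random_state=seed).fit_transform(embeddings)
```

K-Means returns arbitrary integer labels (0 to 3 here), so names such as "Tablets" or "E-Readers" have to be assigned afterwards, for example by inspecting the most frequent words in each cluster's `name_and_category` values.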

3. Summarization

  • Used Cohere’s LLM to generate summaries for each cluster by feeding:
    • Cluster-specific product names, reviews, and categories.
  • Output: Summarized recommendations and marketing content highlighting key products for each category.
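
One way to assemble the per-cluster input and send it to Cohere is sketched below. The prompt wording, the `max_reviews` cap, and the `command-r` model name are assumptions rather than details from this project, and the helper names are hypothetical. The call uses the `chat` method of the Cohere Python SDK's `Client`; other SDK versions may expose a different interface.

```python
def build_cluster_prompt(cluster_name, product_names, reviews, max_reviews=20):
    """Build one text prompt from a cluster's products and sample reviews."""
    products = sorted(set(product_names))  # de-duplicate product names
    sample = list(reviews)[:max_reviews]   # cap the prompt size
    lines = [
        f"Product category: {cluster_name}",
        "Products:",
        *[f"- {p}" for p in products],
        "Customer reviews:",
        *[f"- {r}" for r in sample],
        "Summarize the customer feedback and recommend the top products.",
    ]
    return "\n".join(lines)


def summarize_cluster(prompt, api_key, model="command-r"):
    """Send the prompt to Cohere and return the generated summary text."""
    import cohere  # imported lazily; requires the cohere package
    client = cohere.Client(api_key)
    response = client.chat(message=prompt, model=model)
    return response.text
```

Capping the number of reviews keeps each request within the model's context window; a random or sentiment-balanced sample may represent the cluster better than taking the first few reviews.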
