Similarity and Clustering

This repository contains code for analyzing restaurant reviews using TF-IDF, K-means clustering, and LDA topic modeling. It holds two Jupyter notebooks (tfidf_similarity.ipynb and cluster_kmeans_topic modeling.ipynb) whose contents partly overlap because they record several trials. One of them applies Latent Dirichlet Allocation (LDA) to topic modeling, extracting key themes from reviews of both Chinese and Japanese restaurants. The PDF file (Similarity_Cluster.pdf) is also in this repository.

These trials aim to analyze reviews of Chinese and Japanese restaurants, focusing on food-related themes.

The elbow method was used to choose the number of clusters (k) for K-means.
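A minimal sketch of the elbow method, assuming scikit-learn and TF-IDF features. The toy reviews, the range of k, and the helper name `elbow_inertias` are illustrative assumptions and are not taken from the notebooks. The idea is to fit K-means once per candidate k and record the inertia (within-cluster sum of squared distances); the "elbow" is the k after which the inertia stops dropping sharply.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-in reviews (hypothetical; the real dataset is not in the repository).
reviews = [
    "The ramen broth was rich and the noodles were perfect",
    "Great sushi and very fresh sashimi",
    "Dumplings were juicy and the fried rice was tasty",
    "Slow service and a long wait for a table",
    "The waiter was rude and the service was slow",
    "Fresh sushi rolls and friendly staff",
    "Kung pao chicken was spicy and delicious",
    "We waited an hour and the food arrived cold",
]


def elbow_inertias(texts, k_values, seed=0):
    """Fit K-means once per k and return the inertia for each k."""
    features = TfidfVectorizer(stop_words="english").fit_transform(texts)
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        model.fit(features)
        inertias.append(model.inertia_)
    return inertias


k_values = list(range(1, 6))
inertias = elbow_inertias(reviews, k_values)
for k, inertia in zip(k_values, inertias):
    print(f"k={k}: inertia={inertia:.3f}")
```

Plotting `inertias` against `k_values` and looking for the bend is the usual next step. On a dataset with millions of records, fitting on a random sample first is likely to be more practical.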

Data

The review dataset contains customer reviews for Chinese and Japanese restaurants, including text data and metadata like review scores, cuisine type, and other relevant features. The dataset is large and includes millions of records.

Due to the large size of the data, it could not be uploaded directly to this repository. However, you can download the data from the source (or request access) and place it in the appropriate directory for processing. The dataset is expected to be stored in CSV or JSON format and should be pre-processed (e.g., cleaned, tokenized) before running the analysis.
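Because the file is large, reading it row by row avoids loading everything into memory. The sketch below assumes a CSV with columns named `cuisine` and `text`; those column names and the helper `iter_reviews` are hypothetical, so adjust them to match the actual dataset.

```python
import csv
import io


def iter_reviews(file_obj, cuisines=("Chinese", "Japanese")):
    """Yield (cuisine, text) pairs one row at a time, skipping empty reviews."""
    wanted = {c.lower() for c in cuisines}
    for row in csv.DictReader(file_obj):
        cuisine = (row.get("cuisine") or "").strip()
        text = (row.get("text") or "").strip()
        if text and cuisine.lower() in wanted:
            yield cuisine, text


# In-memory stand-in for the real file. With real data, pass an open file
# handle instead, e.g. open(path, newline="", encoding="utf-8").
sample = io.StringIO(
    "cuisine,score,text\n"
    "Japanese,5,Great ramen\n"
    "Italian,4,Nice pasta\n"
    "Chinese,3,\n"
    "Chinese,4,Tasty dumplings\n"
)
rows = list(iter_reviews(sample))
print(rows)
```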

For TF-IDF

TF-IDF seems to be of limited use for my data. The methodology presented by Gabriela Nathania H. et al. focuses on summarizing hotel reviews using two main approaches: extractive summarization based on TF-IDF scores, and feature extraction through Adjective-Noun Pairing. I tried this approach but did not get it to work.
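To make the idea of TF-IDF-based extractive summarization concrete, here is a simplified sketch that assumes scikit-learn and NumPy. It is not the exact procedure of the cited work: it scores each sentence by the sum of its TF-IDF weights and keeps the highest-scoring ones. The function name `summarize` and the sample sentences are assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def summarize(sentences, top_n=2):
    """Return the top_n sentences by summed TF-IDF weight, in original order."""
    # norm=None keeps raw tf * idf weights, so the row sum is a plain score.
    vectorizer = TfidfVectorizer(stop_words="english", norm=None)
    matrix = vectorizer.fit_transform(sentences)
    scores = np.asarray(matrix.sum(axis=1)).ravel()
    # Take the indices of the best scores, then restore document order.
    top = sorted(np.argsort(scores)[::-1][:top_n])
    return [sentences[i] for i in top]


sentences = [
    "The ramen broth was rich and the noodles were cooked perfectly.",
    "Staff were friendly.",
    "Service was slow and our dumplings arrived cold after a long wait.",
    "It was okay overall.",
]
summary = summarize(sentences, top_n=2)
print(summary)
```

A summed score favours longer sentences with many distinct terms; dividing by sentence length is one common alternative.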

The clustering approach could be used to group reviews into similar topics or sentiments.

Future thoughts: Adjective-Noun Pairing. Extracting adjectives and nouns can help identify the key features that customers comment on, along with the qualities they attribute to them. For example, pairing "delicious" with "ramen" or "slow" with "service" could help in understanding specific aspects of customer satisfaction or dissatisfaction.
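A toy sketch of adjective-noun pairing follows. A real pipeline would take part-of-speech tags from a tagger such as spaCy or NLTK; here two tiny hand-written word lists stand in for the tagger so the example runs without any model download. The word lists and the function name `adjective_noun_pairs` are hypothetical.

```python
import re

# Tiny hand-written lexicons standing in for a real part-of-speech tagger.
ADJECTIVES = {"delicious", "slow", "fresh", "rude", "tasty"}
NOUNS = {"ramen", "service", "sushi", "staff", "dumplings"}


def adjective_noun_pairs(text):
    """Return (adjective, noun) pairs where the adjective directly precedes the noun."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [
        (first, second)
        for first, second in zip(tokens, tokens[1:])
        if first in ADJECTIVES and second in NOUNS
    ]


pairs = adjective_noun_pairs("Delicious ramen, but slow service and rude staff.")
print(pairs)
# [('delicious', 'ramen'), ('slow', 'service'), ('rude', 'staff')]
```

This only catches an adjective placed directly before its noun. Phrases such as "the ramen was delicious" would need a dependency parse or a wider matching window.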

yuy123337/ggreviewclustering
