Text Analytics Using Text Mining
- Clean and preprocess the text data.
- Determine the occurrence frequency of terms (words or phrases) in movie reviews using Document-Term Matrix (DTM).
- Standardize text data into a consistent format suitable for analysis.
- Conduct text analysis to assess term frequency, term reduction, and term correlation.
- Identify the most frequent words in the movie reviews.
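The cleaning, DTM, and term-frequency steps listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the tooling used in the original analysis; the sample reviews, the small stopword list, and the function names `clean`, `build_dtm`, and `term_frequencies` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical sample reviews and a tiny illustrative stopword list
# (neither comes from the original analysis).
REVIEWS = [
    "This movie was good, a really good film!",
    "The film had 2 great actors and a good plot.",
    "Not a good movie; the plot was weak.",
]
STOPWORDS = {"a", "an", "and", "the", "this", "was", "had", "not", "really"}


def clean(text):
    """Lowercase, drop digits and punctuation, then remove stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # replaces numbers and punctuation with spaces
    return [tok for tok in text.split() if tok not in STOPWORDS]


def build_dtm(docs):
    """Return (vocab, matrix) where matrix[i][j] counts term j in document i."""
    tokenized = [clean(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


def term_frequencies(vocab, matrix):
    """Sum each DTM column to get corpus-wide term counts."""
    totals = [sum(col) for col in zip(*matrix)]
    return Counter(dict(zip(vocab, totals)))


vocab, dtm = build_dtm(REVIEWS)
freq = term_frequencies(vocab, dtm)
print(freq.most_common(3))
```

Each row of `dtm` corresponds to one review and each column to one term in `vocab`, so column sums give the overall term frequencies that a frequency plot or word cloud would display.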
Several key insights emerged:
- Text preprocessing involved removing irrelevant numbers and punctuation, converting words to lowercase, and eliminating stopwords.
- Through the use of Document-Term Matrix (DTM), several words with the highest frequency were identified, enabling the exploration of relationships and patterns among them.
- Stemming and lemmatization were employed to reduce words to their base forms.
- Term frequency reveals how often words appear, term reduction simplifies the text, and term correlation indicates word relationships, enhancing our understanding and analysis of text data.
- The histogram shows that words such as "film," "movie," and "good" have the highest frequency; the prominence of "good" hints at positive reviews, although word frequency alone does not establish sentiment.
- Histogram: Displays the frequency counts of the most common words in the movie reviews.
- Wordcloud: Illustrates the frequency of words in the reviews, visually emphasizing the most common ones.
- Dendrogram: Reveals word relationships by visualizing clustering outcomes, aiding in identifying patterns within the data.
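As a rough illustration of the term correlation mentioned above, the sketch below computes the Pearson correlation between the columns of a small document-term matrix. The matrix, the term list, and the helper names `pearson` and `term_correlations` are hypothetical and chosen only for the example; the original analysis may have measured correlation differently.

```python
from math import sqrt

# Hypothetical document-term matrix: rows are reviews, columns are terms.
TERMS = ["film", "good", "plot", "weak"]
DTM = [
    [1, 2, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 2, 1],
]


def pearson(x, y):
    """Pearson correlation between two equal-length count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # constant column: correlation is undefined, reported as 0 here
    return cov / (norm_x * norm_y)


def term_correlations(terms, dtm):
    """Correlate every pair of term columns in the DTM."""
    columns = list(zip(*dtm))
    return {
        (terms[i], terms[j]): pearson(columns[i], columns[j])
        for i in range(len(terms))
        for j in range(i + 1, len(terms))
    }


corr = term_correlations(TERMS, DTM)
for pair, r in sorted(corr.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, round(r, 2))
```

Terms that tend to appear in the same reviews get a correlation near 1, and terms that rarely co-occur get a negative value. A distance such as 1 minus the correlation is one possible input to the hierarchical clustering that a dendrogram visualizes.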