zzhangusf/NLP-projects

NLP Projects

Project Description

  • TF-IDF for Reuters Articles in XML

    • Extracted titles and paragraphs from Reuters articles in XML format using ElementTree
    • Tokenized and stemmed the text with NLTK, and computed TF-IDF scores for the most common words using TfidfVectorizer
  • Sentiment Analysis with Naive Bayes using PySpark

    • Performed data cleaning and transformation, and estimated term frequency (TF) using PySpark RDDs
    • Built a Naive Bayes model to perform sentiment analysis and achieved an accuracy of 82.5%
  • Sentiment Analysis of Tweets

    • Parsed and stemmed tweet text, and computed term frequency (TF) for the most common words
    • Classified tweet sentiment using regularized logistic regression, LDA, and KNN
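
For the Reuters TF-IDF project, the extract-then-score flow can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's code: the sample XML and its tag names (`newsitem`, `title`, `text`, `p`) are assumptions, the regex tokenizer stands in for NLTK tokenization and stemming, and the hand-written `tfidf` function stands in for TfidfVectorizer (it uses a smoothed IDF and no vector normalization, so its numbers will differ from scikit-learn's).

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample article; the tag names are assumed for illustration.
SAMPLE_XML = """<newsitem>
  <title>Oil prices rise</title>
  <text>
    <p>Oil prices rose sharply on Monday.</p>
    <p>Traders cited supply concerns.</p>
  </text>
</newsitem>"""


def extract_text(xml_string):
    """Return the title and all paragraph texts joined into one string."""
    root = ET.fromstring(xml_string)
    title = root.findtext("title", default="")
    paragraphs = [p.text or "" for p in root.iter("p")]
    return " ".join([title] + paragraphs)


def tokenize(text):
    """Lowercase and keep alphabetic runs (a stand-in for NLTK, no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(documents):
    """Compute one {term: score} dict per document.

    score = (count / doc length) * (ln((1 + N) / (1 + df)) + 1)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return scores


article = extract_text(SAMPLE_XML)
other = "Stock markets fell as oil prices climbed."  # second toy document
scores = tfidf([article, other])

# Three highest-scoring terms of the first document.
top_terms = sorted(scores[0], key=scores[0].get, reverse=True)[:3]
print(top_terms)
```

Swapping `tokenize` for an NLTK tokenizer plus stemmer, and `tfidf` for TfidfVectorizer, would bring the sketch closer to the tools named above.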

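For the Naive Bayes sentiment project, the idea of classifying from term frequencies can be sketched as below. This is a plain-Python multinomial Naive Bayes with add-one smoothing, written only to illustrate the technique; it does not use PySpark, and the function names (`train`, `predict`) and the four toy training sentences are hypothetical. It makes no claim about the reported accuracy.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (no cleaning or stemming here)."""
    return text.lower().split()


def train(labeled_docs, alpha=1.0):
    """Fit log priors and smoothed log likelihoods from (text, label) pairs."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)  # label -> term frequencies
    vocab = set()

    for text, label in labeled_docs:
        tokens = tokenize(text)
        class_counts[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = sum(class_counts.values())
    model = {}
    for label, n in class_counts.items():
        total_terms = sum(term_counts[label].values())
        denom = total_terms + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n / n_docs),
            "log_likelihood": {
                term: math.log((term_counts[label][term] + alpha) / denom)
                for term in vocab
            },
        }
    return model


def predict(model, text):
    """Return the label with the highest posterior log score.

    Terms never seen during training are ignored.
    """
    tokens = tokenize(text)
    scores = {}
    for label, params in model.items():
        likelihood = params["log_likelihood"]
        scores[label] = params["log_prior"] + sum(
            likelihood[t] for t in tokens if t in likelihood
        )
    return max(scores, key=scores.get)


# Hypothetical toy data, for illustration only.
training_data = [
    ("great movie loved it", "pos"),
    ("wonderful acting great plot", "pos"),
    ("terrible movie hated it", "neg"),
    ("boring plot awful acting", "neg"),
]
model = train(training_data)
print(predict(model, "great wonderful movie"))
print(predict(model, "awful boring"))
```

In a PySpark RDD setting, the per-class term counts would typically come from a map and reduce-by-key step rather than a Python loop; the scoring logic stays the same.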