This project uses LDA (Latent Dirichlet Allocation) topic modeling to discover the underlying thematic structure of a document collection, and then automatically classifies new, incoming news articles (which I web scraped) into one of the existing categories.
First, I trained a topic model on 20,000 news articles from an archived dataset. Next, I web scraped 400 CNBC news articles and cleaned the text using a variety of R functions. Finally, I categorized each scraped article using my topic model.
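
To make the last two steps concrete, here is a minimal sketch of cleaning an article and assigning it to a topic. It is an illustration only, not the project's code: the project itself cleaned text in R, while this sketch uses plain Python. The `TOPICS` table is a hypothetical, hand-written stand-in for the topic-word probabilities a trained model would provide. The stopword list, `FLOOR`, `alpha`, and the function names are likewise assumptions. Topic proportions are estimated with a simple fold-in: the topic-word probabilities stay fixed and the article's mixture is updated with a few EM-style steps. This may differ from the inference routine of whichever LDA package is actually used.

```python
import re
from collections import Counter

# Hypothetical topic-word probabilities standing in for a trained LDA model.
TOPICS = {
    "markets": {"stocks": 0.4, "shares": 0.3, "investors": 0.2, "bank": 0.1},
    "technology": {"software": 0.4, "chip": 0.3, "startup": 0.2, "bank": 0.1},
}
# Tiny illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "as", "for"}
FLOOR = 1e-6  # probability used when a topic has no entry for a vocabulary word


def clean(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def topic_mixture(tokens, topics=TOPICS, alpha=0.1, iterations=50):
    """Estimate an article's topic proportions with topic-word probabilities held fixed."""
    vocab = set().union(*topics.values())
    # Words the model never saw carry no information, so they are ignored.
    counts = Counter(t for t in tokens if t in vocab)
    names = list(topics)
    theta = {k: 1.0 / len(names) for k in names}  # start from a uniform mixture
    for _ in range(iterations):
        expected = {k: alpha for k in names}  # alpha acts as a smoothing pseudo-count
        for word, n in counts.items():
            weights = {k: theta[k] * topics[k].get(word, FLOOR) for k in names}
            total = sum(weights.values())
            for k in names:
                expected[k] += n * weights[k] / total
        norm = sum(expected.values())
        theta = {k: v / norm for k, v in expected.items()}
    return theta


def classify(text):
    """Return the name of the topic with the largest estimated proportion."""
    theta = topic_mixture(clean(text))
    return max(theta, key=theta.get)


if __name__ == "__main__":
    print(classify("Investors sold bank shares as stocks fell."))
    print(classify("The startup shipped new chip software."))
```

Taking the single most probable topic matches the idea of placing each article into one category. Keeping the full mixture from `topic_mixture` would instead show how strongly an article leans toward each topic.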