- The project has two parts: 1) data collection from political subreddits on Reddit and 2) data analysis (topic analysis, sentiment and emotion analysis) and visualization.
python
- The project is developed and tested using Python v3.9.7. Python Website
time
- time is a module that provides various time-related functions. Python Website
json
- json enables users to work with JSON (JavaScript Object Notation), a lightweight data interchange format. Python Website
requests
- Requests is an HTTP library for Python. Requests Website
pymongo
- PyMongo is a Python distribution containing tools for working with MongoDB. PyMongo Website
Jupyter
- Jupyter is an open-source IDE and web application that you can use to create and share documents that contain live code, equations, visualizations, and text. Jupyter
MongoDB Community Server
- MongoDB Community Server offers a flexible document data model along with support for ad-hoc queries, secondary indexing, and real-time aggregations to provide powerful ways to access and analyze your data. It also contains MongoDB Compass, a GUI and interactive tool for querying, optimizing, and analyzing your MongoDB data. The project's Reddit database and collections were stored locally and accessed through MongoDB Compass. Download link: MongoDB Community Server
urllib
- urllib is a package that collects several modules for working with URLs. Python Website
collections
- This module implements specialized container datatypes providing alternatives to Python's general-purpose built-in containers. Python Website
datetime
- The datetime module supplies classes for manipulating dates and times. Python Website
dateutil
- The dateutil module provides powerful extensions to the standard datetime module available in Python. dateutil Website
pytz
- pytz brings the Olson tz database into Python. Python Website
itertools
- This module contains functions that create iterators for efficient looping. Python Website
numpy
- The fundamental package for scientific computing with Python. numpy Website
NRCLex
- An affect generator based on TextBlob and the NRC affect lexicon. NRCLex Website
vaderSentiment
- VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. vaderSentiment Website
bokeh
- Bokeh is a Python library for creating interactive visualizations for modern web browsers. bokeh Website
tensorflow
- An end-to-end machine learning platform. tensorflow Website
keras
- Keras is a high-level deep learning API developed by Google for implementing neural networks. keras Website
Reddit
- We are using r/politics, r/news, r/democrats, r/Republican, r/conservative, r/worldnews, r/moderatepolitics, r/NeutralPolitics, r/progressive, r/PoliticalDiscussion, r/uspolitics, r/politics2, r/AmericanPolitics, r/Liberal, r/Republicans, r/conservatives, r/StateOfTheUnion.
- r/politics - News and discussion about U.S. politics
- r/news - News articles about current events in the United States and the rest of the world
- r/democrats - The Democratic Party's daily news updates, policy analysis, links, and opportunities to participate in the political process
- r/Republican - Partisan subreddit for Republicans to discuss issues with other Republicans
- r/conservative - Subreddit for conservatives, both fiscal and social, to read and discuss political and cultural issues from a distinctly conservative point of view
- r/worldnews - A place for major news from around the world, excluding US-internal news.
- r/moderatepolitics - Political subreddit for moderately expressed opinions and civil discourse
- r/NeutralPolitics - A heavily moderated community dedicated to respectful, empirical discussion of political issues
- r/progressive - A community to share stories related to the growing Modern Political and Social Progressive Movement.
- r/PoliticalDiscussion - Subreddit for discussion about politics.
- r/uspolitics - A subreddit for US Politics.
- r/politics2 - A place to discuss the American political process, the political parties, the politicians, political news and candidates
- r/AmericanPolitics - A place to discuss the American political process, parties, the politicians and topics
- r/Liberal - A subreddit to discuss Liberal ideas including politics
- r/Republicans - Pro-Republican subreddit to discuss politics
- r/conservatives - Subreddit to discuss political ideas based on Conservatism
- r/StateOfTheUnion - A subreddit to discuss current political agendas and topics
- The data contains posts and comments of depth 1. The posts have been sorted based on the following:
- /new - New posts on subreddits
- /hot - Posts rapidly gaining upvotes/comments
- /rising - Newly submitted posts that are rapidly gaining engagement
- /top/?t=day - Top posts of the day
- /top/?t=week - Top posts of the week
- Reddit API documentation
- Reddit API archive
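The sort orders listed above can be turned into listing requests per subreddit. The following is a minimal sketch, not the notebook's actual code: the base URL `https://oauth.reddit.com`, the helper name `listing_request`, and the `limit` of 100 posts per page are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumption: requests go to Reddit's OAuth host; the notebook may use a different base URL.
BASE_URL = "https://oauth.reddit.com"


def listing_request(subreddit, sorting, limit=100):
    """Return (url, params) for one subreddit and one sort order such as "/top/?t=day"."""
    parts = urlsplit(sorting.strip())
    path = parts.path.strip("/")           # "/top/" -> "top"
    params = dict(parse_qsl(parts.query))  # "t=day" -> {"t": "day"}
    params["limit"] = limit                # hypothetical page size
    return f"{BASE_URL}/r/{subreddit}/{path}", params


if __name__ == "__main__":
    for sorting in ["/new", "/hot", "/rising", "/top/?t=day", "/top/?t=week"]:
        print(listing_request("politics", sorting))
```

Keeping the query string (`t=day`, `t=week`) separate from the path lets the same pair be passed to an HTTP client as the URL and its query parameters.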
System Architecture for Reddit
Install Python and MongoDB Community Server (to access MongoDB Compass)
python3 -m pip install requests
python3 -m pip install pymongo
pip install jupyterlab
pip install notebook
Launch Jupyter Notebook using command prompt or the installed Jupyter Notebook App
jupyter notebook
- Create a Reddit account and then create an application to get the application ID and secret. OAuth2
- The text file logininfo.txt contains the user's login information: Reddit username, password, application name, application ID, and secret, each on a new line in that order
- The text file subreddit.txt contains the names of the subreddits, one name per line
- The text file sorting.txt contains the sorting parameters for posts on a subreddit, one type per line
- Open the redditPart1.ipynb file
- For a local MongoDB database, the second cell must have
client = pymongo.MongoClient("mongodb://localhost:27017/")
or your MongoDB localhost link
- Run the redditPart1.ipynb Jupyter notebook
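As an illustration of how the three text files and the application credentials might be used, here is a minimal sketch. It assumes the line order of logininfo.txt described above and Reddit's OAuth2 password grant. The helper names (`read_lines`, `read_login`, `fetch_token`), the dictionary keys, and the User-Agent format are hypothetical and may differ from what redditPart1.ipynb does.

```python
from pathlib import Path

# Field order follows the description of logininfo.txt; the key names are hypothetical.
LOGIN_FIELDS = ("username", "password", "app_name", "app_id", "secret")


def read_lines(path):
    """Return the non-empty, stripped lines of a text file (also fits subreddit.txt and sorting.txt)."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_login(path="logininfo.txt"):
    """Map the first five lines of logininfo.txt to named fields."""
    lines = read_lines(path)
    if len(lines) < len(LOGIN_FIELDS):
        raise ValueError(f"{path} must contain {len(LOGIN_FIELDS)} lines")
    return dict(zip(LOGIN_FIELDS, lines))


def fetch_token(login):
    """Request an OAuth2 bearer token with the password grant (needs network access)."""
    import requests  # imported here so the file helpers work without it installed

    response = requests.post(
        "https://www.reddit.com/api/v1/access_token",
        auth=(login["app_id"], login["secret"]),  # HTTP basic auth with the app credentials
        data={
            "grant_type": "password",
            "username": login["username"],
            "password": login["password"],
        },
        # Assumption: a descriptive User-Agent built from the application name.
        headers={"User-Agent": f"{login['app_name']}/0.1 by {login['username']}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["access_token"]
```

A typical call sequence would be `login = read_login()`, then `token = fetch_token(login)`, with the token sent as an `Authorization: bearer <token>` header on later requests.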
Install the libraries needed to run the project. Run the following in the OS command prompt/terminal or Anaconda prompt, or run the code in a Jupyter notebook
pip install NRCLex vaderSentiment bokeh numpy python-dateutil pytz
pip install tensorflow
Launch Jupyter Notebook using command prompt or the installed Jupyter Notebook App
jupyter notebook
- Install the libraries and modules mentioned above
- Include the collected JSON data for posts ('postCollection.json') and comments ('commentCollection.json') in the same folder as the python notebook
- Open Jupyter Notebook App and locate the directory containing the python notebook and datasets
- Open the redditFinal.ipynb file
- Run the redditFinal.ipynb jupyter notebook
- The Bokeh graphs, including an interactive graph with a dropdown for selection, will be displayed in new tabs.
- Some more graphs and text outputs will be displayed throughout the notebook cells.
- NOTE: Certain blocks of code, such as topic analysis on comments or emotion analysis of posts, can take several minutes to execute, and emotion analysis of comments can take from several minutes to a couple of hours depending on the CPU specifications.
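To give a rough idea of the sentiment step, the sketch below loads an exported collection and labels each post with VADER. It is an illustration under assumptions, not the code in redditFinal.ipynb. The export is assumed to be either one JSON array or one JSON document per line. The ±0.05 cutoff on the compound score is the convention suggested in the vaderSentiment documentation. The function names are hypothetical.

```python
import json
from pathlib import Path


def load_export(path):
    """Load an export that is either one JSON array or one JSON document per line."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_posts(posts):
    """Score the title plus selftext of each post (requires vaderSentiment to be installed)."""
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    labelled = []
    for post in posts:
        text = f"{post.get('title', '')} {post.get('selftext', '')}".strip()
        compound = analyzer.polarity_scores(text)["compound"]
        labelled.append(
            {"postid": post.get("postid"), "compound": compound, "label": label_from_compound(compound)}
        )
    return labelled
```

For example, `score_posts(load_export("postCollection.json"))` would return one dictionary per post with its compound score and label.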
collection_2: redditDB.postCollection
{
"_id": ObjectID,
"subreddit": String,
"postid": String,
"created_utc": Double,
"title": String,
"selftext": String,
"permalink": String,
"num_comments": Int32,
"upvotes": Int32,
"upvote_ratio": Double,
"url": String
}
collection_3: redditDB.commentCollection
{
"_id": ObjectID,
"subreddit": String,
"postid": String,
"comment_id": String,
"comment_text": String,
"created_utc": Double
}
The Reddit project has a database with two collections, one each for posts and comments.
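As a sketch of how a raw post from the Reddit API could be shaped into the postCollection schema above, consider the mapping below. The choice of source fields is an assumption: in particular, `ups` is used for "upvotes" here, but the notebook may read `score` or another field instead. The function names are hypothetical.

```python
from datetime import datetime, timezone


def to_post_document(raw):
    """Map the "data" object of a Reddit listing child to the postCollection schema."""
    return {
        "subreddit": raw["subreddit"],
        "postid": raw["id"],
        "created_utc": float(raw["created_utc"]),  # seconds since the Unix epoch
        "title": raw["title"],
        "selftext": raw.get("selftext", ""),
        "permalink": raw["permalink"],
        "num_comments": int(raw["num_comments"]),
        "upvotes": int(raw["ups"]),                # assumption: "ups" is the source field
        "upvote_ratio": float(raw["upvote_ratio"]),
        "url": raw["url"],
    }


def created_at(document):
    """Convert the stored created_utc double into a timezone-aware datetime."""
    return datetime.fromtimestamp(document["created_utc"], tz=timezone.utc)
```

A document built this way could be stored with pymongo's `insert_one`; MongoDB adds the `_id` field automatically when it is absent.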