Text as Data: Social Network Analysis & Natural Language Processing

Graves’ Disease Reddit Community Exploration

📌 Project Overview

This academic project explores the online interactions and shared experiences of users in the Reddit community focused on Graves’ disease — an autoimmune thyroid condition. Using Social Network Analysis (SNA) and Natural Language Processing (NLP) techniques, the study uncovers communication patterns, influential users, key discussion topics, and sentiment trends within the community.

🗂 Full project report: 📄 Text As Data.pdf

🧰 Tools & Technologies

Category	Tools / Libraries
Data Collection	Reddit API, PRAW
SNA & Graphs	NetworkX, iGraph, Gephi
NLP & Text Mining	spaCy, NLTK, Scikit-learn, Gensim
Visualization	Matplotlib, WordCloud, custom butterfly-shaped plots
Topic Modeling	TF-IDF, LDA, Coherence Score, Perplexity
Sentiment Analysis	VADER

🧠 Objectives

Detect influential users and communication flows within the Reddit community.
Identify central users and community structures using SNA metrics.
Analyze emotional tone and shared concerns using NLP techniques.
Surface common keywords and latent discussion topics.
Visualize sentiment-driven word clouds using a custom butterfly-shaped mask that echoes the shape of the thyroid gland.

🔗 Main Components

🔹 1. Data Collection & Preparation

Fetched the top 30 posts and all of their comments from r/gravesdisease using the Reddit API (via PRAW).
Saved as structured CSV files for posts, comments, nodes, and edges.
Labeled users as posters or commenters, and constructed interaction graphs.
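
As a rough illustration of how posts and comments could be turned into node and edge tables, here is a minimal sketch. The field names (`id`, `author`, `parent_id`) and the helper `build_graph_tables` are assumptions for this example, not the project's actual schema, and the IDs are assumed to already have Reddit's type prefixes stripped.

```python
from collections import Counter


def build_graph_tables(posts, comments):
    """Return (nodes, edges) rows from post and comment records.

    An edge (source, target) means "source replied to target".
    Field names are hypothetical; adapt them to the real CSV columns.
    """
    # Map every post/comment id to its author so a reply can be resolved
    # to the user it was addressed to.
    author_of = {p["id"]: p["author"] for p in posts}
    author_of.update({c["id"]: c["author"] for c in comments})

    edges = Counter()
    for c in comments:
        source = c["author"]
        target = author_of.get(c["parent_id"])
        # Skip unresolved parents, deleted accounts, and self-replies.
        if target is None or "[deleted]" in (source, target) or source == target:
            continue
        edges[(source, target)] += 1

    posters = {p["author"] for p in posts} - {"[deleted]"}
    users = {u for pair in edges for u in pair} | posters
    nodes = [
        {"user": u, "role": "poster" if u in posters else "commenter"}
        for u in sorted(users)
    ]
    edge_rows = [
        {"source": s, "target": t, "weight": w}
        for (s, t), w in sorted(edges.items())
    ]
    return nodes, edge_rows


# Toy data standing in for the real API output.
posts = [{"id": "p1", "author": "alice"}]
comments = [
    {"id": "c1", "author": "bob", "parent_id": "p1"},
    {"id": "c2", "author": "alice", "parent_id": "c1"},
    {"id": "c3", "author": "bob", "parent_id": "p1"},
]
nodes, edge_rows = build_graph_tables(posts, comments)
```

The resulting rows can be written out with `csv.DictWriter` to produce node and edge files of the kind described above.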

🔹 2. Social Network Analysis

Built directed graphs using NetworkX & iGraph.
Computed:
- Degree Centrality
- Betweenness Centrality
- Harmonic Closeness Centrality
Detected cut vertices and critical bridges in communication.
Identified strongly and weakly connected components.
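
The metrics listed in this section map onto standard NetworkX calls. The sketch below uses a small made-up reply graph (the usernames are hypothetical) and assumes that cut vertices and bridges are computed on an undirected copy, since NetworkX defines them only for undirected graphs; the project's scripts may handle this differently.

```python
import networkx as nx

# Toy directed reply graph: an edge u -> v means "u replied to v".
G = nx.DiGraph()
G.add_edges_from([
    ("bob", "alice"), ("carol", "alice"), ("alice", "bob"),
    ("dave", "carol"), ("erin", "dave"),
])

# Centrality measures, each returned as a {node: score} dict.
degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)
harmonic = nx.harmonic_centrality(G)

top_user = max(degree, key=degree.get)

# Cut vertices (articulation points) and bridges need an undirected view.
U = G.to_undirected()
cut_vertices = set(nx.articulation_points(U))
bridges = set(nx.bridges(U))

# Connectivity of the directed graph.
strong = list(nx.strongly_connected_components(G))
weak = list(nx.weakly_connected_components(G))

print("Most connected user:", top_user)
print("Cut vertices:", sorted(cut_vertices))
print("Strong components:", len(strong), "| weak components:", len(weak))
```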

🖼 Initial network visualization in Gephi

🖼 Modularity Maximization Tracking

🔹 3. Community Detection

Used the Girvan-Newman algorithm to detect communities.
Visualized results with circular layouts and modularity tracking.
Interpreted central vs. peripheral communities, inter-community ties, and influence hubs.
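
One way to combine Girvan-Newman with modularity tracking is to score every level of the hierarchy and keep the partition with the highest modularity. The following is a sketch under that assumption; the helper name `best_girvan_newman_partition` and the `max_levels` cap are illustrative, and the graph is converted to undirected because NetworkX's `girvan_newman` relies on undirected connected components.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman, modularity


def best_girvan_newman_partition(graph, max_levels=10):
    """Run Girvan-Newman and return the highest-modularity partition.

    Returns (partition, modularity, history), where history is a list of
    (number_of_communities, modularity) pairs, one per level examined.
    """
    undirected = graph.to_undirected()
    history = []
    best_q, best_partition = float("-inf"), None

    for level, communities in enumerate(girvan_newman(undirected), start=1):
        partition = [set(c) for c in communities]
        q = modularity(undirected, partition)
        history.append((len(partition), q))
        if q > best_q:
            best_q, best_partition = q, partition
        if level >= max_levels:  # the full hierarchy is costly on large graphs
            break

    return best_partition, best_q, history


# Toy graph: two 4-node cliques joined by a single edge.
G = nx.barbell_graph(4, 0)
partition, q, history = best_girvan_newman_partition(G)
print(sorted(sorted(c) for c in partition), round(q, 3))
```

The `history` list is what a modularity-tracking plot would be drawn from.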

🖼 Before Labeling Communities

🖼 Community Layout (Circular Structure)

🔹 4. Natural Language Processing

Text Preprocessing

Emoji-to-text, lowercasing, regex cleaning, stopword removal, lemmatization.
Used spaCy, NLTK, and regex for fine-grained cleaning.
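
The lowercasing, regex cleaning, and stopword removal steps can be sketched with the standard library alone. The stopword set and regex patterns below are placeholders chosen for this example. Emoji-to-text conversion and lemmatization are left out because they depend on external packages and language models (for instance the `emoji` package and a spaCy pipeline), so this is only a partial picture of the pipeline described above.

```python
import re

# Tiny illustrative stopword set; a real run would likely use the NLTK or
# spaCy stopword lists instead.
STOPWORDS = {
    "a", "an", "and", "the", "i", "my", "is", "was", "to", "of",
    "in", "it", "for", "so", "me", "at",
}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def preprocess(text):
    """Lowercase, strip URLs and non-letters, and drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)  # also removes digits, punctuation, emoji
    return [
        token for token in text.split()
        if token not in STOPWORDS and len(token) > 2
    ]


tokens = preprocess("My TSH was 0.01 at diagnosis!! See https://example.com")
print(tokens)
```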

Sentiment Analysis

Used VADER to assign polarity scores to each post/comment.
Classified as: Positive, Negative, or Neutral.
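
A common way to turn VADER's compound score into three classes is to apply symmetric cut-offs around zero. The sketch below assumes the conventional ±0.05 thresholds and the NLTK distribution of VADER; the README does not state which thresholds or package the project used, and the helper names are illustrative.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


def score_texts(texts):
    """Score each text with VADER and attach a label.

    The import is done lazily so label_sentiment can be used on its own.
    NLTK's VADER needs the lexicon downloaded once beforehand:
        import nltk; nltk.download("vader_lexicon")
    """
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    results = []
    for text in texts:
        compound = analyzer.polarity_scores(text)["compound"]
        results.append((text, compound, label_sentiment(compound)))
    return results
```

Calling `score_texts(["I finally feel better", "The palpitations are awful"])` would return one `(text, compound, label)` tuple per input.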

Keyword Extraction

Extracted most frequent terms using CountVectorizer.

Topic Modeling

Performed LDA with TF-IDF to uncover dominant themes.
Selected optimal number of topics using Coherence Score and Perplexity.
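
To illustrate the model-selection step, the sketch below fits scikit-learn's LDA on TF-IDF features for several candidate topic counts and records the perplexity of each (lower is better). It covers only the perplexity half of the comparison: scikit-learn has no built-in coherence score, so that part would need something like Gensim's `CoherenceModel`. The function name, candidate values, and toy corpus are assumptions, and the project itself may have used Gensim throughout.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import TfidfVectorizer


def perplexity_by_topic_count(docs, candidate_ks=(2, 3, 4), seed=0):
    """Fit one LDA model per candidate topic count and return {k: perplexity}."""
    vectorizer = TfidfVectorizer(stop_words="english")
    features = vectorizer.fit_transform(docs)

    scores = {}
    for k in candidate_ks:
        lda = LatentDirichletAllocation(n_components=k, random_state=seed)
        lda.fit(features)
        # Perplexity on the training data; a held-out split gives a fairer
        # estimate but is omitted to keep the sketch short.
        scores[k] = float(lda.perplexity(features))
    return scores


docs = [
    "thyroid medication dose adjusted after blood test",
    "blood test showed high thyroid hormone levels",
    "anxiety and heart palpitations before treatment",
    "heart rate and anxiety improved with beta blockers",
    "eye symptoms and swelling worried me",
    "doctor suggested surgery or radioactive iodine treatment",
]
scores = perplexity_by_topic_count(docs, candidate_ks=(2, 3))
print(scores)
```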

🖼 Topic Modeling Evaluation - Coherence vs. Perplexity

🔹 5. Word Cloud Visualization

Designed a custom butterfly-shaped mask to generate three word clouds:
- 🟢 Right wing → Positive sentiment
- 🔴 Left wing → Negative sentiment
- 🔵 Body → Neutral sentiment
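
One possible way to get three clouds out of a single butterfly image is to split the mask by column into left wing, body, and right wing, then render each sentiment's text into its own region. The sketch below assumes that approach; the helper names and the `body_frac` parameter are hypothetical, and the project's actual mask handling may differ. In the `wordcloud` library, pure white (255) mask pixels are treated as masked out, which is why regions outside the target area are filled with 255.

```python
import numpy as np


def split_butterfly_mask(mask, body_frac=0.2):
    """Split one uint8 mask into left-wing, body, and right-wing masks.

    Each returned mask has the same shape as the input, with every pixel
    outside its column range set to 255 (masked out for WordCloud).
    """
    width = mask.shape[1]
    body_width = max(1, int(round(width * body_frac)))
    body_start = (width - body_width) // 2
    body_end = body_start + body_width

    column_ranges = {
        "left": (0, body_start),        # negative sentiment
        "body": (body_start, body_end), # neutral sentiment
        "right": (body_end, width),     # positive sentiment
    }
    regions = {}
    for name, (start, end) in column_ranges.items():
        region = np.full_like(mask, 255)
        region[:, start:end] = mask[:, start:end]
        regions[name] = region
    return regions


def render_cloud(text, region_mask):
    """Render one word cloud inside a region (requires the wordcloud package)."""
    from wordcloud import WordCloud  # imported lazily; optional dependency

    return WordCloud(mask=region_mask, background_color="white").generate(text)


# Toy mask: an all-drawable 4x10 canvas standing in for the butterfly image.
mask = np.zeros((4, 10), dtype=np.uint8)
regions = split_butterfly_mask(mask)
```

The three rendered clouds could then be composited into one image, for example by overlaying them with Matplotlib.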

🖼 Sentiment Word Cloud (Butterfly Mask)

This project demonstrates the combination of social graph theory, natural language processing, and visual storytelling to explore real-world online health communities.

