Graves' Disease Reddit Community Analysis: an academic project employing Social Network Analysis (SNA) and Natural Language Processing (NLP) to explore user interactions and discussions within the Reddit community focused on Graves' disease.

AviachenCohen/NLP-SNA-Project

Text as Data: Social Network Analysis & Natural Language Processing

Graves’ Disease Reddit Community Exploration

📌 Project Overview

This academic project explores the online interactions and shared experiences of users in the Reddit community focused on Graves’ disease — an autoimmune thyroid condition. Using Social Network Analysis (SNA) and Natural Language Processing (NLP) techniques, the study uncovers communication patterns, influential users, key discussion topics, and sentiment trends within the community.

🗂 Full project report: 📄 Text As Data.pdf


🧰 Tools & Technologies

| Category | Tools / Libraries |
| --- | --- |
| Data Collection | Reddit API, PRAW |
| SNA & Graphs | NetworkX, iGraph, Gephi |
| NLP & Text Mining | spaCy, NLTK, Scikit-learn, Gensim |
| Visualization | Matplotlib, WordCloud, custom butterfly-shaped plots |
| Topic Modeling | TF-IDF, LDA, Coherence Score, Perplexity |
| Sentiment Analysis | VADER |

🧠 Objectives

  • Detect influential users and communication flows within the Reddit community.
  • Identify central users and community structures using SNA metrics.
  • Analyze emotional tone and shared concerns using NLP techniques.
  • Surface common keywords and latent discussion topics.
  • Visualize sentiment-driven word clouds using a custom butterfly-shaped mask that echoes the shape of the thyroid gland.

🔗 Main Components

🔹 1. Data Collection & Preparation

  • Fetched the top 30 posts and all of their comments from /r/gravesdisease using the Reddit API.
  • Saved as structured CSV files for posts, comments, nodes, and edges.
  • Labeled users as posters or commenters, and constructed interaction graphs.
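The preparation steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual script: the record fields (`id`, `author`, `parent_id`), the function names, and the rule that a user who both posts and comments is labeled a poster are all assumptions. Real Reddit identifiers carry type prefixes such as `t1_` and `t3_`, which are omitted here for brevity.

```python
import csv
import io


def build_graph_tables(posts, comments):
    """Return (nodes, edges) rows from post and comment records.

    posts:    list of dicts with 'id' and 'author' (assumed field names)
    comments: list of dicts with 'id', 'author', 'parent_id' (assumed field names)
    """
    # Map every post/comment id to its author so each reply can be resolved.
    author_of = {p["id"]: p["author"] for p in posts}
    author_of.update({c["id"]: c["author"] for c in comments})

    # Assumption: anyone who authored a post is a "poster", everyone else a "commenter".
    roles = {}
    for p in posts:
        roles[p["author"]] = "poster"
    for c in comments:
        roles.setdefault(c["author"], "commenter")

    # One directed edge per (replier, replied-to) pair, weighted by reply count.
    weights = {}
    for c in comments:
        target = author_of.get(c["parent_id"])
        if target is None or target == c["author"]:
            continue  # skip unknown parents and self-replies
        key = (c["author"], target)
        weights[key] = weights.get(key, 0) + 1

    nodes = [{"user": u, "role": r} for u, r in sorted(roles.items())]
    edges = [
        {"source": s, "target": t, "weight": w}
        for (s, t), w in sorted(weights.items())
    ]
    return nodes, edges


def to_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `to_csv(edges, ["source", "target", "weight"])` yields text that can be written to an edges file and loaded into Gephi or NetworkX.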

🔹 2. Social Network Analysis

  • Built directed graphs using NetworkX & iGraph.
  • Computed:
    • Degree Centrality
    • Betweenness Centrality
    • Harmonic Closeness Centrality
  • Detected cut vertices and critical bridges in communication.
  • Identified strongly and weakly connected components.
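As a rough sketch of how these measures can be computed with NetworkX, the function below takes a list of directed `(source, target)` pairs. The function name and the returned dictionary layout are illustrative assumptions. Cut vertices and bridges are defined for undirected graphs in NetworkX, so the sketch computes them on an undirected copy; whether the project did the same is not stated.

```python
import networkx as nx


def analyze_network(edges):
    """Compute centralities and structural weak points for a directed edge list."""
    G = nx.DiGraph()
    G.add_edges_from(edges)
    # Articulation points and bridges require an undirected view of the graph.
    U = G.to_undirected()
    return {
        "degree": nx.degree_centrality(G),
        "betweenness": nx.betweenness_centrality(G),
        "harmonic": nx.harmonic_centrality(G),
        "cut_vertices": set(nx.articulation_points(U)),
        "bridges": {frozenset(edge) for edge in nx.bridges(U)},
        "strong_components": [set(c) for c in nx.strongly_connected_components(G)],
        "weak_components": [set(c) for c in nx.weakly_connected_components(G)],
    }
```

A user that appears in `cut_vertices` is one whose removal would disconnect part of the conversation graph, which is one way to read "critical" participants.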

🖼 Initial Network visualization by Gephi
Unlabeled Graph

🖼 Modularity Maximization Tracking
Modularity line graph

🔹 3. Community Detection

  • Used the Girvan-Newman algorithm to detect communities.
  • Visualized results with circular layouts and modularity tracking.
  • Interpreted central vs. peripheral communities, inter-community ties, and influence hubs.
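One plausible way to combine Girvan-Newman with modularity tracking is to score every level of the hierarchy and keep the best one. The sketch below assumes an undirected NetworkX graph and a hypothetical helper name; the project's own selection rule may differ.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman, modularity


def best_girvan_newman_partition(G, max_levels=None):
    """Return (best_partition, history) across Girvan-Newman split levels.

    history holds (number_of_communities, modularity) for each level,
    which is the data behind a modularity tracking line chart.
    """
    history = []
    best_score, best_partition = float("-inf"), None
    for level, communities in enumerate(girvan_newman(G), start=1):
        partition = [set(c) for c in communities]
        score = modularity(G, partition)
        history.append((len(partition), score))
        if score > best_score:
            best_score, best_partition = score, partition
        if max_levels is not None and level >= max_levels:
            break  # optional cap, since the full hierarchy is expensive on large graphs
    return best_partition, history
```

Girvan-Newman recomputes edge betweenness after every removal, so capping `max_levels` can keep runtime manageable on larger graphs.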

🖼 Before Labeling Communities
Unlabeled Graph

🖼 Community Layout (Circular Structure)
Circular communities


🔹 4. Natural Language Processing

Text Preprocessing

  • Emoji-to-text, lowercasing, regex cleaning, stopword removal, lemmatization.
  • Used spaCy, NLTK, and regex for fine-grained cleaning.
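The cleaning pipeline can be approximated as follows. This is a simplified sketch: the emoji table and stopword set are tiny hypothetical stand-ins for full resources, the regex patterns are assumptions, and lemmatization (done with spaCy or NLTK in the project) is left out so the example runs with the standard library only.

```python
import re

# Hypothetical, tiny stand-ins for a full emoji table and stopword list.
EMOJI_TO_TEXT = {"😊": " smiling_face ", "😢": " crying_face "}
STOPWORDS = {"a", "an", "and", "i", "is", "it", "my", "of", "the", "to", "was"}


def preprocess(text):
    """Return a list of cleaned tokens (no lemmatization in this sketch)."""
    # 1. Emoji-to-text: replace each known emoji with a word-like name.
    for symbol, name in EMOJI_TO_TEXT.items():
        text = text.replace(symbol, name)
    # 2. Lowercase.
    text = text.lower()
    # 3. Regex cleaning: drop URLs, then anything that is not a letter,
    #    underscore, or whitespace (this also removes digits and punctuation).
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-z_\s]", " ", text)
    # 4. Stopword removal.
    return [token for token in text.split() if token not in STOPWORDS]
```

For example, `preprocess("My TSH was 0.01 😢 see https://example.com")` returns `['tsh', 'crying_face', 'see']`.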

Sentiment Analysis

  • Used VADER to assign polarity scores to each post/comment.
  • Classified as: Positive, Negative, or Neutral.
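The classification step can be illustrated with a small helper that maps a VADER compound score to a label. The ±0.05 cutoffs are the thresholds conventionally suggested for VADER and are an assumption here, since the README does not state which cutoffs were used. In practice the compound score would come from something like `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`.

```python
from collections import Counter


def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


def sentiment_distribution(compound_scores, threshold=0.05):
    """Count how many texts fall into each sentiment class."""
    return Counter(label_sentiment(score, threshold) for score in compound_scores)
```

The resulting counts are the kind of data a sentiment distribution chart would plot.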

Sentiment Chart

Keyword Extraction

  • Extracted most frequent terms using CountVectorizer.

Topic Modeling

  • Performed LDA on TF-IDF features to uncover dominant themes.
  • Selected optimal number of topics using Coherence Score and Perplexity.
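The model-selection loop can be sketched as below. Note the substitution: this example uses scikit-learn's `LatentDirichletAllocation`, which reports perplexity but not coherence, whereas the project lists Gensim, where coherence is typically computed. The function name, candidate topic counts, and seed are assumptions.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import TfidfVectorizer


def perplexity_by_topic_count(documents, candidate_ks, seed=0):
    """Fit one LDA model per candidate topic count and return {k: perplexity}."""
    tfidf = TfidfVectorizer(stop_words="english")
    X = tfidf.fit_transform(documents)
    scores = {}
    for k in candidate_ks:
        lda = LatentDirichletAllocation(n_components=k, random_state=seed)
        lda.fit(X)
        # Lower perplexity suggests a better statistical fit; it is measured on the
        # training matrix here for brevity, so a held-out split would be more rigorous.
        scores[k] = lda.perplexity(X)
    return scores
```

Perplexity alone tends to favor more topics than a human would find interpretable, which is a common reason to weigh it against a coherence score.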

🖼 Topic Modeling Evaluation - Coherence vs. Perplexity
Dual Axis

🔹 5. Word Cloud Visualization

  • Designed a custom butterfly-shaped mask to generate three word clouds:
    • 🟢 Right wing → Positive sentiment
    • 🔴 Left wing → Negative sentiment
    • 🔵 Body → Neutral sentiment

🖼 Sentiment Word Cloud (Butterfly Mask)
WordCloud

This project demonstrates the combination of social graph theory, natural language processing, and visual storytelling to explore real-world online health communities.
