This project presents a comprehensive analysis of audience engagement and sentiment through YouTube comments on Kendrick Lamar's "Luther" music video. The analysis focuses on understanding viewer reactions, engagement patterns, and sentiment trends to provide insights into audience reception and potential areas for improvement.
YouTube comments provide valuable insights into how audiences receive and engage with music content. This project analyzes comment data from Kendrick Lamar's "Luther" music video to understand audience sentiment, engagement patterns, and key discussion topics, helping to understand the impact and reception of this specific release.
- Initial audience reaction
- Lyrical interpretation discussions
- Cultural impact analysis
- Fan engagement patterns
- Sentiment evolution over time
Our analysis framework focuses on YouTube comment data from videos about the song "Luther", which fall into four categories:
- Official Videos
- Repost Videos
- Reaction Videos
- Analysis Videos
- Comments, replies text and metadata
- User engagement metrics (likes, replies)
- Timestamp data
The project is structured as follows:
music-social-network-analysis/
├── src/ # Source code
│ ├── collectors/ # YouTube data collection modules
│ ├── processors/ # Comment preprocessing
│ ├── analyzers/ # Sentiment, topic and network analysis
│ ├── notebooks/ # Analysis notebooks
│ └── data/ # Processed comment data
├── images/ # Generated visualizations
├── requirements.txt # Project dependencies
└── .python-version # Python version specification
- Python 3.x (see .python-version for specific version)
- pip (Python package manager)
- Git
- YouTube Data API credentials
Two distinct topic modeling methods were employed:
- BERTopic, a modern transformer-based technique, was used to extract 20 granular topics by leveraging BERT embeddings and class-based TF-IDF scores.
- LDA, a probabilistic model based on word co-occurrence, was used to extract 7 broader thematic clusters for comparison.
This section explores sentiment analysis using two distinct approaches: VADER (Valence Aware Dictionary and sEntiment Reasoner) and pre-trained RoBERTa.
Both models agreed on the overall ranking: "neutral" comments were the most frequent, followed by "positive," and then "negative." RoBERTa labeled more comments as neutral than VADER did, while VADER labeled slightly more comments as positive. The counts for negative sentiment were similar between the two models. Looking at the individual results, RoBERTa captures the sentiment of emoji much better than VADER, and because it was pre-trained on millions of tweets, its handling of social media vocabulary is more precise.
Across all analyzed dimensions—individual topics, different video categories, and temporal trends—a recurring pattern emerged: neutral sentiment generally holds a significant, often leading, share, closely followed by positive sentiment. Negative sentiment, while consistently identifiable, typically constitutes the smallest proportion of comments. This broad observation suggests a largely favorable or at least non-negative engagement with the song.
Gather comments from relevant music videos using APIs (e.g., YouTube Data API). Consider diversifying sources with platforms like TikTok or SoundCloud. Define "relevant videos" carefully, including official, lyric, and fan-made content.
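As a rough illustration of the collection step, the sketch below builds a request URL for the YouTube Data API v3 `commentThreads` endpoint using only the standard library. The function name, the `VIDEO_ID` placeholder, and the `YOUR_API_KEY` placeholder are assumptions made for this example; the project's own collectors may be organized differently, and the sketch does not send the request.

```python
from urllib.parse import urlencode

# Public endpoint of the YouTube Data API v3 for top-level comment threads.
API_ENDPOINT = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_comment_threads_url(video_id, api_key, page_token=None):
    """Build one page request for a video's comment threads (hypothetical helper)."""
    params = {
        "part": "snippet,replies",   # comment text plus any inline replies
        "videoId": video_id,
        "maxResults": 100,           # largest page size the endpoint accepts
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        # nextPageToken from the previous response, used to fetch the next page
        params["pageToken"] = page_token
    return f"{API_ENDPOINT}?{urlencode(params)}"


# Placeholders only: substitute a real video ID and API key before sending.
url = build_comment_threads_url("VIDEO_ID", "YOUR_API_KEY")
print(url)
```

Paging continues by passing each response's `nextPageToken` back in as `page_token` until the response no longer contains one.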
Clean raw text by removing punctuation, normalizing case, and handling emojis. Tokenize text into words, remove stopwords (common, uninformative words), and apply lemmatization (reducing words to their base form) for consistent analysis. Python libraries like NLTK and spaCy are essential here.
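The cleaning steps above can be sketched without any dependencies, as shown below. This is a simplified stand-in, not the project's pipeline: the stopword set is a tiny hypothetical sample (NLTK or spaCy ship full lists), emojis are simply dropped, and lemmatization is omitted because it needs one of those libraries.

```python
import re

# Tiny illustrative stopword set (assumption); real runs would use a library list.
STOPWORDS = {"the", "a", "an", "is", "this", "and", "of", "to", "so"}

# Keep runs of lowercase letters, digits, and apostrophes; everything else
# (punctuation, emojis, whitespace) acts as a separator and is discarded.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def preprocess(comment):
    """Lowercase, tokenize, and remove stopwords from one comment."""
    lowered = comment.lower()
    tokens = TOKEN_RE.findall(lowered)
    return [t for t in tokens if t not in STOPWORDS]


print(preprocess("This song is SO good!!! 🔥🔥"))  # ['song', 'good']
```

Dropping emojis suits bag-of-words topic models, but a sentiment model that understands emojis would likely be better served by the raw text.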
Assess the emotional tone of comments. Use pre-trained models like VADER for social media nuances or advanced Transformer-based models (e.g., BERT via Hugging Face) for higher accuracy. Fine-tuning these models on music-specific data can enhance performance.
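To make the labeling step concrete, the sketch below maps VADER-style compound scores (which range from -1 to 1) to the three sentiment classes using the conventional ±0.05 cutoffs. The scores are hypothetical inputs, and the threshold is an assumption here; the project may have used different cutoffs.

```python
from collections import Counter


def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical compound scores, standing in for a sentiment model's output.
scores = [0.62, 0.0, -0.41, 0.03, 0.88]
labels = [label_from_compound(s) for s in scores]
distribution = Counter(labels)

print(labels)        # ['positive', 'neutral', 'negative', 'neutral', 'positive']
print(distribution)
```

A transformer classifier would typically return a label directly, so this mapping only applies to score-based tools such as VADER.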
Identify key themes and subjects discussed in comments. LDA (Latent Dirichlet Allocation) can uncover general topics, while BERTopic, leveraging BERT embeddings, often yields more coherent and interpretable themes without needing a predefined number of topics.
Map conversational interactions to understand influence and discussion flow. Build reply graphs where comments are nodes and replies are edges. Compute centrality metrics (e.g., degree, betweenness, eigenvector) using libraries like NetworkX to identify influential comments or users.
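The reply-graph idea can be illustrated with a dependency-free sketch that computes degree centrality by hand. The comment IDs and reply pairs are hypothetical, and the hand-rolled function is a stand-in for illustration; in NetworkX the equivalent call would be `nx.degree_centrality(G)`, which applies the same normalization by `n - 1`.

```python
from collections import defaultdict

# Hypothetical (reply_id, parent_id) pairs: each reply points at the comment it answers.
replies = [("c2", "c1"), ("c3", "c1"), ("c4", "c1"), ("c5", "c2")]


def degree_centrality(edges):
    """Degree centrality of an undirected reply graph, normalized by n - 1."""
    neighbors = defaultdict(set)
    for child, parent in edges:
        neighbors[child].add(parent)
        neighbors[parent].add(child)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


centrality = degree_centrality(replies)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])  # c1 0.75
```

Betweenness and eigenvector centrality are more involved to compute by hand, which is one reason to reach for a graph library in practice.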
Present findings clearly through various visuals:
- Sentiment timelines show how audience emotion evolves over time.
- Word clouds highlight frequently used words within specific themes or sentiment categories.
- Network diagrams illustrate reply structures, with nodes colored by sentiment and sized by centrality to reveal key discussion points.
- Topic distribution charts and sentiment charts provide deeper insights into thematic and emotional patterns.
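The aggregation behind a sentiment timeline can be sketched as follows: group labeled comments by calendar day and compute each label's share. The records are hypothetical, and the timestamp format is assumed to be ISO 8601 (the style of the YouTube API's `publishedAt` field); plotting is left to whichever charting library the notebooks use.

```python
from collections import Counter, defaultdict

# Hypothetical (timestamp, sentiment label) records.
records = [
    ("2025-04-11T09:15:00Z", "positive"),
    ("2025-04-11T18:40:00Z", "neutral"),
    ("2025-04-12T07:05:00Z", "negative"),
    ("2025-04-12T11:30:00Z", "neutral"),
    ("2025-04-12T20:10:00Z", "neutral"),
]


def daily_sentiment_shares(rows):
    """Return {day: {label: share}} with days in chronological order."""
    counts = defaultdict(Counter)
    for timestamp, label in rows:
        day = timestamp[:10]  # the YYYY-MM-DD prefix of an ISO 8601 timestamp
        counts[day][label] += 1
    shares = {}
    for day in sorted(counts):
        total = sum(counts[day].values())
        shares[day] = {label: n / total for label, n in counts[day].items()}
    return shares


shares = daily_sentiment_shares(records)
for day, by_label in shares.items():
    print(day, by_label)
```

Each day's shares sum to 1, so the result can be drawn directly as a stacked area or stacked bar chart.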
Assignment 3 of COSC2049 - Social Network and Media Analysis RMIT 2025
Full report: drive