Whelm provides creators with insights into audience perception and sentiment by analyzing YouTube comments. It processes these comments to identify patterns and trends, helping creators understand their viewers better and improve their content.

AnirudhSinghBhadauria/whelm

Why Whelm?

For a second, assume you're a content creator on YouTube (I know most of you are). Wouldn't you want a tool that tells you everything about your new videos: how the audience perceived them, what viewers want, what feedback they left, and the overall sentiment, so that you can improve your content?

Imagine uploading a video that you spent days creating - scripting, filming, editing - only to face the daunting task of manually sifting through hundreds or thousands of comments to understand what worked and what didn't. You're left wondering: Did my audience actually like this content? What specific aspects resonated with them? What should I change for my next video?

Content creation shouldn't be guesswork. You shouldn't have to rely on basic metrics like views and likes to determine if your content strategy is working. What if you could have a personal analyst that processes every comment, extracts meaningful insights, and delivers clear recommendations directly to you?

That's exactly what Whelm does! Whelm is an intelligent analytics system designed to help content creators understand audience perception and improve their content strategy. It automates the collection and analysis of YouTube comments, processing comments from videos published within the last week to offer timely feedback.
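As a rough illustration of the one-week window, the sketch below filters a list of video records by publication time. The record shape (a dict with a `publishedAt` ISO timestamp, as the YouTube Data API returns) and the helper name `published_within` are assumptions for this example, not Whelm's actual code.

```python
from datetime import datetime, timedelta, timezone


def published_within(videos, days=7, now=None):
    """Return the videos whose 'publishedAt' timestamp falls inside the window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    recent = []
    for video in videos:
        # Timestamps look like 2024-05-08T12:00:00Z; older Python versions
        # reject the trailing 'Z', so rewrite it as an explicit UTC offset.
        published = datetime.fromisoformat(video["publishedAt"].replace("Z", "+00:00"))
        if cutoff <= published <= now:
            recent.append(video)
    return recent
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to check against a fixed date.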

How Whelm Works

Whelm works like your dedicated research team, constantly monitoring your YouTube presence. Every day, it collects fresh comments from your recently published videos. These comments pass through intelligent processing that understands language nuances beyond simple keywords.

Whelm reads between the lines to identify sentiment, extracting how viewers truly feel about your content. The system then transforms this raw feedback into clear insights and actionable recommendations. All this happens automatically in the background while you focus on creating your next masterpiece.

Behind the Scenes

Whelm operates through a sophisticated six-stage pipeline that turns viewer comments into creator gold.

  • The journey begins with fetching recent comments from your videos using YouTube's API.
  • These comments are stored securely before undergoing preprocessing to clean and prepare them for analysis.
  • The RoBERTa model then evaluates each comment's emotional tone, categorizing them as positive, negative, or neutral.
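The preprocessing and labelling steps above can be sketched roughly as follows. This is a minimal illustration and not Whelm's implementation: the cleaning rules, the helper names `clean_comment` and `pick_label`, and the assumption that the model yields one score per label are all hypothetical.

```python
import html
import re

# The three categories named in the pipeline description.
LABELS = ("negative", "neutral", "positive")


def clean_comment(text):
    """Strip HTML tags and URLs, decode entities, and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)       # tags such as <br>
    text = re.sub(r"https?://\S+", " ", text)  # bare links
    text = html.unescape(text)                 # &amp; -> &
    return re.sub(r"\s+", " ", text).strip()


def pick_label(scores):
    """Return the highest-scoring label from a {label: score} mapping."""
    missing = [label for label in LABELS if label not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return max(LABELS, key=lambda label: scores[label])
```

In a real run the scores would come from the RoBERTa model; here they are just a plain dictionary so the selection logic can be read on its own.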

The system moves your data through a structured path from raw comments to processed insights, ensuring nothing gets lost along the way. Airflow orchestrates this entire workflow while Docker keeps everything running smoothly regardless of your setup.

Technical Architecture

Our data collection layer connects directly to the YouTube Data API with intelligent polling mechanisms. These connectors respect rate limits while maximizing data throughput to ensure comprehensive comment capture from all your videos without missing engagement.
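To make the collection step concrete, here is a hedged sketch of paging through the YouTube Data API v3 `commentThreads` endpoint. The endpoint and its `part`, `videoId`, `maxResults`, `textFormat`, `pageToken`, and `key` parameters belong to the public API; the function names, the injected `fetch_json` callable, and the `max_pages` cap (a simple way to bound quota use) are assumptions for illustration and may differ from what Whelm does.

```python
from urllib.parse import urlencode

COMMENT_THREADS_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_comment_url(video_id, api_key, page_token=None, max_results=100):
    """Build a commentThreads.list request URL for one page of results."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": max_results,  # the API allows at most 100 per page
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return f"{COMMENT_THREADS_URL}?{urlencode(params)}"


def iter_comments(video_id, api_key, fetch_json, max_pages=50):
    """Yield top-level comment texts, following nextPageToken until exhausted.

    fetch_json is any callable that takes a URL and returns the decoded
    JSON response as a dict (hypothetical; supply your own HTTP client).
    """
    page_token = None
    for _ in range(max_pages):
        page = fetch_json(build_comment_url(video_id, api_key, page_token))
        for item in page.get("items", []):
            yield item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        page_token = page.get("nextPageToken")
        if not page_token:
            break
```

Injecting `fetch_json` keeps the paging logic separate from the HTTP client, so it can be exercised with canned responses before spending any API quota.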

At the heart of Whelm sits our NLP core. Unlike general-purpose sentiment tools, our model understands YouTube-specific language patterns, including abbreviated speech, emojis, and platform-specific references that traditional systems miss.

Technology Stack

  • Orchestration: Apache Airflow
  • Storage: MinIO (S3-compatible object storage)
  • Databases: CockroachDB and PostgreSQL
  • Processing: PySpark for data transformations
  • ML Models:
    • RoBERTa for sentiment analysis
    • Mistral AI for natural language summary generation
  • Containerization: Docker

MinIO Bucket Structure

├── stage        # Raw data
├── preprocessed # Cleaned data
├── curated      # Data with sentiment scores
├── processed    # Data with summaries
├── dump         # Archived data
└── transcript   # Summarized insights stay here
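One way to keep object paths consistent across these folders is a small key-building helper. The folder names come from the structure above; the `channel_id/date/filename` layout underneath each folder and the helper name `object_key` are purely hypothetical, since the repository's actual naming scheme is not described here.

```python
from datetime import date

# Top-level folders in the MinIO bucket, as listed above.
STAGES = ("stage", "preprocessed", "curated", "processed", "dump", "transcript")


def object_key(stage, channel_id, run_date, filename):
    """Build an object key under one of the known bucket folders."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    # Hypothetical layout: <stage>/<channel>/<YYYY-MM-DD>/<file>
    return f"{stage}/{channel_id}/{run_date.isoformat()}/{filename}"
```

Rejecting unknown folder names early guards against a typo silently creating a new top-level prefix in the bucket.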

Installation

Getting started with Whelm takes just a few simple steps.

Prerequisites

  • Docker and Docker Compose: Install from here
  • Install Astro: Install the Astro CLI from here
  • YouTube Data API credentials: Get yours here
  • Mistral AI API key: Get yours here
  • MinIO instance: Provided by the Docker override file in the repository.
  • PostgreSQL instance: Spin one up locally with docker-compose, or use a cloud instance from Render.
  • CockroachDB instance: Likewise, spin one up locally with docker-compose, or get a cloud instance here.
  • docker-compose.override.yml: Read this file to see how everything is set up, how to access applications such as MinIO and the Airflow webserver, and how to change their credentials.
  • Folder structure: Please find the folder structure here.
  • IDE: Anything works (VS Code, PyCharm, etc.).

Applications & Credentials

Name              | URL                         | Username & Password
Airflow Webserver | http://localhost:7081/home  | admin & admin
Spark Master      | http://localhost:8081/      | -
MinIO             | http://localhost:9001/login | minio & minio123
  1. Clone the repository:
git clone git@github.com:AnirudhSinghBhadauria/whelm.git
cd whelm
  2. Start your Astro instance:
astro dev start
  3. Open the Airflow webserver and go to Admin -> Variables:
  • Create a key called 'channel_ids' and save all the channel IDs you want to process. The value must be in this format: ["channel_id_1", "channel_id_2", ...]
  • Create a key called 'cockroach_connection' with a value like this:
{
  "driver": "org.postgresql.Driver",
  "url": "YOUR_COCKROACHDB_URL",
  "user": "YOUR_USERNAME",
  "password": "YOUR_PASSWORD",
  "ssl": "false"
}
  • Create a key called 'minio_bucket' with the value "whelm".
  • Create a key called 'yt_developer_key' with your YouTube developer key.
  • Create a key called 'mistral_key' with your Mistral key.
  • Create a key called 'postgres_connection' with your Render Postgres key if you are using the cloud version on Render.
  4. Run your DAG:
astro run

You can now find the insights and summaries for your videos in both warehouses and in the MinIO bucket under the /transcript folder. Enjoy!
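Because the Airflow Variables in the setup steps are stored as JSON strings, it can help to sanity-check them before a run. The sketch below validates the 'channel_ids' and 'cockroach_connection' values using only the formats given in the steps; the helper names are hypothetical and nothing here is taken from Whelm's code.

```python
import json

# Keys shown in the 'cockroach_connection' example value.
REQUIRED_CONNECTION_KEYS = ("driver", "url", "user", "password", "ssl")


def parse_channel_ids(raw):
    """Parse 'channel_ids' and require a JSON list of non-empty strings."""
    ids = json.loads(raw)
    if not isinstance(ids, list) or not all(isinstance(i, str) and i for i in ids):
        raise ValueError("channel_ids must be a JSON list of non-empty strings")
    return ids


def parse_connection(raw):
    """Parse 'cockroach_connection' and check that every expected key is present."""
    conn = json.loads(raw)
    if not isinstance(conn, dict):
        raise ValueError("connection must be a JSON object")
    missing = [key for key in REQUIRED_CONNECTION_KEYS if key not in conn]
    if missing:
        raise ValueError(f"connection is missing keys: {missing}")
    return conn
```

A malformed value then fails with a clear message instead of surfacing later inside a running task.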

Simple, Reliable Data Preservation

Whelm takes the long view with your audience insights. We securely store all your valuable comment data and analysis results for as long as you need them, ensuring your historical engagement patterns remain accessible and actionable at any time.

What makes our approach special is our parallel processing architecture. Whelm loads results into multiple systems at the same time, so no single destination becomes a bottleneck when processing large volumes of comments. This means faster analysis, quicker insights, and more responsive performance even at scale.
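The idea of loading into several systems at once can be sketched with a thread pool. This is an illustrative pattern only: the `load_to_all` helper and the loader callables are hypothetical stand-ins, and Whelm's own pipeline (Airflow tasks with PySpark) may achieve the parallelism differently.

```python
from concurrent.futures import ThreadPoolExecutor


def load_to_all(records, loaders):
    """Run every loader on the same records concurrently.

    loaders maps a destination name to a callable taking the records.
    Returns {name: loader result}; an exception in any loader is re-raised.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(loaders))) as pool:
        futures = {name: pool.submit(loader, records) for name, loader in loaders.items()}
        return {name: future.result() for name, future in futures.items()}
```

Threads suit this case because database and object-store writes are I/O-bound, so the loaders spend most of their time waiting on the network rather than competing for the CPU.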

What Creators Get

Creators receive the audience understanding they've always wanted but never had time to develop. You'll discover recurring themes in comments that highlight what resonated most with your audience. The sentiment breakdown shows you exactly how positive or negative the reception was, with specific examples from actual comments.
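A sentiment breakdown with example comments could be computed along these lines. The input shape (pairs of comment text and label) and the function name `sentiment_breakdown` are assumptions for this sketch; the actual summaries in Whelm are produced by its own pipeline.

```python
from collections import Counter, defaultdict


def sentiment_breakdown(labelled, examples_per_label=2):
    """Summarize (comment, label) pairs as a percentage share plus sample comments."""
    counts = Counter()
    examples = defaultdict(list)
    for comment, label in labelled:
        counts[label] += 1
        if len(examples[label]) < examples_per_label:
            examples[label].append(comment)
    total = sum(counts.values())
    return {
        label: {"share": round(100 * n / total, 1), "examples": examples[label]}
        for label, n in counts.items()
    }
```

An empty input simply yields an empty dictionary, so the caller does not need a special case for videos with no comments yet.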

Beyond simple metrics, you receive strategic recommendations tailored to your content style and audience preferences. All data remains accessible in structured databases, allowing you to track audience sentiment trends over time and across different video styles. This means each new video can be more targeted and effective than the last.

Notes

  • We have kept the DAG timeout at 30 minutes; you may need to increase it depending on your usage.
  • You can run both Postgres and CockroachDB locally instead of in the cloud using the same setup.
  • You can change the credentials for the respective applications in the docker-compose.override.yml file.

With Whelm, you'll create content that truly resonates with your viewers. If Whelm helps your content journey, please ⭐ the repo! Your stars help more creators discover these insights and contribute to making the tool even better. Get started today!
