BD Logo

BD – Bluesky Dashboard 📊☁️

A real-time sentiment & entity analysis dashboard built on top of Bluesky's decentralized feed using Big Data pipelines and NLP techniques.

Developed by Fabian Spühler & Jonne van Dijk
A student project at ZHAW – Zurich University of Applied Sciences


🚀 Overview

This dashboard analyzes and visualizes sentiment trends and named entities from the Bluesky social media platform in real time.
It leverages a fully open-source tech stack including Apache Kafka, Apache Spark, PostgreSQL/TimescaleDB, and Transformer-based NLP models.

  • 🌐 Data Source: Jetstream (Bluesky firehose)
  • 🔄 Processing: Kafka + Python-based NLP workers
  • 🧠 Analysis: Sentiment classification, ABSA (Aspect-Based Sentiment Analysis), relation extraction
  • 📊 Frontend: Streamlit Dashboard with interactive visualizations

📦 Project Structure

📁 src/
 ├─ analytics/          → Data extraction scripts
 ├─ ingestion/          → Jetstream ingestion & Kafka producer
 ├─ processing/         → NLP workers (sentiment, ABSA, language detection)
 ├─ tools/              → Approaches for automatic entity list generation
 └─ visualization/      → Streamlit app
 
📁 data/
 └─ raw/                → Parquet and intermediate files
 
📁 init/
 └─ ⛃ init_timescale.sql   → Database initialization script

📁 docker/
 └─ 📄 docker-compose.yaml  → Full container setup (main instance)
 
📁 docker-worker/
 └─ 📄 docker-compose.yaml  → Full container setup (worker instance)
 
📁 config/        → Environment files (master, worker)


🛠️ Setup & Usage

1. 📁 Clone the Repository

git clone https://github.com/BDP25/BlueskyDashboard.git
cd BlueskyDashboard

2. 📄 Environment Variables

All required .env files are already included in the repository. No additional setup is needed on the master instance.


⚡️ Optional Worker Instance (for additional compute power)

If you want to run heavy NLP workers (e.g., with GPU support) on a separate machine, follow these steps:

1. Adjust the Worker Environment File

Edit config/.env.worker and replace the default master IP (10.0.2.31) with the actual IP address of your master instance, for example:

POSTGRES_HOST=192.168.0.10
KAFKA_BOOTSTRAP_SERVERS=192.168.0.10:9092

Ensure this IP is reachable from the worker machine.
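To verify connectivity before starting the worker containers, a quick port check can be run from the worker machine. This is a minimal sketch using bash's built-in `/dev/tcp` redirection (the address 192.168.0.10 is just the example master IP from above; substitute your own):

```shell
#!/usr/bin/env bash
# Check that the master's Kafka and PostgreSQL ports are reachable
# from this (worker) machine. Pass the master IP as the first argument.
MASTER_IP="${1:-192.168.0.10}"

check_port() {
  local host="$1" port="$2"
  # bash's /dev/tcp opens a TCP connection; failure means the port
  # is closed, filtered, or the host is unreachable
  if timeout 3 bash -c ">/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "OK:   ${host}:${port} reachable"
  else
    echo "FAIL: ${host}:${port} not reachable"
  fi
}

check_port "$MASTER_IP" 9092   # Kafka
check_port "$MASTER_IP" 5432   # PostgreSQL
```

If either line prints FAIL, fix the network path (firewall, port forwarding) before starting the worker Compose stack.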

2. Open Required Ports on the Master

On the master instance, ensure the following ports are open and accessible:

  • 9092 – Kafka (required if worker instance is used)
  • 5432 – PostgreSQL (required if worker instance is used)
  • 8501 – Streamlit dashboard (required if dashboard is used)

This may require router port forwarding or firewall adjustments depending on your network setup.

3. Start the Appropriate Docker Compose

On each instance, make sure you use the correct compose file:

🖥️ On the Master:

docker compose -f docker/docker-compose.yaml up -d

🚀 On the Worker:

docker compose -f docker-worker/docker-compose.yaml up -d

🔁 Scaling Worker Services

To increase throughput or parallelism, you can run multiple copies of any worker (e.g., nlp_absa or nlp_worker).

Just duplicate the service in your Compose file under a different name, using a YAML anchor on the original service definition:

nlp_absa_2:
  <<: *nlp_absa

Because all copies share the same Kafka consumer group, the topic's partitions are split among them, allowing easy horizontal scaling across machines.
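The merge key `<<: *nlp_absa` only resolves if the original service carries a matching YAML anchor (`&nlp_absa`) earlier in the file. A minimal sketch of the pattern (the build path and settings here are illustrative, not taken from the actual Compose file):

```yaml
services:
  nlp_absa: &nlp_absa          # anchor captures the full service definition
    build: ../src/processing
    env_file: ../config/.env.worker
    restart: unless-stopped

  nlp_absa_2:
    <<: *nlp_absa              # merge key copies every key from the anchor
```

Note that none of the duplicated services may set a fixed `container_name`, or the names would collide. As an alternative to editing the file, recent Docker Compose can also run replicas directly, e.g. `docker compose up -d --scale nlp_absa=2`.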


🐳 Docker Compose Compatibility

This project uses modern Docker Compose. Depending on your environment:

  • Use docker compose (the Compose V2 plugin, preferred; bundled with Docker Engine v20.10+)
  • Or docker-compose (the legacy standalone V1 binary, deprecated and no longer maintained)

Both work with the provided files. Adapt the commands accordingly.

📄 License

  • Code: GNU GPLv3 – Copyleft, fully open source
  • Visuals & content: CC BY 4.0 – Attribution required

👨‍🎓 Acknowledgements

This project was developed as part of a course assignment in the Big Data track at the
School of Engineering, ZHAW – supervised by Jonathan Fürst and Nico Spahni.

We especially thank Jonathan Fürst for his valuable technical support and feedback throughout the development process.

We also acknowledge the use of ChatGPT for architecture validation, debugging, and prototyping assistance.

