A real-time sentiment & entity analysis dashboard built on top of Bluesky's decentralized feed using Big Data pipelines and NLP techniques.
Developed by Fabian Spühler & Jonne van Dijk
A student project at ZHAW – Zurich University of Applied Sciences
This dashboard analyzes and visualizes sentiment trends and named entities from the Bluesky social media platform in real time.
It leverages a fully open-source tech stack, including Apache Kafka, Spark, PostgreSQL/TimescaleDB, and Transformer-based NLP models.
- 🌐 Data Source: Jetstream (Bluesky firehose)
- 🔄 Processing: Kafka + Python-based NLP workers
- 🧠 Analysis: Sentiment classification, ABSA (Aspect-Based Sentiment Analysis), relation extraction
- 📊 Frontend: Streamlit Dashboard with interactive visualizations
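As a rough illustration of the first hop in this pipeline, the sketch below reads posts from Jetstream over a websocket and forwards them to Kafka. It is only a sketch, not the repository's producer (that lives in `src/ingestion/`): the public Jetstream endpoint, the topic name `bluesky.posts`, and the `websockets`/`kafka-python` packages are assumptions for illustration, and the event field names follow Jetstream's documented commit format.

```python
"""Minimal sketch of the Jetstream -> Kafka ingestion step (illustrative only)."""
import asyncio
import json

import websockets
from kafka import KafkaProducer

# Assumed public Jetstream instance; only new posts are requested.
JETSTREAM_URL = (
    "wss://jetstream2.us-east.bsky.network/subscribe"
    "?wantedCollections=app.bsky.feed.post"
)

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

async def ingest() -> None:
    async with websockets.connect(JETSTREAM_URL) as ws:
        async for raw in ws:
            event = json.loads(raw)
            # Jetstream also emits identity/account events; keep only new posts.
            if event.get("kind") != "commit":
                continue
            commit = event.get("commit", {})
            if commit.get("operation") != "create":
                continue
            record = commit.get("record", {})
            producer.send("bluesky.posts", value={
                "did": event.get("did"),
                "created_at": record.get("createdAt"),
                "text": record.get("text", ""),
                "langs": record.get("langs", []),
            })

if __name__ == "__main__":
    asyncio.run(ingest())
```

The downstream stages (NLP workers and the Streamlit dashboard) build on top of this stream, as reflected in the project structure below.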
📁 src/
├─ analytics/ → Data extraction scripts
├─ ingestion/ → Jetstream ingestion & Kafka producer
├─ processing/ → NLP workers (sentiment, ABSA, language detection)
├─ tools/ → Approaches for automatic entity list generation
└─ visualization/ → Streamlit app
📁 data/
└─ raw/ → Parquet and intermediate files
📁 init/
└─ ⛃ init_timescale.sql → Database initialisation script
📁 docker/
└─ 📄 docker-compose.yaml → Full container setup for the master instance
📁 docker-worker/
└─ 📄 docker-compose.yaml → Full container setup for the worker instance
📁 config/ → environment files (master, worker)
```bash
git clone https://github.com/BDP25/BlueskyDashboard.git
cd BlueskyDashboard
```
All required `.env` files are already included in the repository. No additional setup is needed on the master instance.
If you want to run heavy NLP workers (e.g., with GPU support) on a separate machine, follow these steps:
Edit `config/.env.worker` and replace the default master IP (`10.0.2.31`) with the actual IP address of your master instance:

```
POSTGRES_HOST=192.168.0.10
KAFKA_BOOTSTRAP_SERVERS=192.168.0.10:9092
```
Ensure this IP is reachable from the worker machine.
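For reference, a worker process consumes exactly these two variables at startup, roughly as sketched below. The database name, credentials, and the choice of `psycopg2` as the client library are placeholder assumptions; the real workers live in `src/processing/`.

```python
import os

import psycopg2  # assumption: psycopg2 as the PostgreSQL/TimescaleDB client

# These values come from config/.env.worker and are injected into the container
# by the worker compose setup.
POSTGRES_HOST = os.environ["POSTGRES_HOST"]              # e.g. 192.168.0.10
KAFKA_BOOTSTRAP = os.environ["KAFKA_BOOTSTRAP_SERVERS"]  # e.g. 192.168.0.10:9092

# Credentials below are placeholders, not the project's real values.
db = psycopg2.connect(host=POSTGRES_HOST, port=5432,
                      dbname="bluesky", user="postgres", password="postgres")
print("Connected to TimescaleDB at", POSTGRES_HOST,
      "- Kafka clients will use", KAFKA_BOOTSTRAP)
```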
On the master instance, ensure the following ports are open and accessible:
- `9092` – Kafka (required if a worker instance is used)
- `5432` – PostgreSQL (required if a worker instance is used)
- `8501` – Streamlit dashboard (required if the dashboard is used)
This may require router port forwarding or firewall adjustments depending on your network setup.
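A quick way to verify that these ports are actually reachable from the worker machine (or any other client) is a small TCP check like the sketch below. It is illustrative only and uses the example IP from above; it confirms connectivity, not that the services themselves are healthy.

```python
"""Check that the master's Kafka, PostgreSQL, and Streamlit ports accept TCP connections."""
import socket

MASTER_IP = "192.168.0.10"  # replace with the IP configured in config/.env.worker
PORTS = {9092: "Kafka", 5432: "PostgreSQL", 8501: "Streamlit"}

for port, service in PORTS.items():
    try:
        with socket.create_connection((MASTER_IP, port), timeout=3):
            print(f"{service} ({port}): reachable")
    except OSError as exc:
        print(f"{service} ({port}): NOT reachable ({exc})")
```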
On each instance, make sure you use the correct compose file:
```bash
# Master instance
docker compose -f docker/docker-compose.yaml up -d

# Worker instance
docker compose -f docker-worker/docker-compose.yaml up -d
```
To increase throughput or parallelism, you can run multiple copies of any worker (e.g., `nlp_absa`, `nlp_worker`, etc.).
Just duplicate the service in your Compose file with a different name:
```yaml
nlp_absa_2:
  <<: *nlp_absa
```
Each instance will independently consume messages from the same Kafka topic, allowing easy horizontal scaling across machines.
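The sketch below illustrates why this works: every replica joins the same Kafka consumer group, so the topic's partitions (and therefore the messages) are split between them. The topic name, group id, and sentiment model here are illustrative assumptions, not values taken from the workers in `src/processing/`; it assumes `kafka-python` and `transformers`.

```python
"""Sketch of a horizontally scalable sentiment worker (illustrative only)."""
import json
import os

from kafka import KafkaConsumer
from transformers import pipeline

consumer = KafkaConsumer(
    "bluesky.posts",                      # hypothetical topic name
    bootstrap_servers=os.environ.get("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092"),
    group_id="nlp_sentiment",             # same group in every replica -> partitions are shared
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Any Transformer sentiment model works here; this multilingual one is just an example.
sentiment = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-xlm-roberta-base-sentiment",
)

for message in consumer:
    post = message.value
    result = sentiment(post["text"])[0]
    print(post["did"], result["label"], round(result["score"], 3))
    # The real worker would write the result to TimescaleDB instead of printing.
```

Note that the achievable parallelism is bounded by the number of partitions of the topic: with more replicas than partitions, the extra replicas stay idle.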
This project uses modern Docker Compose. Depending on your environment:
- Use `docker compose` (preferred; Docker v20.10+, preinstalled on most systems)
- Or `docker-compose` (legacy; still supported but no longer maintained)
Both work with the provided files. Adapt the commands accordingly.
This project was developed as part of a course assignment in the Big Data track at the
School of Engineering, ZHAW – supervised by Jonathan Fürst and Nico Spahni.
We especially thank Jonathan Fürst for his valuable technical support and feedback throughout the development process.
We also acknowledge the use of ChatGPT for architecture validation, debugging, and prototyping assistance.