A real-time sentiment & entity analysis dashboard built on top of Bluesky's decentralized feed using Big Data pipelines and NLP techniques.
Developed by Fabian Spühler & Jonne van Dijk
A student project at ZHAW – Zurich University of Applied Sciences
This dashboard analyzes and visualizes sentiment trends and named entities from the Bluesky social media platform in real time.
It leverages a fully open-source tech stack, including Apache Kafka, Spark, PostgreSQL/TimescaleDB, and Transformer-based NLP models.
- 🌐 Data Source: Jetstream (Bluesky firehose)
- 🔄 Processing: Kafka + Python-based NLP workers
- 🧠 Analysis: Sentiment classification, ABSA (Aspect-Based Sentiment Analysis), relation extraction
- 📊 Frontend: Streamlit Dashboard with interactive visualizations
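As a rough illustration of the first hop in this pipeline, the sketch below reads posts from Jetstream over a websocket and forwards them to Kafka. It is only a sketch, not the repository's producer (that lives in `src/ingestion/`): the public Jetstream endpoint, the topic name `bluesky.posts`, and the `websockets`/`kafka-python` packages are assumptions for illustration, and the event field names follow Jetstream's documented commit format.

```python
"""Minimal sketch of the Jetstream -> Kafka ingestion step (illustrative only)."""
import asyncio
import json

import websockets
from kafka import KafkaProducer

# Assumed public Jetstream instance; only new posts are requested.
JETSTREAM_URL = (
    "wss://jetstream2.us-east.bsky.network/subscribe"
    "?wantedCollections=app.bsky.feed.post"
)

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

async def ingest() -> None:
    async with websockets.connect(JETSTREAM_URL) as ws:
        async for raw in ws:
            event = json.loads(raw)
            # Jetstream also emits identity/account events; keep only new posts.
            if event.get("kind") != "commit":
                continue
            commit = event.get("commit", {})
            if commit.get("operation") != "create":
                continue
            record = commit.get("record", {})
            producer.send("bluesky.posts", value={
                "did": event.get("did"),
                "created_at": record.get("createdAt"),
                "text": record.get("text", ""),
                "langs": record.get("langs", []),
            })

if __name__ == "__main__":
    asyncio.run(ingest())
```

The downstream stages (NLP workers and the Streamlit dashboard) build on top of this stream, as reflected in the project structure below.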
📁 src/
├─ analytics/ → Data extraction scripts
├─ ingestion/ → Jetstream ingestion & Kafka producer
├─ processing/ → NLP workers (sentiment, ABSA, language detection)
├─ tools/ → Approaches for automatic entity list generation
└─ visualization/ → Streamlit app
📁 data/
└─ raw/ → Parquet and intermediate files
📁 init/
└─ ⛃ init_timescale.sql → Database initialisation script
📁 docker/
└─ 📄 docker-compose.yaml → Full container setup for the master instance
📁 docker-worker/
└─ 📄 docker-compose.yaml → Full container setup for the worker instance
📁 config/ → environment files (master, worker)
```bash
git clone https://github.com/BDP25/BlueskyDashboard.git
cd BlueskyDashboard
```
All required `.env` files are already included in the repository. No additional setup is needed on the master instance.
If you want to run heavy NLP workers (e.g., with GPU support) on a separate machine, follow these steps:
Edit `config/.env.worker` and replace the default master IP (`10.0.2.31`) with the actual IP address of your master instance:

```
POSTGRES_HOST=192.168.0.10
KAFKA_BOOTSTRAP_SERVERS=192.168.0.10:9092
```
Ensure this IP is reachable from the worker machine.
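For reference, a worker process consumes exactly these two variables at startup, roughly as sketched below. The database name, credentials, and the choice of `psycopg2` as the client library are placeholder assumptions; the real workers live in `src/processing/`.

```python
import os

import psycopg2  # assumption: psycopg2 as the PostgreSQL/TimescaleDB client

# These values come from config/.env.worker and are injected into the container
# by the worker compose setup.
POSTGRES_HOST = os.environ["POSTGRES_HOST"]              # e.g. 192.168.0.10
KAFKA_BOOTSTRAP = os.environ["KAFKA_BOOTSTRAP_SERVERS"]  # e.g. 192.168.0.10:9092

# Credentials below are placeholders, not the project's real values.
db = psycopg2.connect(host=POSTGRES_HOST, port=5432,
                      dbname="bluesky", user="postgres", password="postgres")
print("Connected to TimescaleDB at", POSTGRES_HOST,
      "- Kafka clients will use", KAFKA_BOOTSTRAP)
```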
On the master instance, ensure the following ports are open and accessible:
- `9092` – Kafka (required if a worker instance is used)
- `5432` – PostgreSQL (required if a worker instance is used)
- `8501` – Streamlit dashboard (required if the dashboard is used)
This may require router port forwarding or firewall adjustments depending on your network setup.
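A quick way to verify that these ports are actually reachable from the worker machine (or any other client) is a small TCP check like the sketch below. It is illustrative only and uses the example IP from above; it confirms connectivity, not that the services themselves are healthy.

```python
"""Check that the master's Kafka, PostgreSQL, and Streamlit ports accept TCP connections."""
import socket

MASTER_IP = "192.168.0.10"  # replace with the IP configured in config/.env.worker
PORTS = {9092: "Kafka", 5432: "PostgreSQL", 8501: "Streamlit"}

for port, service in PORTS.items():
    try:
        with socket.create_connection((MASTER_IP, port), timeout=3):
            print(f"{service} ({port}): reachable")
    except OSError as exc:
        print(f"{service} ({port}): NOT reachable ({exc})")
```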
On each instance, make sure you use the correct compose file:
```bash
# Master instance
docker compose -f docker/docker-compose.yaml up -d

# Worker instance
docker compose -f docker-worker/docker-compose.yaml up -d
```
To increase throughput or parallelism, you can run multiple copies of any worker (e.g., `nlp_absa`, `nlp_worker`, etc.).
Just duplicate the service in your Compose file with a different name:
```yaml
nlp_absa_2:
  <<: *nlp_absa
```
Each instance will independently consume messages from the same Kafka topic, allowing easy horizontal scaling across machines.
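The sketch below illustrates why this works: every replica joins the same Kafka consumer group, so the topic's partitions (and therefore the messages) are split between them. The topic name, group id, and sentiment model here are illustrative assumptions, not values taken from the workers in `src/processing/`; it assumes `kafka-python` and `transformers`.

```python
"""Sketch of a horizontally scalable sentiment worker (illustrative only)."""
import json
import os

from kafka import KafkaConsumer
from transformers import pipeline

consumer = KafkaConsumer(
    "bluesky.posts",                      # hypothetical topic name
    bootstrap_servers=os.environ.get("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092"),
    group_id="nlp_sentiment",             # same group in every replica -> partitions are shared
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Any Transformer sentiment model works here; this multilingual one is just an example.
sentiment = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-xlm-roberta-base-sentiment",
)

for message in consumer:
    post = message.value
    result = sentiment(post["text"])[0]
    print(post["did"], result["label"], round(result["score"], 3))
    # The real worker would write the result to TimescaleDB instead of printing.
```

Note that the achievable parallelism is bounded by the number of partitions of the topic: with more replicas than partitions, the extra replicas stay idle.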
This project uses modern Docker Compose. Depending on your environment:
- Use `docker compose` (preferred; Docker v20.10+, preinstalled on most systems)
- Or `docker-compose` (legacy; still supported but no longer maintained)
Both work with the provided files. Adapt the commands accordingly.
This project was developed as part of a course assignment in the Big Data track at the
School of Engineering, ZHAW – supervised by Jonathan Fürst and Nico Spahni.
We especially thank Jonathan Fürst for his valuable technical support and feedback throughout the development process.
We also acknowledge the use of ChatGPT for architecture validation, debugging, and prototyping assistance.