This project implements the first part of a trend detection system inspired by the BERTrend paper. It focuses on extracting and storing topics from time-sliced text data using BERTopic and Elasticsearch.
- ✅ Time slicing of input documents by day/week/month
- ✅ Text embeddings using Sentence-BERT (`all-MiniLM-L6-v2`)
- ✅ Dimensionality reduction (UMAP, via BERTopic)
- ✅ Clustering using HDBSCAN (default in BERTopic)
- ✅ Topic modeling using BERTopic with configurable parameters
- ✅ Topic naming using class-based TF-IDF (c-TF-IDF)
- ✅ Filtering of outlier/noisy topics
- ✅ Storing results to Elasticsearch (`bertrend_results_*`)
- ✅ Modular pipeline design for easy extension and UI integration
- ✅ Inspect results via script or UI-ready JSON
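The time-slicing step listed above can be sketched with pandas. This is a minimal illustration only; the function name, column names, and the frequency mapping are assumptions, not the project's actual code:

```python
import pandas as pd

def slice_documents(df, slice_unit="week", ts_col="timestamp"):
    """Group documents into time slices; each slice is later modeled independently.
    `slice_unit` is mapped to a pandas frequency alias (assumed: day/week/month)."""
    freq = {"day": "D", "week": "W", "month": "MS"}[slice_unit]
    df = df.copy()
    df[ts_col] = pd.to_datetime(df[ts_col])
    # groupby with a Grouper bins rows into calendar intervals of the chosen size
    return {period: group for period, group in df.groupby(pd.Grouper(key=ts_col, freq=freq))}

docs = pd.DataFrame({
    "timestamp": ["2025-01-01", "2025-01-02", "2025-01-10"],
    "text": ["doc a", "doc b", "doc c"],
})
slices = slice_documents(docs, "week")
```

Each value in `slices` is a DataFrame holding the documents of one time slice, ready to be passed to a per-slice topic model.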
- **Ingest Input Data**
  - Load CSV or raw data with timestamp and text fields
  - Save documents to the `bertrend_data` index in Elasticsearch
- **Slice & Model Topics**
  - Data is grouped into time slices (e.g., daily)
  - BERTopic is applied to each slice independently
  - Each topic includes representative docs and keywords
  - Results are stored to `bertrend_results_<job_id>`
- **Inspect Results**
  - Run the CLI script to preview top topics and metadata
  - Results can be visualized or exported to a UI later
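To make the storage step above concrete, here is a sketch of what one topic record written to `bertrend_results_<job_id>` could look like. The field names are hypothetical; the actual Elasticsearch mapping lives in the pipeline code:

```python
import json
from datetime import date

def topic_doc(job_id, slice_start, topic_id, keywords, rep_docs):
    """Shape one topic result for indexing (illustrative field names only)."""
    return {
        "_index": f"bertrend_results_{job_id}",   # per-job results index
        "job_id": job_id,
        "slice_start": slice_start.isoformat(),    # start of the time slice
        "topic_id": topic_id,
        "keywords": keywords,                      # top c-TF-IDF terms
        "representative_docs": rep_docs,
    }

doc = topic_doc("test_01", date(2025, 1, 6), 3,
                ["battery", "ev", "charging"], ["Example representative text"])
print(json.dumps(doc, indent=2))
```

Keeping one flat record per (slice, topic) pair makes later UI queries a simple filter on `job_id` and `slice_start`.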
Modify `configs/config.yaml` to adjust:
- Time slice granularity
- BERTopic parameters (top_n_words, embedding model, etc.)
- Elasticsearch index names
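A hypothetical `configs/config.yaml` illustrating the options above (the key names are assumptions; check the actual file in the repo):

```yaml
time_slice:
  unit: week                 # day | week | month
bertopic:
  embedding_model: all-MiniLM-L6-v2
  top_n_words: 10
elasticsearch:
  data_index: bertrend_data
  results_index_prefix: bertrend_results_
```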
```bash
python -m venv .env
source .env/bin/activate
pip install -r requirements.txt
```
```bash
cd elastic/
docker-compose up -d
```
```bash
python src/ingest_data.py
```
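A minimal sketch of what the ingest step could do: turning CSV rows with timestamp and text fields into Elasticsearch bulk-API action lines. This uses only the standard library; the real `src/ingest_data.py` may use the official client, and the field names are assumptions:

```python
import csv
import json
from io import StringIO

def to_bulk_actions(csv_text, index="bertrend_data"):
    """Convert CSV rows (timestamp,text) into newline-delimited bulk-API JSON.
    Each document gets an `index` action line followed by its source line."""
    lines = []
    for row in csv.DictReader(StringIO(csv_text)):
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps({"timestamp": row["timestamp"], "text": row["text"]}))
    return "\n".join(lines) + "\n"

sample = "timestamp,text\n2025-01-01,hello world\n"
payload = to_bulk_actions(sample)
print(payload)
```

The resulting payload can be POSTed to the `_bulk` endpoint or fed to a client's bulk helper.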
Run it with parameters:

```bash
python -m src.topic_extraction --index bertrend_data --from_date 2025-01-01 --to_date 2025-03-30 --job_id test_01 --slice_unit week
```
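The CLI surface shown above can be sketched with `argparse`. The flag names come from the example invocation; the defaults and `choices` are assumptions:

```python
import argparse

# Minimal sketch of the topic-extraction CLI (flags taken from the example call).
parser = argparse.ArgumentParser(prog="topic_extraction")
parser.add_argument("--index", default="bertrend_data")     # source ES index
parser.add_argument("--from_date")                          # inclusive start (YYYY-MM-DD)
parser.add_argument("--to_date")                            # inclusive end (YYYY-MM-DD)
parser.add_argument("--job_id", required=True)              # names the results index
parser.add_argument("--slice_unit", choices=["day", "week", "month"], default="week")

args = parser.parse_args(
    "--index bertrend_data --from_date 2025-01-01 --to_date 2025-03-30 "
    "--job_id test_01 --slice_unit week".split()
)
print(args.job_id, args.slice_unit)
```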