
semantic-systems/trend-detector


BERTrend-Inspired Topic Modeling and Trend Detection System

This project implements the first part of a trend detection system inspired by the BERTrend paper. It focuses on extracting and storing topics from time-sliced text data using BERTopic and Elasticsearch.


Features

  • Time slicing of input documents by day/week/month
  • Text embeddings using Sentence-BERT (all-MiniLM-L6-v2)
  • Dimensionality reduction (UMAP, via BERTopic)
  • Clustering using HDBSCAN (default in BERTopic)
  • Topic modeling using BERTopic with configurable parameters
  • Topic naming using class-based TF-IDF (c-TF-IDF)
  • Filtering of outlier/noisy topics
  • Storage of results in Elasticsearch (bertrend_results_* indices)
  • Modular pipeline design for easy extension and UI integration
  • Result inspection via a CLI script or UI-ready JSON
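Topic naming with c-TF-IDF treats all documents assigned to one topic as a single "class document" and scores each term by its in-class frequency times an inverse class frequency. The following is a minimal, dependency-free sketch of the weighting described in the BERTopic documentation, W(t, c) = tf(t, c) · log(1 + A / f(t)), where tf is L1-normalised per class, A is the average number of words per class and f(t) is the frequency of term t across all classes. It is not this project's code: the whitespace tokenizer, the function name `ctfidf_keywords`, and dropping topic -1 (BERTopic's outlier label) as the filtering rule are illustrative assumptions.

```python
import math
from collections import Counter


def ctfidf_keywords(docs, topics, top_n=3):
    """Return the top_n c-TF-IDF terms for each topic.

    docs: list of document strings
    topics: list of topic ids, one per document (-1 marks outliers)
    """
    # Merge all documents of a topic into one bag of words, skipping
    # the outlier topic -1 (assumed filtering rule for this sketch).
    class_counts = {}
    for doc, topic in zip(docs, topics):
        if topic == -1:
            continue
        class_counts.setdefault(topic, Counter()).update(doc.lower().split())

    if not class_counts:
        return {}

    # f(t): frequency of each term summed over all classes.
    term_freq = Counter()
    for counts in class_counts.values():
        term_freq.update(counts)

    # A: average number of words per class.
    avg_words = sum(term_freq.values()) / len(class_counts)

    keywords = {}
    for topic, counts in class_counts.items():
        total = sum(counts.values())
        scores = {
            # L1-normalised term frequency times inverse class frequency.
            term: (count / total) * math.log(1 + avg_words / term_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords[topic] = ranked[:top_n]
    return keywords


docs = [
    "stock market rally",
    "market crash news",
    "vaccine trial results",
    "vaccine rollout news",
    "random noise",
]
topics = [0, 0, 1, 1, -1]
print(ctfidf_keywords(docs, topics))
# {0: ['market', 'crash', 'rally'], 1: ['vaccine', 'results', 'rollout']}
```

Because "news" occurs in both topics, its inverse class frequency is lower and it ranks below terms specific to a single topic, which is what makes c-TF-IDF keywords useful as topic names.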

🧪 How It Works

  1. Ingest Input Data

    • Load CSV or raw data containing timestamp and text fields
    • Save the documents to the bertrend_data index in Elasticsearch
  2. Slice & Model Topics

    • Data is grouped into time slices (e.g., daily)
    • BERTopic is applied to each slice independently
    • Each topic includes representative documents and keywords
    • Results are stored in the bertrend_results_<job_id> index
  3. Inspect Results

    • Run the CLI script to preview the top topics and their metadata
    • Results can later be visualized or exported to a UI
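Step 2 groups documents into time slices before BERTopic is fitted on each slice. A minimal stdlib sketch of that grouping follows, assuming each document is a dict with an ISO 8601 `timestamp` string and a `text` field. The function names `slice_start` and `slice_documents`, and the choice of Monday-start weeks and calendar months as slice keys, are assumptions rather than the project's actual implementation.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def slice_start(ts: datetime, unit: str) -> date:
    """Map a timestamp to the first day of its day/week/month slice."""
    d = ts.date()
    if unit == "day":
        return d
    if unit == "week":
        # Weeks start on Monday (weekday() == 0), as in ISO 8601.
        return d - timedelta(days=d.weekday())
    if unit == "month":
        return d.replace(day=1)
    raise ValueError(f"unsupported slice unit: {unit!r}")


def slice_documents(docs, unit="day"):
    """Group document texts into chronologically ordered time slices."""
    slices = defaultdict(list)
    for doc in docs:
        ts = datetime.fromisoformat(doc["timestamp"])
        slices[slice_start(ts, unit)].append(doc["text"])
    return dict(sorted(slices.items()))


docs = [
    {"timestamp": "2025-01-01T09:00:00", "text": "new year market outlook"},
    {"timestamp": "2025-01-03T14:30:00", "text": "tech stocks rally"},
    {"timestamp": "2025-01-08T08:15:00", "text": "central bank meeting"},
]
for start, texts in slice_documents(docs, unit="week").items():
    print(start, len(texts))
# 2024-12-30 2
# 2025-01-06 1
```

Each value in the returned dict is the corpus for one independent BERTopic run; keying slices by their start date keeps them sortable and easy to store alongside the topic results.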

⚙️ Configuration

Modify configs/config.yaml to adjust:

  • Time slice granularity
  • BERTopic parameters (top_n_words, embedding model, etc.)
  • Elasticsearch index names
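The exact keys in configs/config.yaml are not listed in this README. As an illustration only, a typed representation of the three kinds of settings above might look like the following. Every field name and the class `TrendConfig` are hypothetical; the defaults reuse values mentioned in this README (`all-MiniLM-L6-v2`, `bertrend_data`, `bertrend_results_<job_id>`), and `top_n_words=10` mirrors BERTopic's own default.

```python
from dataclasses import dataclass

VALID_SLICE_UNITS = {"day", "week", "month"}


@dataclass
class TrendConfig:
    # Hypothetical field names; check configs/config.yaml for the real keys.
    slice_unit: str = "day"
    embedding_model: str = "all-MiniLM-L6-v2"
    top_n_words: int = 10
    data_index: str = "bertrend_data"
    results_index_prefix: str = "bertrend_results_"

    def __post_init__(self):
        # Reject granularities the time slicer does not support.
        if self.slice_unit not in VALID_SLICE_UNITS:
            raise ValueError(f"slice_unit must be one of {sorted(VALID_SLICE_UNITS)}")
        if self.top_n_words < 1:
            raise ValueError("top_n_words must be positive")

    def results_index(self, job_id: str) -> str:
        """Name of the per-job results index."""
        return f"{self.results_index_prefix}{job_id}"


# A dict like the one yaml.safe_load() would return (PyYAML assumed).
raw = {"slice_unit": "week", "top_n_words": 5}
cfg = TrendConfig(**raw)
print(cfg.results_index("test_01"))  # bertrend_results_test_01
```

Validating the config once at load time means an unsupported granularity fails immediately instead of partway through a long modeling run.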

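1. Setup

Create a virtual environment and install the dependencies:

python -m venv .env
source .env/bin/activate
pip install -r requirements.txt

2. Start Elasticsearch

cd elastic/
docker-compose up -d
cd ..

3. Ingest data

python src/ingest_data.py

4. Extract topics

Run topic extraction for a date range and slice granularity by passing parameters:

python -m src.topic_extraction --index bertrend_data --from_date 2025-01-01 --to_date 2025-03-30 --job_id test_01 --slice_unit week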
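The flags in the command above can be modelled with Python's argparse. This is a sketch of how such a parser might be defined and validated, not the contents of src/topic_extraction.py; the defaults, the `choices` list, the function names `build_parser` and `parse_args`, and the date-order check are assumptions.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Extract topics per time slice.")
    parser.add_argument("--index", default="bertrend_data",
                        help="source Elasticsearch index")
    # date.fromisoformat turns "2025-01-01" into a datetime.date.
    parser.add_argument("--from_date", type=date.fromisoformat, required=True)
    parser.add_argument("--to_date", type=date.fromisoformat, required=True)
    parser.add_argument("--job_id", required=True,
                        help="suffix for the bertrend_results_<job_id> index")
    parser.add_argument("--slice_unit", choices=["day", "week", "month"],
                        default="day")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # parser.error() prints usage and exits with status 2.
    if args.from_date > args.to_date:
        parser.error("--from_date must not be later than --to_date")
    return args


args = parse_args([
    "--index", "bertrend_data",
    "--from_date", "2025-01-01",
    "--to_date", "2025-03-30",
    "--job_id", "test_01",
    "--slice_unit", "week",
])
print(args.from_date, args.to_date, args.slice_unit)
# 2025-01-01 2025-03-30 week
```

Parsing dates and restricting `--slice_unit` at the CLI boundary rejects malformed input before any Elasticsearch query or model fitting starts.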
