** Previously known as dbt-llm-agent **
Ragstar is an AI agent that learns the context of your dbt project to answer questions, generate documentation, and bring data insights closer to your users. Interact via Slack or CLI, and watch it improve over time with feedback.
BETA NOTICE: Ragstar is currently in beta. Core features include agentic model interpretation and semantic question answering about your dbt project.
- Democratize Data Access: Allow anyone to ask questions about your dbt project in natural language via Slack or CLI.
- Automate Documentation: Generate model and column descriptions where they're missing, improving data catalog quality.
- Enhance Data Discovery: Quickly find relevant models and understand their logic without digging through code.
- Continuous Learning: Ragstar learns from feedback to provide increasingly accurate and helpful answers.
- Natural Language Q&A: Ask about models, sources, metrics, lineage, etc.
- Agentic Interpretation: Intelligently analyzes dbt models, understanding logic and context.
- Automated Documentation Generation: Fills documentation gaps using LLMs.
- Semantic Search: Finds relevant assets based on meaning, not just keywords.
- dbt Integration: Parses metadata from dbt Cloud, local runs (
manifest.json
), or source code. - Postgres + pgvector Backend: Stores metadata and embeddings efficiently.
- Feedback Loop: Tracks questions and feedback for improvement.
- Slack Integration: Built-in Slackbot for easy interaction.
- Accelerate Data Discovery: Quickly find relevant dbt models and understand their purpose without digging through code.
- Improve Onboarding: Help new team members understand the dbt project structure and logic faster.
- Maintain Data Documentation: Keep dbt documentation up-to-date with automated generation and suggestions.
- Enhance Data Governance: Gain better visibility into data lineage and model dependencies.
- Debug dbt Models: Ask clarifying questions about model logic and calculations.
Ragstar combines several technologies to provide its capabilities:
- dbt Project Parsing: Extracts comprehensive metadata from dbt artifacts (
manifest.json
) or source files (.sql
,.yml
), including models, sources, exposures, metrics, tests, columns, descriptions, and lineage. - PostgreSQL Database with pgvector: Serves as the central knowledge store. It holds structured metadata parsed from the dbt project, generated documentation, question/answer history, and vector embeddings of model and column descriptions for semantic search.
- Vector Embeddings: Creates numerical representations (embeddings) of model and column documentation using sentence-transformer models. These embeddings capture semantic meaning, enabling powerful search capabilities.
- Large Language Models (LLMs): Integrates with LLMs (e.g., OpenAI's GPT models) via APIs to:
- Understand natural language questions.
- Generate human-readable answers based on retrieved context from the database and embeddings.
- Interpret model logic and generate documentation.
- Agentic Reasoning: Employs a step-by-step reasoning process, especially for model interpretation, where it breaks down the task, gathers evidence (e.g., upstream model definitions), and synthesizes an interpretation, similar to how a human analyst would approach it.
- CLI Interface: Provides command-line tools (
ragstar ...
) for initialization, embedding generation, asking questions, providing feedback, and managing the system.
Setting up Ragstar involves configuring environment variables and initializing the application with your dbt project data. The recommended method is using Docker Compose, which bundles the application and a PostgreSQL database with the required pgvector
extension.
-
Clone the repository:
git clone https://github.com/pragunbhutani/ragstar.git cd ragstar
-
Set up environment variables: Rename
.env.example
to.env
and populate it with your specific configurations, such as your OpenAI API key and theAPP_HOST
(e.g.,localhost
or your server's IP address).cp .env.example .env # Open .env and fill in your values
-
Configure Ragstar rules: Rename
.ragstarrules.example.yml
to.ragstarrules.yml
. This file allows you to define custom instructions and behaviors for your RAG application.cp .ragstarrules.example.yml .ragstarrules.yml # Open .ragstarrules.yml and customize if needed
-
Build and run with Docker Compose: This command will build the Docker images and start the application in detached mode.
docker compose up --build -d
-
Run initial Django commands: Execute these commands in the
app
container to set up the database and create an admin user.docker compose exec app uv run python manage.py migrate docker compose exec app uv run python manage.py createsuperuser # Follow prompts to create your admin user
-
Initialize your project: This command sets up the necessary project configurations. You can choose between
cloud
,core
, orlocal
methods.docker compose exec app uv run python manage.py init_project --method cloud # Or --method core, or --method local
-
Access the Django Admin: Open your web browser and navigate to
http://<your_APP_HOST_value>/admin
(e.g.,http://localhost/admin
ifAPP_HOST=localhost
). Log in with the superuser credentials you created. -
Embed Models: In the Django admin interface, you can:
- Navigate to "Models".
- Click on "Interpret".
- Select and embed the models you want to use for answering questions.
If you prefer not to use Docker, you can set up a local Python environment.
-
Prerequisites:
- Python 3.10 or higher.
uv
(Python package installer and virtual environment manager). You can install it from https://github.com/astral-sh/uv.- A running PostgreSQL server (version 11+) with the
pgvector
extension enabled.
-
Check Python Version:
python --version # or python3 --version
-
Clone Repository:
git clone https://github.com/pragunbhutani/ragstar.git cd ragstar
-
Create a virtual environment and install dependencies:
# Create a virtual environment (e.g., named .venv) python -m venv .venv # Or using uv: uv venv # Activate the virtual environment source .venv/bin/activate # On Windows: .venv\Scripts\activate # Install dependencies using uv uv pip install -r requirements.txt # Or if you have a pyproject.toml configured for uv: # uv pip install -e .
-
Set up PostgreSQL:
- Install PostgreSQL and
pgvector
. - Create a database (e.g.,
ragstar_local_dev
). - Enable the
pgvector
extension in the database:-- Run in psql connected to your database CREATE EXTENSION IF NOT EXISTS vector;
- Install PostgreSQL and
-
Configure Environment Variables: Copy the example environment file and fill in your details:
cp .env.example .env
Edit
.env
:- Required: Set your
OPENAI_API_KEY
. - Required: Set
DATABASE_URL
to your local PostgreSQL connection string (e.g.,postgresql://user:password@host:port/dbname
). Ensure this matches the database you created. (The.env.example
might have fallback variables likeDB_NAME_FALLBACK
, etc., which are used ifDATABASE_URL
is not set; for local setup, explicitly settingDATABASE_URL
is clearer). - Required: Set
APP_HOST
(e.g.,localhost
or127.0.0.1
). - Ragstar Rules: Rename
.ragstarrules.example.yml
to.ragstarrules.yml
and customize if needed. - Slack (Optional): Configure
INTEGRATIONS_SLACK_BOT_TOKEN
andINTEGRATIONS_SLACK_SIGNING_SECRET
if you plan to use the Slack integration. - Other: Review other variables like
RAGSTAR_LOG_LEVEL
, etc., and adjust if needed.
- Required: Set your
-
Run Database Migrations: Apply database schema changes:
uv run python manage.py migrate
-
Create a Superuser: Create an admin account to access the Django admin interface:
uv run python manage.py createsuperuser # Follow the prompts
-
Initialize your project: This command sets up the necessary project configurations.
uv run python manage.py init_project --method cloud # Or --method core, or --method local, depending on your dbt project setup.
-
Run the Development Server:
uv run python manage.py runserver
The application will typically be available at
http://<your_APP_HOST_value>:8000
(e.g.,http://localhost:8000
). -
Access the Django Admin & Embed Models: Follow the same steps as in the Docker setup (steps 7 and 8) to access the admin interface (
http://<your_APP_HOST_value>:8000/admin
) and embed your models.
After setup and initialization, you can interact with Ragstar.
Most Django manage.py
commands should be run inside the app
container using docker compose exec
:
# Example: Run database migrations (if not already done by entrypoint)
docker compose exec app uv run python manage.py migrate
# Example: Create a superuser (if not done during initial setup)
docker compose exec app uv run python manage.py createsuperuser
# Example: Initialize project
docker compose exec app uv run python manage.py init_project --method cloud
The application server is started automatically by docker compose up
. Access it via http://<your_APP_HOST_value>/admin
.
Run Django manage.py
commands directly using uv run
from your activated virtual environment:
# Example: Run the development server
uv run python manage.py runserver
# Example: Create a superuser
uv run python manage.py createsuperuser
# Example: Initialize project
uv run python manage.py init_project --method cloud
Access the application at http://<your_APP_HOST_value>:8000
and the admin interface at http://<your_APP_HOST_value>:8000/admin
.
The primary way to manage and interact with the application (outside of the web interface) is through Django's manage.py
script. Here are some key commands:
uv run python manage.py migrate
: Applies database migrations.uv run python manage.py createsuperuser
: Creates an administrator account.uv run python manage.py init_project --method <cloud|core|local>
: Initializes Ragstar with your project data (e.g., from dbt). This is crucial for setting up the knowledge base.uv run python manage.py runserver [host:port]
: Starts the Django development web server.
Other functionalities, such as interpreting and embedding models, are primarily handled through the Django admin interface after logging in.
Ragstar provides a Slack manifest to easily integrate its functionalities into your Slack workspace.
- Ensure
INTEGRATIONS_SLACK_SIGNING_SECRET
andINTEGRATIONS_SLACK_BOT_TOKEN
are set in your.env
file. - Use the
.slack_manifest.example.json
file as a template to create a new Slack app. - Follow Slack's documentation for creating an app from a manifest.
- This will enable features like asking questions and receiving answers directly within Slack (assuming the Slack integration is running as part of the Django application).
Contributions are welcome! Please follow standard fork-and-pull-request workflow.