Join Decoding ML for proven content on designing, coding, and deploying production-grade AI systems with software engineering and MLOps best practices to help you ship AI applications. Every week, straight to your inbox.
This guide will help you set up and run the Second Brain Online Module, which contains the code for Module 6: the RAG agentic app and LLMOps layer.
Note that this module is entirely independent of the offline ML pipelines, so it comes with its own set of dependencies and requirements. However, before running it, you must first go through the steps from the Offline ML Pipelines part of the course, which populate the vector database and other required resources.
- 📋 Prerequisites
- 🎯 Getting Started
- 📁 Project Structure
- 🏗️ Set Up Your Local Infrastructure
- ⚡️ Running the Code for Each Module
- 🔧 Utility Commands
We depend on the same prerequisites as the offline ML pipelines. If modules 1 to 5 are working, you are good to go. Just make sure to fill in the `.env` file with the correct credentials.
For all the modules, you'll need the following tools installed locally:
| Tool | Version | Purpose | Installation Link |
|---|---|---|---|
| Python | 3.11 | Programming language runtime | Download |
| uv | ≥ 0.4.30 | Python package installer and virtual environment manager | Download |
| GNU Make | ≥ 3.81 | Build automation tool | Download |
| Git | ≥ 2.44.0 | Version control | Download |
| Docker | ≥ 27.4.0 | Containerization platform | Download |
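If you want to double-check that your local tools meet the version requirements above, a quick sanity check (assuming the tools are already on your `PATH`) looks like this:

```bash
python --version   # expect 3.11.x
uv --version       # expect >= 0.4.30
make --version     # expect GNU Make >= 3.81
git --version      # expect >= 2.44.0
docker --version   # expect >= 27.4.0
```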
📌 Windows users also need to install WSL
We will be using Unix commands across the course, so if you are using Windows, you will need to install WSL, which will install a Linux kernel on your Windows machine and allow you to use the Unix commands from our course (this is the recommended way to write software on Windows).
Also, the course requires access to these cloud services. The authentication to these services is done by adding the corresponding environment variables to the `.env` file:
| Service | Purpose | Cost | Environment Variable | Setup Guide | Starting with Module |
|---|---|---|---|---|---|
| OpenAI API | LLM API | Pay-per-use | `OPENAI_API_KEY` | Quick Start Guide | Module 2 |
| Hugging Face | MLOps | Free tier | `HUGGINGFACE_ACCESS_TOKEN` | Quick Start Guide | Module 3 |
| Comet | Experiment tracking | Free tier | `COMET_API_KEY` | Quick Start Guide | Module 4 |
| Opik | LLM evaluation and prompt monitoring | Free tier (hosted on Comet - same API key) | `COMET_API_KEY` | Quick Start Guide | Module 6 |
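For reference, the cloud-service section of a filled-in `.env` might look like the sketch below. The variable names come from the table above; the values are dummy placeholders you must replace with your own credentials:

```bash
# .env (dummy values - replace with your own credentials)
OPENAI_API_KEY=sk-...your-openai-key...
HUGGINGFACE_ACCESS_TOKEN=hf_...your-huggingface-token...
COMET_API_KEY=...your-comet-api-key...   # also used by Opik (hosted on Comet)
```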
When working locally, the infrastructure is set up using Docker. Thus, you can use the default values found in the config.py file for all the infrastructure-related environment variables.
However, if you want to deploy the code, you'll need to set up the following services with their corresponding environment variables:
| Service | Purpose | Cost | Required Credentials | Setup Guide |
|---|---|---|---|---|
| MongoDB | Document database (with vector search) | Free tier | `MONGODB_URI` | 1. Create a free MongoDB Atlas account 2. Create a Cluster 3. Add a Database User 4. Configure a Network Connection |
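For reference, MongoDB Atlas connection strings usually follow the `mongodb+srv` scheme. The exact value is shown in the Atlas UI after you create the cluster and database user; the snippet below only illustrates the shape with placeholder values:

```bash
# .env (placeholder values - copy the real connection string from the MongoDB Atlas UI)
MONGODB_URI=mongodb+srv://<username>:<password>@<cluster-name>.mongodb.net/?retryWrites=true&w=majority
```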
Start by cloning the repository and navigating to the project directory:
```bash
git clone https://github.com/decodingml/second-brain-ai-assistant-course.git
cd second-brain-ai-assistant-course
```
First, deactivate any active virtual environment and move to the `second-brain-online` directory:

```bash
deactivate
cd apps/second-brain-online
```
To install the dependencies and activate the virtual environment, run the following commands:
```bash
uv venv .venv-online
. ./.venv-online/bin/activate
uv pip install -e .
```
Note
The online application uses a different set of dependencies than the offline ML pipelines.
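To confirm that the environment is active and the package was installed, a quick sanity check is sketched below; it assumes the import name follows the `src/` layout shown in the Project Structure section (`second_brain_online`):

```bash
# The active interpreter should point inside .venv-online
which python

# The package should be importable after the editable install (`uv pip install -e .`)
python -c "import second_brain_online; print('second_brain_online is installed')"
```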
Before running any command, you have to set up your environment:
- Create your environment file:

  ```bash
  cp .env.example .env
  ```

- Open `.env` and configure the required credentials following the inline comments and the recommendations from the Cloud Services section.
At Decoding ML we teach how to build production ML systems. Thus, instead of splitting the code into separate modules, the course follows the structure of a real-world Python project:
```text
.
├── configs/                   # ZenML configuration files
├── src/second_brain_online/   # Main package directory
│   ├── application/           # Application layer
│   ├── config.py              # Configuration settings
│   └── opik_utils.py          # Opik utility functions
├── tools/                     # Entrypoint scripts that use the Python package
├── .env.example               # Environment variables template
├── .python-version            # Python version specification
├── Makefile                   # Project commands
└── pyproject.toml             # Project dependencies
```
We use Docker to set up the local infrastructure (MongoDB).
Warning
Before running the command below, ensure you do not have any processes running on port `27017` (MongoDB).
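One way to check whether the port is already taken (assuming `lsof` is available, e.g., on Linux/macOS or WSL) is:

```bash
# Lists any process listening on MongoDB's default port; no output means the port is free
lsof -i :27017
```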
To start the Docker infrastructure, run:
```bash
make local-infrastructure-up
```
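To verify that the MongoDB container actually came up, you can list the running containers that publish port `27017` (the container name may vary depending on the Docker setup):

```bash
# Show running containers that publish MongoDB's default port
docker ps --filter "publish=27017"
```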
To stop the Docker infrastructure, run:
```bash
make local-infrastructure-down
```
Note
To visualize the raw and RAG data from MongoDB, we recommend using MongoDB Compass or Mongo's official IDE plugin (e.g., MongoDB for VS Code). To connect to the running MongoDB instance, use the `MONGODB_URI` value from the `.env` file or the one found inside the config.py file.
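If you prefer the terminal over Compass, a minimal sketch for connecting with the `mongosh` shell is shown below; it assumes `mongosh` is installed and that the `MONGODB_URI` entry in `.env` is a plain `KEY=value` line without quotes or spaces:

```bash
# Load the connection string from .env and connect with mongosh
export $(grep '^MONGODB_URI=' .env)
mongosh "$MONGODB_URI"
```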
NOTE: To run these modules, you must first go through the steps from the Offline ML Pipelines part of the course.
To simulate the course modules, we split the CLI commands and offline ML pipelines you must run per module so you know exactly where you are in the course.
Quickly test the agent from the CLI with a predefined query:
```bash
make run_agent_query RETRIEVER_CONFIG=configs/compute_rag_vector_index_openai_parent.yaml
```
You should see something like this:
```text
Vector databases and vector indices are related concepts in the field of data storage and retrieval, particularly in contexts where high-dimensional vector representations of data are used, such as in machine learning and AI. Here are the key differences:

1. **Vector Databases**:
   - A vector database ...
```
Important
Make sure the retriever config is exactly the same one used in Module 5 during the RAG feature pipeline to populate the vector database. If they don't match, the retriever will run with different settings than the ones used to build the vector index, resulting in errors or unexpected results. Here is a quick reminder of when to use which config:
- Parent Retrieval with OpenAI models: `configs/compute_rag_vector_index_openai_parent.yaml`
- Simple Contextual Retrieval with OpenAI models: `configs/compute_rag_vector_index_openai_contextual_simple.yaml`
- Simple Contextual Retrieval with Hugging Face models: `configs/compute_rag_vector_index_huggingface_contextual_simple.yaml`
- Full-fledged Contextual Retrieval with OpenAI models: `configs/compute_rag_vector_index_openai_contextual.yaml`
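For example, if Module 5 populated the vector database with the simple contextual retrieval pipeline using OpenAI models, the matching query command would be:

```bash
make run_agent_query RETRIEVER_CONFIG=configs/compute_rag_vector_index_openai_contextual_simple.yaml
```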
You can also spin up a Gradio UI to test the agent with custom queries, similar to any other chatbot:
```bash
make run_agent_app RETRIEVER_CONFIG=configs/compute_rag_vector_index_openai_parent.yaml
```
You should see something like this:
(Demo video: `second_brain_ai_assistant_example.mp4`)
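The exact local address is printed in the terminal when the app starts; assuming the app uses Gradio's default settings, it is typically served at `http://localhost:7860`:

```bash
# Open the Gradio UI in your browser (assumes Gradio's default port;
# the actual URL is printed in the terminal on startup)
open http://localhost:7860   # macOS; use xdg-open on Linux
```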
Evaluate the agent with our predefined evaluation queries (found under `tools/evaluate_app.py`):

```bash
make evaluate_agent RETRIEVER_CONFIG=configs/compute_rag_vector_index_openai_parent.yaml
```
After running the evaluation, open Opik to see the evaluation results.
For running the evaluation, plus playing around with the agent (~20 queries), the costs and running time are:
- OpenAI running costs: ~$0.50
- Hugging Face dedicated endpoints running costs (optional - you can use only the OpenAI models for summarization): ~$1 (the deployment costs $1/hour)
- Evaluation running time: ~15 minutes (the AI Assistant runs in real time)
```bash
# Check code formatting without applying changes
make format-check

# Auto-fix formatting issues
make format-fix

# Check for linting issues without applying changes
make lint-check

# Auto-fix linting issues
make lint-fix
```
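A common pattern is to auto-fix first and then re-run the checks before committing, for example:

```bash
# Auto-fix, then verify everything passes
make format-fix && make lint-fix && make format-check && make lint-check
```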