AI-Powered Amazon Product Assistant (B2C only)

An end-to-end AI engineering project that builds an intelligent product recommendation and analysis system on the Amazon Electronics dataset, with a complete RAG implementation. This capstone project demonstrates modern AI engineering practices, including data processing, visualization, vector databases, and retrieval-augmented generation (RAG).

Course: End-to-End AI Engineering Bootcamp (Maven)

Features

  • Data Processing Pipeline: Automated processing of large-scale Amazon product and review data
  • Interactive Visualizations: Comprehensive analysis dashboards with temporal trends, category insights, and rating patterns
  • Complete RAG System: Vector database with ChromaDB, intelligent query processing, and context-aware retrieval
  • Advanced Streamlit UI: Professional tab-based interface with smart query suggestions, real-time monitoring, and enhanced response visualization
  • Multi-Provider Support: Compatible with OpenAI, Groq, and Google Gemini models
  • Vector Database: ChromaDB-powered semantic search with GTE-large embeddings, metadata filtering, and hybrid queries
  • Query Intelligence: Automatic query type detection for product reviews, comparisons, complaints, and recommendations
  • RAG Evaluation Framework: Industry-standard RAGAS evaluation with enhanced Weave integration for complete metric visibility
  • Enhanced Weave-RAGAS Integration: All evaluation metrics (faithfulness, relevancy, precision, recall) visible in Weave UI with drill-down capabilities
  • Synthetic Test Data: Advanced synthetic data generation with template-based queries, variation techniques, and quality analysis
  • Production Testing: Automated test case generation with configurable difficulty distributions and Weave traceability
  • Optimized Weave Tracing: Production-ready AI pipeline monitoring with efficient session-based initialization, zero-redundancy design, and comprehensive analytics
  • LiteLLM Integration: Unified access to 100+ LLM providers including Ollama for local models
  • Vector Database Management: Scripts for reinitializing and managing ChromaDB with custom JSONL data
  • LangGraph Agent: ReAct pattern conversational agent with reasoning traces, tool use, and persistent state
  • Session Management: PostgreSQL-based conversation persistence for multi-turn interactions
  • Agent Mode Toggle: Seamless switching between direct RAG and agent-mediated queries

Out of Scope (B2B Features)

  • Contractual pricing
  • Account-specific catalogs
  • Procurement compliance
  • Multi-user workflows (approvers, requisitioners, etc.)
  • Bulk ordering, BOM-style inputs, and quote-based negotiation
  • ERP integration and punchout catalogs (OCI, cXML)
  • B2B product taxonomies (e.g., ETIM, UNSPSC)

Dataset Overview

Source: Amazon Reviews 2023 - Electronics Category

  • Products: 1,000 carefully selected electronics products
  • Reviews: 20,000 customer reviews (10-20 reviews per product)
  • Date Range: 2003-2023 (20 years of review data)
  • Categories: Comprehensive electronics categories with hierarchical structure

Key Statistics

  • Average reviews per product: 20
  • Review rating distribution: 4.2/5.0 average
  • Most active day: Tuesday (3,068 reviews)
  • Most active month: January (2,283 reviews)
  • Recent activity: 37.8% of reviews from 2020 onwards
  • Embedding Model: GTE-large (1024 dimensions) for superior semantic search

Setup & Installation

Prerequisites

  • Python 3.12+
  • uv package manager
  • Docker (optional, for containerized deployment)
  • Ollama (optional, for local LLM models)

Quick Start

  1. Clone the repository

    git clone <repository-url>
    cd AI-Powered-Amazon-Product-Assistant
  2. Install dependencies

    uv sync
  3. Configure environment variables

    # Create .env file with your API keys
    cp .env.example .env  # if available, or create manually
    
    # Required for chatbot functionality
    echo "OPENAI_API_KEY=your_openai_key" >> .env
    echo "GROQ_API_KEY=your_groq_key" >> .env  
    echo "GOOGLE_API_KEY=your_google_key" >> .env
    
    # Optional for Weave tracing
    echo "WANDB_API_KEY=your_wandb_key" >> .env
    
    # Optional for Ollama (local LLMs)
    echo "OLLAMA_BASE_URL=http://localhost:11434" >> .env
  4. Set up Jupyter kernel

    uv run python -m ipykernel install --user --name ai-product-assistant
  5. Run data processing (if needed)

    uv run jupyter notebook notebooks/data_preprocessing.ipynb
  6. Launch applications

    # Visualization dashboard
    uv run jupyter notebook notebooks/data_visualization.ipynb
    
    # Enhanced Streamlit chatbot interface with tab-based UI and RAG
    uv run streamlit run src/chatbot-ui/streamlit_app.py
    # OR use Make
    make run-streamlit
    
    # Run FastAPI server with agent endpoint (required for agent mode)
    make run-api
    
    # Optional: Run with PostgreSQL for conversation persistence
    docker-compose -f docker-compose.postgres.yml up -d
    make run-api  # Will automatically detect and use PostgreSQL
    
    # Run Weave-native evaluation (RECOMMENDED - follows official best practices)
    uv run python scripts/eval/run_weave_native_evaluation.py --dataset-path "data/evaluation/rag_evaluation_dataset.json" --openai-api-key YOUR_KEY
    
    # Run model comparison with native evaluation
    uv run python scripts/eval/run_weave_native_evaluation.py --mode comparison --dataset-path "data/evaluation/rag_evaluation_dataset.json"
    
    # Run enhanced Weave-RAGAS evaluation (ensures all metrics visible in Weave UI)
    uv run python scripts/eval/run_enhanced_evaluation.py --single-query "What are iPhone charger features?" --wandb-api-key YOUR_KEY
    
    # Run full evaluation with complete metric tracking
    uv run python scripts/eval/run_enhanced_evaluation.py --dataset-path "data/evaluation/rag_evaluation_dataset.json" --wandb-api-key YOUR_KEY
    
    # Alternative: Standard RAGAS evaluation
    uv run python scripts/eval/run_ragas_evaluation.py --single-query "What are iPhone charger features?" --ground-truth "iPhone chargers typically feature Lightning connector, fast charging support, USB-C power adapter compatibility, and MFi certification"
    
    # Generate ragas test dataset (Note: If you get entity extraction errors, see CLAUDE.md)
    uv run python scripts/eval/generate_ragas_dataset.py --test-size 50
    # Alternative: Generate simple synthetic dataset
    uv run python scripts/eval/generate_simple_ragas_dataset.py --synthetic-only

Docker Deployment (with ChromaDB)

Note: ChromaDB is an API service and doesn't have a web interface. To interact with your data, use the Streamlit app at http://localhost:8501.

# Build the containers
make build-docker-streamlit

# Run both Streamlit app and ChromaDB service
make run-docker-streamlit

# View logs
make logs-docker-streamlit

# Stop services
make stop-docker-streamlit

# Restart services
make restart-docker-streamlit

Docker Services:

  • Streamlit App: http://localhost:8501 (Enhanced tab-based interface)
  • ChromaDB API: http://localhost:8000 (API service - no web UI)
    • Health check: curl http://localhost:8000/api/v2/heartbeat
    • Collections: curl http://localhost:8000/api/v2/collections
  • Persistent Storage: Vector data persisted in Docker volume

Enhanced Streamlit Interface

The application features a professional tab-based interface designed for optimal user experience:

🔧 Configuration Tab:

  • System Status: Real-time monitoring of Weave tracing and RAG system initialization
  • Model Selection: Choose from OpenAI (GPT-4o, GPT-4o-mini), Groq (Llama-3.3-70b), or Google (Gemini-2.0-flash)
  • Parameter Controls: Fine-tune temperature, max tokens, top-p, and top-k with provider-specific support
  • RAG Configuration: Enable/disable RAG with customizable product and review limits

💬 Query Tab:

  • Smart Examples: 12+ categorized example queries across 6 use cases (Product Info, Reviews, Comparisons, Complaints, Recommendations, Use Cases)
  • Query History: Access and reuse your last 10 queries with one click
  • Auto-Suggestions: Get intelligent query completions based on partial input (3+ characters)
  • Quick Filters: Filter by query type, product category, and price range
  • Enhanced Input: Dynamic placeholders and integrated filter display

📊 Monitoring Tab:

  • Session Statistics: Track message counts, query history, and usage patterns
  • Real-Time Performance: View RAG vs LLM processing times with percentage breakdown
  • RAG Analytics: Monitor retrieved products/reviews and query type detection
  • System Health: Check API configurations and system component status
  • Weave Integration: Direct links to W&B dashboard for detailed trace analysis

Enhanced Weave Tracing Setup

The application includes comprehensive Weave tracing for end-to-end AI pipeline monitoring and performance analysis.

  1. Get W&B API Key: sign up at https://wandb.ai and copy your key from https://wandb.ai/authorize

  2. Configure Tracing

    # Add to your .env file
    echo "WANDB_API_KEY=your_wandb_api_key" >> .env
  3. Enhanced Features Tracked

    • Optimized Initialization: Single-session setup with session state management
    • RAG Pipeline Tracing: Query analysis, context building, and retrieval metrics
    • LLM Provider Tracking: Detailed request/response metadata for OpenAI, Groq, and Google
    • Performance Analytics: Sub-operation timing, character counts, and success rates
    • Error Classification: Structured error handling with types and fallback strategies
    • Real-Time UI Feedback: Processing times and operation status in sidebar
    • Context Quality Metrics: Query type detection, extracted terms, and retrieval effectiveness
    • Trace Optimization: Eliminated redundant calls and duplicate initialization
  4. Optimized Operation Monitoring

    • Session-Based Initialization: Single setup per session via @st.cache_resource (sketched after this list)
    • Consolidated Tracing: Primary trace points at key pipeline stages
    • RAG Enhancement Metrics: Query processing timing and context quality
    • LLM Provider Analytics: Request/response data with performance breakdown
    • End-to-End Pipeline: Complete timing analysis from query to response
    • Zero-Redundancy Design: Eliminated multiple trace calls for same operations
  5. Production-Ready Monitoring

    • Optimized Trace Volume: Meaningful traces without duplication
    • Session State Management: Prevents repeated initialization calls
    • Clean Dashboard Data: Visit your W&B dashboard for organized traces
    • Performance Insights: Navigate to "Bootcamp" project for analytics
    • Error Tracking: Structured error handling with fallback strategies
    • Real-Time Feedback: Processing times displayed in Streamlit sidebar
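
The session-based initialization described above can be sketched in a few lines. This is illustrative only, assuming nothing beyond the public weave.init() and st.cache_resource APIs; the project name and function name are placeholders, and the app's real wiring lives in streamlit_app.py.

# Minimal sketch of session-scoped Weave initialization.
import streamlit as st
import weave

@st.cache_resource  # runs once per process, so weave.init is not called repeatedly
def init_weave(project_name: str = "Bootcamp"):
    return weave.init(project_name)

weave_client = init_weave()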

Vector Database Management

The project includes scripts for managing and reinitializing the ChromaDB vector database:

# Check current database status
uv run python scripts/check_vector_db.py

# Reinitialize with your own JSONL data (simple)
uv run python scripts/reinit_vector_db_simple.py your_data.jsonl --clear

# Reinitialize with advanced options
uv run python scripts/reinit_vector_db.py \
  --jsonl-path your_data.jsonl \
  --batch-size 50 \
  --persist-dir custom_db \
  --collection-name my_collection

# Append new data without clearing
uv run python scripts/reinit_vector_db.py \
  --jsonl-path additional_data.jsonl \
  --no-clear-existing

Supported JSONL formats:

  • Standard RAG format: {"id": "...", "text": "...", "type": "product|review", "metadata": {...}}
  • Amazon format: {"asin": "...", "title": "...", "description": "...", "reviewText": "..."}
  • Generic format: {"content": "...", "category": "...", "source": "..."}
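
For reference, a minimal sketch that writes records in the standard RAG format described above (all field values here are illustrative):

import json

records = [
    {"id": "prod-001", "text": "USB-C fast charger, 30W, foldable plug.",
     "type": "product", "metadata": {"category": "chargers"}},
    {"id": "rev-001", "text": "Charges my phone in under an hour.",
     "type": "review", "metadata": {"parent_id": "prod-001", "rating": 5}},
]
with open("your_data.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")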

For detailed documentation, see scripts/README_vector_db.md.

Ollama Local LLM Support

The application supports local LLMs through Ollama via LiteLLM:

# Install Ollama (visit https://ollama.com for instructions)

# Pull and run a model
ollama pull llama3.2
ollama run llama3.2

# The app will automatically detect Ollama at http://localhost:11434
# Select "Ollama" as the provider in the Streamlit configuration tab

Note for Docker users: When running the Streamlit app in Docker, Ollama running on your host machine is accessible via host.docker.internal:11434. This is automatically configured in the docker-compose.yml file.
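
Under the hood, the app routes such requests through LiteLLM. As a standalone illustration (independent of the app's own service layer, which adds tracing and configuration), a direct call to a local Ollama model might look like this:

# Sketch of a direct LiteLLM call to a local Ollama model.
import litellm

response = litellm.completion(
    model="ollama/llama3.2",            # the "ollama/" prefix selects the Ollama provider
    messages=[{"role": "user", "content": "Recommend a budget USB-C charger."}],
    api_base="http://localhost:11434",  # matches OLLAMA_BASE_URL above
)
print(response.choices[0].message.content)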

Project Structure

AI-Powered-Amazon-Product-Assistant/
├── 📁 data/
│   ├── Electronics.jsonl                              # Raw review data (25GB)
│   ├── meta_Electronics.jsonl                         # Raw product metadata (4.9GB)
│   ├── 📁 processed/
│   │   ├── electronics_top1000_products.jsonl         # 1,000 product records
│   │   ├── electronics_top1000_products_reviews.jsonl # 20,000 review records
│   │   ├── electronics_rag_documents.jsonl            # 2,000 RAG-optimized documents
│   │   ├── dataset_summary.json                       # Processing metadata
│   │   └── README.md                                  # Data documentation
│   └── 📁 chroma_db/                                  # Vector database storage (local)
├── 📁 notebooks/
│   ├── data_preprocessing.ipynb                       # High-performance data processing with Polars
│   ├── data_visualization.ipynb                       # Efficient data visualization with Polars
│   ├── verify_api_keys.ipynb                         # API configuration testing
│   └── README.md                                      # Notebook documentation
├── 📁 src/
│   ├── 📁 chatbot-ui/
│   │   ├── 📁 core/
│   │   │   └── config.py                              # Multi-provider configuration
│   │   ├── streamlit_app.py                          # Main chatbot interface with RAG
│   │   └── session_manager.py                        # Session management for agent conversations
│   ├── 📁 core/                                      # Core modules
│   │   ├── __init__.py                               # Core module initialization
│   │   ├── base_classes.py                           # Base abstract classes
│   │   ├── config_improved.py                        # Enhanced configuration (Pydantic V2)
│   │   ├── decorators.py                             # Utility decorators (retry, cache, timing)
│   │   ├── exceptions.py                             # Custom exception hierarchy
│   │   ├── implementations.py                        # Concrete implementations
│   │   ├── llm_providers.py                          # LLM provider management
│   │   ├── llm_service.py                            # LLM service interface
│   │   ├── logging_config.py                         # Logging configuration
│   │   ├── performance.py                            # Performance optimization utilities
│   │   └── structured_outputs.py                     # Pydantic models for structured LLM responses
│   ├── 📁 agents/                                    # LangGraph agent implementation (Sprint 3)
│   │   ├── __init__.py                               # Agent module initialization
│   │   ├── state.py                                  # Agent state TypedDict definitions
│   │   ├── nodes.py                                  # ReAct pattern nodes (reasoning, action, observation)
│   │   ├── graph.py                                  # LangGraph workflow and routing
│   │   ├── react_agent.py                            # Main ReactAgent implementation
│   │   ├── 📁 tools/                                 # Agent tools
│   │   │   ├── __init__.py                           # Tools initialization
│   │   │   └── vector_search_tool.py                 # Vector search tool wrapping RAG
│   │   └── 📁 persistence/                           # State persistence
│   │       ├── __init__.py                           # Persistence initialization
│   │       ├── models.py                             # SQLAlchemy models for state storage
│   │       └── postgres_checkpointer.py              # PostgreSQL checkpointer for conversations
│   ├── 📁 api/                                       # FastAPI implementation (Sprint 2)
│   │   ├── __init__.py                               # API module initialization
│   │   ├── app.py                                    # Main FastAPI application
│   │   ├── dependencies.py                           # Dependency injection
│   │   ├── models.py                                 # Request/response models
│   │   ├── 📁 middleware/                            # API middleware
│   │   │   ├── __init__.py                           # Middleware initialization
│   │   │   ├── rate_limiting.py                      # Rate limiting middleware
│   │   │   ├── cors.py                               # CORS configuration
│   │   │   ├── authentication.py                     # API key authentication
│   │   │   └── error_handling.py                     # Global error handling
│   │   └── 📁 routers/                               # API route handlers
│   │       ├── __init__.py                           # Routers initialization
│   │       ├── health.py                             # Health check endpoints
│   │       └── rag.py                                # RAG and agent endpoints
│   ├── 📁 monitoring/                                # Monitoring and observability
│   │   └── integration.py                            # Monitoring system integration
│   ├── 📁 prompts/                                   # Prompt management (Sprint 2)
│   │   ├── __init__.py                               # Prompts module initialization
│   │   ├── registry.py                               # Prompt template registry
│   │   ├── filters.py                                # Custom Jinja2 filters
│   │   └── templates/                                # Jinja2 templates for all query types
│   ├── 📁 rag/
│   │   ├── vector_db.py                               # ChromaDB vector database (local, GTE-large)
│   │   ├── vector_db_docker.py                       # ChromaDB vector database (Docker, optimized)
│   │   ├── query_processor.py                        # RAG query processing (auto-selects implementation)
│   │   ├── hybrid_retrieval.py                       # BM25 and hybrid search implementation (Sprint 2)
│   │   └── 📁 experimental/                          # Experimental implementations for reference
│   │       ├── vector_db_improved.py                  # Best practices reference implementation
│   │       ├── vector_db_migrated.py                  # Factory pattern implementation
│   │       └── vector_db_optimized.py                # Performance optimization reference
│   ├── 📁 evaluation/
│   │   ├── __init__.py                                # Evaluation module interface
│   │   ├── rag_adapter.py                             # RAG system adapter for ragas framework
│   │   ├── ragas_evaluator.py                         # Main RAG evaluator using ragas
│   │   ├── ragas_reporter.py                          # HTML report generation for ragas results
│   │   ├── weave_ragas_evaluator.py                  # Basic Weave-RAGAS integration
│   │   ├── enhanced_weave_ragas.py                   # Enhanced Weave-RAGAS with full metric visibility
│   │   ├── weave_native_evaluation.py                # Weave-native evaluation (best practices)
│   │   ├── dataset.py                                 # Evaluation dataset creation and management
│   │   └── synthetic_data_generator.py               # Advanced synthetic test data generation
│   └── 📁 tracing/
│       ├── business_intelligence.py                   # Business intelligence tracking
│       └── trace_utils.py                            # Tracing utilities and helpers
├── 📁 tests/                                         # Test suite
│   ├── __init__.py                                   # Test module initialization
│   ├── conftest.py                                   # Pytest configuration and fixtures
│   ├── test_basic.py                                 # Basic test suite functionality
│   ├── test_infrastructure.py                        # Infrastructure test verification
│   ├── 📁 unit/                                      # Unit tests
│   │   ├── test_config.py                            # Configuration tests (Pydantic V2)
│   │   ├── test_decorators.py                        # Decorator functionality tests
│   │   ├── test_error_handling.py                    # Error handling tests
│   │   ├── test_litellm_service.py                   # LiteLLM service tests
│   │   ├── test_query_processor.py                   # Query processor tests
│   │   ├── test_vector_db.py                         # Vector database tests
│   │   └── test_vector_db_migrated.py               # Migrated vector DB tests
│   ├── 📁 integration/                               # Integration tests
│   │   ├── test_chatbot_e2e.py                       # End-to-end chatbot tests
│   │   ├── test_monitoring_integration.py            # Monitoring integration tests
│   │   └── test_rag_pipeline.py                      # RAG pipeline tests
│   └── 📁 fixtures/                                  # Test fixtures and mock data
├── 📁 scripts/                                       # UV scripts and utilities
│   ├── 📁 eval/                                      # Evaluation runner scripts
│   │   ├── run_enhanced_evaluation.py                # Enhanced Weave-RAGAS evaluation (RECOMMENDED)
│   │   ├── run_ragas_evaluation.py                   # Standard RAGAS evaluation
│   │   ├── run_weave_native_evaluation.py            # Weave-native evaluation
│   │   ├── run_weave_ragas_evaluation.py             # Basic Weave-RAGAS (DEPRECATED)
│   │   ├── generate_ragas_dataset.py                 # Generate ragas test datasets
│   │   └── generate_simple_ragas_dataset.py          # Simplified dataset generation
│   ├── validate_config.py                            # Configuration validation script
│   ├── run_streamlit.py                              # UV script to run Streamlit app
│   ├── run_api_server.py                             # UV script to run FastAPI server
│   ├── lint.py                                       # UV script for code linting
│   ├── format.py                                     # UV script for code formatting
│   ├── clean_notebooks.py                            # UV script to clean notebook outputs
│   ├── list_scripts.py                               # List all available UV scripts
│   ├── check_vector_db.py                            # Check vector database status
│   ├── reinit_vector_db.py                           # Reinitialize vector database
│   ├── reinit_vector_db_simple.py                    # Simple vector database reinitialization
│   ├── init-postgres.sql                             # PostgreSQL schema initialization (Sprint 3)
│   └── test_agent_simple.py                          # Simple agent testing script
├── 📁 examples/
│   └── synthetic_data_examples.py                    # Synthetic data usage demonstrations
├── 📁 docs/                                          # Technical documentation
│   ├── 📁 architecture/                              # System design documents
│   │   ├── CHROMA.md                                 # ChromaDB integration guide
│   │   ├── LOCAL_VS_DOCKER.md                        # Local vs Docker implementation comparison
│   │   └── DASHBOARD_METRICS.md                      # Dashboard metrics interpretation
│   ├── 📁 guides/                                    # How-to guides
│   │   ├── WEAVE_TRACING_GUIDE.md                    # LLM tracing & monitoring guide
│   │   ├── EVALUATIONS.md                            # RAG evaluation framework documentation
│   │   ├── SYNTHETIC_DATA.md                         # Synthetic test data generation guide
│   │   ├── GEMINI_MESSAGE_HANDLING.md                # Google Gemini integration guide
│   │   ├── DOCKER_TTY_FIXES.md                       # Container deployment fixes
│   │   ├── MONITORING_GUIDE.md                       # System monitoring setup
│   │   ├── PERFORMANCE_OPTIMIZATIONS.md              # Performance optimization guide
│   │   └── ENHANCED_WEAVE_RAGAS_GUIDE.md             # Enhanced Weave-RAGAS integration guide
│   ├── 📁 sprints/                                   # Sprint documentation
│   │   ├── SPRINT_0.md                               # Sprint 0 foundation summary
│   │   ├── SPRINT_1.md                               # Sprint 1 RAG implementation summary
│   │   ├── SPRINT_2.md                               # Sprint 2 production readiness summary
│   │   ├── SPRINT_3.md                               # Sprint 3 LangGraph agent summary
│   │   └── SPRINT_3_IMPLEMENTATION.md                # Sprint 3 detailed implementation guide
│   ├── 📁 testing/                                   # Testing documentation
│   ├── 📁 development/                               # Development process docs
│   └── 📁 planning/                                  # Vision and planning docs
├── 📄 pyproject.toml                                  # uv dependencies & config
├── 📄 docker-compose.yml                              # Multi-service container setup
├── 📄 docker-compose.postgres.yml                     # Extended Docker config with PostgreSQL (Sprint 3)
├── 📄 Dockerfile                                      # Container deployment
├── 📄 docker-entrypoint.sh                           # Container initialization script
├── 📄 Makefile                                        # Build automation (Docker & shell commands)
├── 📄 PROJECT_CANVAS.md                               # Project roadmap & tasks
├── 📄 CLAUDE.md                                       # AI assistant development log
└── 📄 README.md                                       # Project documentation

Data Processing Pipeline

The project includes a comprehensive data processing pipeline:

  1. Raw Data Ingestion: Processes large JSONL files from Amazon Reviews 2023
  2. Product Selection: Selects the top 1,000 products based on review volume and quality (see the sketch after this list)
  3. Review Sampling: Extracts representative reviews for each product
  4. Data Cleaning: Handles missing values, validates data integrity
  5. RAG Optimization: Formats data for retrieval-augmented generation systems
  6. Vector Database Creation: Automatic ingestion into ChromaDB with embeddings and metadata
  7. Query Processing: Intelligent context retrieval based on query type and intent
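
As a rough illustration of step 2, the selection idea can be sketched with Polars (the notebooks' actual logic also weighs review quality, and the "asin" field name is an assumption about the raw schema):

import polars as pl

# Toy version of step 2: rank products by review volume, keep the top 1,000.
top_1000 = (
    pl.scan_ndjson("data/Electronics.jsonl")
    .group_by("asin")                     # assumed product key in the raw data
    .agg(pl.len().alias("review_count"))
    .sort("review_count", descending=True)
    .limit(1000)
    .collect()
)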

Visualization Capabilities

The visualization notebook provides comprehensive insights:

  • Review Distribution Analysis: Product popularity and rating patterns
  • Price Analysis: Price ranges and correlation with ratings
  • Category Analysis: Hierarchical category exploration
  • Store & Brand Analysis: Top performers and market distribution
  • Temporal Analysis: Review trends over time (2003-2023)
  • Text Analysis: Review length and content characteristics

Technical Stack

  • Data Processing: pandas, numpy, json, Polars (high-performance alternative)
  • Visualization: matplotlib, seaborn, plotly
  • Vector Database: Dual-architecture ChromaDB system (local: GTE-large, Docker: optimized)
  • Embedding Models: GTE-large (development) and ChromaDB default (production) with automatic selection (see the sketch after this list)
  • RAG Implementation: Custom query processing with intelligent context retrieval and environment detection
  • Agent Framework: LangGraph for ReAct pattern agent with tool use (Sprint 3)
  • State Persistence: PostgreSQL with SQLAlchemy for conversation management (Sprint 3)
  • API Framework: FastAPI with middleware, routers, and dependency injection (Sprint 2)
  • Structured Outputs: Instructor library with Pydantic models (Sprint 2)
  • Prompt Management: Jinja2 templating system with registry (Sprint 2)
  • Notebook Environment: Jupyter, IPython, Marimo (reactive notebooks)
  • Package Management: uv (modern Python package manager)
  • Web Interface: Professional Streamlit UI with tab-based architecture, smart query suggestions, and real-time monitoring
  • LLM Providers: OpenAI GPT-4o, Groq Llama, Google Gemini 2.0, Ollama (100+ via LiteLLM)
  • Monitoring: Optimized Weave tracing via Weights & Biases with session state management
  • Configuration: Pydantic V2 settings with environment variables
  • Testing: Pytest with 108+ tests, 91% coverage
  • Containerization: Docker with non-root security, Docker Compose for multi-service deployment
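
For context on the embedding choice, here is a hedged sketch of producing GTE-large vectors with sentence-transformers, assuming the thenlper/gte-large checkpoint; the app itself obtains embeddings through ChromaDB's embedding functions rather than calling the model directly:

# Illustrative only: encode a query with GTE-large (1024-dimensional vectors).
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("thenlper/gte-large")
vector = encoder.encode("What do people say about iPhone charger cables?")
print(vector.shape)  # (1024,)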

Usage Examples

Using Agent Mode

  1. Start Required Services:

    # Start API server (required for agent mode)
    make run-api
    
    # In another terminal, start Streamlit
    make run-streamlit
  2. Enable Agent Mode:

    • Go to the Configuration tab
    • Enable "Enable RAG (Product Search)"
    • Toggle "🤖 Enable Agent Mode (ReAct)"
  3. Ask Questions:

    • The agent will process queries with reasoning steps
    • View reasoning trace in expandable "🤔 Agent Reasoning Steps" section
    • Session info displayed in sidebar
  4. Example Queries:

    • "What are the main complaints about laptop backpacks?"
    • "Compare iPhone and Android chargers"
    • "Find budget tablets under $200 with good reviews"

Data Processing

# Load processed data
import pandas as pd
import json

# Load products
products = []
with open('data/processed/electronics_top1000_products.jsonl', 'r') as f:
    for line in f:
        products.append(json.loads(line.strip()))

df_products = pd.DataFrame(products)
print(f"Loaded {len(df_products)} products")

RAG System

# Test RAG system
from src.rag.query_processor import create_rag_processor

# Initialize processor
processor = create_rag_processor()

# Process a query
result = processor.process_query("What do people say about iPhone charger cables?")
print(f"Found {result['metadata']['num_products']} products and {result['metadata']['num_reviews']} reviews")

Enhanced Weave-RAGAS Evaluation

# Run enhanced evaluation with full Weave visibility
from src.evaluation.enhanced_weave_ragas import create_enhanced_evaluator
import asyncio

# Create enhanced evaluator
model, evaluator = create_enhanced_evaluator(
    project_name="my-rag-evaluation",
    openai_api_key="your_key"
)

# Run single evaluation
async def evaluate():
    result = await evaluator.evaluate_example(
        model=model,
        question="What are iPhone charger features?",
        ground_truth="iPhone cables feature Lightning connectors..."
    )
    print(f"Overall Score: {result['overall_score']:.3f}")
    print(f"Metrics: {result['metrics']}")

asyncio.run(evaluate())

Weave-Native Evaluation (Best Practices)

# Use Weave's native evaluation framework
from src.evaluation.weave_native_evaluation import create_rag_model, create_native_evaluator
import asyncio

# Create model and evaluator
model = create_rag_model(model_name="rag-v1", temperature=0.7)
evaluator = create_native_evaluator(project_name="rag-eval")

# Run evaluation
async def native_evaluate():
    # Create dataset
    dataset = evaluator.create_dataset(
        [{"query": "What are iPhone features?", "expected_answer": "..."}],
        name="test_dataset"
    )
    
    # Run evaluation
    results = await evaluator.evaluate_model(
        model=model,
        dataset=dataset,
        evaluation_name="Baseline Test"
    )

asyncio.run(native_evaluate())

Synthetic Test Data Generation

# Generate synthetic evaluation data
from src.evaluation.synthetic_data_generator import create_synthetic_dataset, SyntheticDataConfig

# Custom configuration
config = SyntheticDataConfig(
    num_examples_per_category=5,
    difficulty_distribution={"easy": 0.3, "medium": 0.5, "hard": 0.2},
    variation_techniques=["rephrase", "specificity", "context"]
)

# Generate synthetic examples
synthetic_examples = create_synthetic_dataset(config, num_examples=30)
print(f"Generated {len(synthetic_examples)} synthetic test cases")

# Create mixed dataset (original + synthetic)
from src.evaluation.synthetic_data_generator import create_mixed_dataset
from src.evaluation.dataset import create_evaluation_dataset  # assumed location of the curated set

original_examples = create_evaluation_dataset()
mixed_dataset = create_mixed_dataset(original_examples, synthetic_ratio=0.5)

Visualization

# Generate temporal analysis; df_reviews is the reviews DataFrame, loaded from
# electronics_top1000_products_reviews.jsonl as in the Data Processing example above
from notebooks.data_visualization import temporal_analysis
temporal_analysis(df_reviews)

Troubleshooting

For detailed solutions to common issues, see docs/TROUBLESHOOTING.md.

Quick Fixes

  1. Ragas Entity Extraction Error: Use the simple generator:

    uv run python scripts/eval/generate_simple_ragas_dataset.py --synthetic-only
  2. Docker Ollama Connection: Already configured with host.docker.internal in docker-compose.yml

  3. Import Errors: Run uv sync to ensure all dependencies are installed

  4. Vector DB Hanging: Skip initialization during development:

    SKIP_VECTOR_DB_INGESTION=true uv run streamlit run src/chatbot-ui/streamlit_app.py
  5. Multiple Weave Traces: Fixed with session state management

Recent Improvements

Enhanced Streamlit UI (v0.6.0)

  • New Feature: Professional tab-based interface architecture
  • Smart Query Features: Auto-suggestions, query history, and intelligent filters
  • Real-Time Monitoring: Performance metrics, RAG analytics, and system health dashboard
  • Enhanced Response Display: Context cards, structured information, and query analysis
  • Improved UX: Organized configuration, categorized examples, and responsive design

Weave Tracing Optimization (v0.5.0)

  • Issue Resolved: Eliminated multiple/redundant Weave trace calls
  • Root Cause: Improper interaction between Streamlit caching and Weave decorators
  • Solution: Session state initialization + consolidated trace entry points
  • Result: Clean, meaningful traces with zero redundancy

Enhanced Weave-RAGAS Integration (v0.7.0)

  • TOP PRIORITY Achievement: All RAGAS evaluation metrics now fully visible in Weave UI
  • Enhanced Implementation: Created enhanced_weave_ragas.py with comprehensive metric tracking
  • Three Evaluation Modes: Single query, full dataset, and comparison evaluations
  • Complete Metric Visibility: All 8 RAGAS metrics (faithfulness, relevancy, precision, recall, etc.) tracked individually
  • Performance Monitoring: Latency tracking for retrieval and generation phases
  • Drill-Down Capabilities: Click any example in Weave UI to see full details and scores
  • Comparison Views: Easy A/B testing with automatic leaderboard creation
  • Documentation: Complete guide in docs/guides/ENHANCED_WEAVE_RAGAS_GUIDE.md

LiteLLM Integration Fixes (v0.6.1)

  • Ollama Import Error: Removed direct ollama import, using LiteLLM's built-in support instead
  • RAG Processor Initialization: Fixed query_patterns initialization when using existing vector database
  • LLM Service Interface: Updated streamlit app to use chat() method instead of generate() for proper message handling
  • Weave Tracing: Removed redundant @weave.op() decorator from generate method to prevent argument mismatch errors
  • Result: Seamless LiteLLM integration with support for 100+ providers including Ollama

LangGraph Agent Implementation (v0.9.0 - Sprint 3)

  • ReAct Pattern Agent: Fully functional reasoning-action-observation loop with LangGraph
  • Tool Integration: Vector search wrapped as agent tool maintaining RAG capabilities
  • Conversation Persistence: PostgreSQL-backed state management for multi-turn conversations
  • Session Management: UUID-based session and thread tracking with UI integration
  • Reasoning Transparency: Expandable reasoning traces in Streamlit interface
  • API Enhancement: New /api/v1/agent/query endpoint with full agent capabilities
  • Backward Compatibility: Existing RAG endpoints preserved, agent mode is optional toggle

Documentation

This project includes comprehensive documentation to help you understand and work with the system:

Project roadmap and task tracking (PROJECT_CANVAS.md)

  • Complete project overview and goals
  • Sprint 0 and Sprint 1 deliverables with detailed task breakdowns
  • EDA findings and dataset analysis summary
  • Configuration features and tracing implementation status
  • Success criteria and architecture decisions

Sprint 0 foundation summary (docs/sprints/SPRINT_0.md)

  • Foundational components completed (June 28, 2025)
  • Data processing pipeline, LLM configuration, monitoring setup
  • Project setup, environment configuration, and architecture planning
  • Technical achievements and development infrastructure
  • Foundation established for RAG implementation

Sprint 1 RAG prototype implementation (docs/sprints/SPRINT_1.md)

  • Complete RAG system implementation following course requirements
  • Vector database setup, basic RAG pipeline, instrumentation, and evaluation
  • All 4 instructor-specified tasks completed (Lessons 3-6)
  • Advanced features beyond scope: query intelligence, dual-architecture, synthetic data
  • W&B integration with comprehensive evaluation framework

Sprint 2 production readiness (docs/sprints/SPRINT_2.md)

  • Complete production implementation with FastAPI REST API
  • Hybrid retrieval with BM25 and Reciprocal Rank Fusion (sketched below)
  • Structured outputs using Instructor library with Pydantic models
  • Jinja2 prompt management system with template registry
  • 108+ tests with 91% coverage and 60-96% performance improvements
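
The Reciprocal Rank Fusion step named above merges BM25 and vector rankings. A generic sketch of RRF follows; it is not the project's exact code (see src/rag/hybrid_retrieval.py):

# Generic Reciprocal Rank Fusion: score(d) = sum over rankers of 1 / (k + rank_d).
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:  # each ranking is a list of doc ids, best first
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf([["d1", "d2", "d3"], ["d2", "d3", "d1"]])  # -> ["d2", "d1", "d3"]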

Sprint 3 LangGraph agent (docs/sprints/SPRINT_3.md)

  • Transformed RAG system into intelligent conversational agent using LangGraph
  • Implemented ReAct pattern with reasoning, action, and observation nodes (toy sketch below)
  • Created vector search tool wrapping existing RAG functionality
  • Added PostgreSQL persistence for multi-turn conversation support
  • Session management with UUID-based tracking
  • Agent mode toggle in Streamlit UI with reasoning trace visibility
  • New /api/v1/agent/query endpoint for agent interactions
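
The reasoning-action-observation loop has roughly the following shape in LangGraph. This is a toy sketch using only LangGraph's public StateGraph API; the node bodies are stand-ins, and the project's real graph lives in src/agents/graph.py:

from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    query: str
    thought: str
    observation: str
    answer: str

def reasoning(state: AgentState) -> AgentState:
    # the real node asks the LLM whether (and what) to search
    return {**state, "thought": "search product reviews"}

def action(state: AgentState) -> AgentState:
    # the real node calls the vector search tool
    return {**state, "observation": "3 matching reviews found"}

def respond(state: AgentState) -> AgentState:
    return {**state, "answer": f"Based on: {state['observation']}"}

graph = StateGraph(AgentState)
graph.add_node("reasoning", reasoning)
graph.add_node("action", action)
graph.add_node("respond", respond)
graph.set_entry_point("reasoning")
graph.add_edge("reasoning", "action")
graph.add_edge("action", "respond")
graph.add_edge("respond", END)
agent = graph.compile()

print(agent.invoke({"query": "complaints about laptop backpacks?"}))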

Complete ChromaDB integration guide (docs/architecture/CHROMA.md)

  • GTE-large embedding model implementation details
  • Data loading process and timeline details
  • Search capabilities and metadata schema
  • Performance monitoring and logging
  • Troubleshooting guide and best practices
  • API reference and usage examples

Local development vs Docker production comparison (docs/architecture/LOCAL_VS_DOCKER.md)

  • Dual-architecture approach explanation (vector_db.py vs vector_db_docker.py)
  • Embedding strategy differences (GTE-large vs ChromaDB default)
  • Connection architecture and storage configuration details
  • Performance comparison and resource usage analysis
  • Use case guidelines and migration considerations
  • Troubleshooting and best practices for both environments

Comprehensive LLM tracing and monitoring guide (docs/guides/WEAVE_TRACING_GUIDE.md)

  • Complete Weave integration implementation details
  • Configuration parameter tracking (temperature, max_tokens, top_p, top_k)
  • W&B dashboard setup and trace analysis
  • Provider-specific handling and error resilience
  • Performance monitoring and debugging techniques
  • Troubleshooting guide for common tracing issues

RAG evaluation framework documentation (docs/guides/EVALUATIONS.md)

  • Industry-standard RAGAS evaluation framework with enhanced Weave integration
  • Core RAGAS metrics: Faithfulness, Answer Relevancy, Context Precision, Context Recall
  • Additional metrics: Context Utilization, Answer Correctness, Similarity, Completeness
  • Complete visibility of all metrics in Weave UI with drill-down capabilities
  • Command-line interface for single query, dataset, and comparison evaluations
  • Performance tracking with retrieval and generation latency monitoring

Synthetic test data generation guide (docs/guides/SYNTHETIC_DATA.md)

  • Advanced synthetic data generation with template-based queries and variation techniques
  • Configurable generation parameters: difficulty distribution, query types, and variation methods
  • Quality validation tools: uniqueness analysis, length distribution, and topic coverage
  • Weave integration for full traceability and performance monitoring
  • Mixed dataset creation combining original and synthetic data for robust testing
  • Best practices implementation and troubleshooting guide

Enhanced Weave-RAGAS integration guide (docs/guides/ENHANCED_WEAVE_RAGAS_GUIDE.md)

  • Complete guide for using enhanced evaluation with full metric visibility in Weave UI
  • Three evaluation modes: single query, dataset, and comparison evaluations
  • All 8 RAGAS metrics tracked individually with proper attribution
  • Performance monitoring for retrieval and generation phases
  • Step-by-step instructions and command examples
  • Troubleshooting and best practices for production use

Dashboard metrics interpretation & implementation guide (docs/architecture/DASHBOARD_METRICS.md)

  • Comprehensive documentation of all monitoring dashboard metrics
  • Session statistics including conversation balance logic and message ratio handling
  • Performance monitoring with provider-specific tracking and comparison
  • RAG system metrics including vector performance and context quality
  • Business intelligence integration with user journey analytics
  • Configuration status monitoring and system health indicators
  • Implementation details and troubleshooting guidelines

Google Gemini integration guide (docs/guides/GEMINI_MESSAGE_HANDLING.md)

  • Complete Google GenAI client message formatting requirements
  • Role conversion and content validation for Gemini compatibility
  • Error resolution for INVALID_ARGUMENT and empty message parts
  • Performance monitoring and provider-specific baselines
  • Integration with Enhanced Tracing v2.0 system
  • Troubleshooting guide and best practices

Containerized deployment compatibility guide (docs/guides/DOCKER_TTY_FIXES.md)

  • Docker TTY issues and solutions for production deployment
  • Non-root user configuration and security best practices
  • Streamlit headless configuration for container environments
  • Weave tracing compatibility in containerized setups
  • Complete verification steps and troubleshooting

Weave-native evaluation using official best practices

  • Proper Model and Dataset versioning with Weave
  • Built-in metric aggregation and comparison views
  • Custom scorer implementation patterns
  • Native UI integration for evaluation results
  • Migration guide from custom implementations

AI assistant development log (CLAUDE.md)

  • Detailed record of changes and improvements made by the AI assistant
  • Implementation decisions and technical explanations
  • Feature development timeline and reasoning
  • Code modifications and their rationale

These documents provide in-depth technical guidance beyond the quick start instructions in this README, covering advanced topics like monitoring, containerization, and project management.

Data Sources & Citations

This project uses data from the Amazon Reviews 2023 dataset:

@article{hou2024bridging,
  title={Bridging Language and Items for Retrieval and Recommendation},
  author={Hou, Yupeng and Li, Jiacheng and He, Zhankui and Yan, An and Chen, Xiusi and McAuley, Julian},
  journal={arXiv preprint arXiv:2403.03952},
  year={2024}
}

Contributing

This is a capstone project for educational purposes. Feel free to explore, learn, and adapt the code for your own projects.

License

This project is licensed under the terms specified in the LICENSE file.