An end-to-end AI engineering project that builds an intelligent product recommendation and analysis system on the Amazon Electronics dataset. This capstone project demonstrates modern AI engineering practices, including data processing, visualization, vector databases, and retrieval-augmented generation (RAG).
Course: End-to-End AI Engineering Bootcamp (Maven)
- Data Processing Pipeline: Automated processing of large-scale Amazon product and review data
- Interactive Visualizations: Comprehensive analysis dashboards with temporal trends, category insights, and rating patterns
- Complete RAG System: Vector database with ChromaDB, intelligent query processing, and context-aware retrieval
- Advanced Streamlit UI: Professional tab-based interface with smart query suggestions, real-time monitoring, and enhanced response visualization
- Multi-Provider Support: Compatible with OpenAI, Groq, and Google Gemini models
- Vector Database: ChromaDB-powered semantic search with GTE-large embeddings, metadata filtering and hybrid queries
- Query Intelligence: Automatic query type detection for product reviews, comparisons, complaints, and recommendations
- RAG Evaluation Framework: Industry-standard RAGAS evaluation with enhanced Weave integration for complete metric visibility
- Enhanced Weave-RAGAS Integration: All evaluation metrics (faithfulness, relevancy, precision, recall) visible in Weave UI with drill-down capabilities
- Synthetic Test Data: Advanced synthetic data generation with template-based queries, variation techniques, and quality analysis
- Production Testing: Automated test case generation with configurable difficulty distributions and Weave traceability
- Optimized Weave Tracing: Production-ready AI pipeline monitoring with efficient session-based initialization, zero-redundancy design, and comprehensive analytics
- LiteLLM Integration: Unified access to 100+ LLM providers including Ollama for local models
- Vector Database Management: Scripts for reinitializing and managing ChromaDB with custom JSONL data
- LangGraph Agent: ReAct pattern conversational agent with reasoning traces, tool use, and persistent state
- Session Management: PostgreSQL-based conversation persistence for multi-turn interactions
- Agent Mode Toggle: Seamless switching between direct RAG and agent-mediated queries
- Contractual pricing
- Account-specific catalogs
- Procurement compliance
- Multi-user workflows (approvers, requisitioners, etc.)
- Bulk ordering, BOM-style inputs, or quote-based negotiation are not captured
- ERP integration, punchout catalogs (OCI, cXML)
- Product taxonomies (e.g., ETIM, UNSPSC)
Source: Amazon Reviews 2023 - Electronics Category
- Products: 1,000 carefully selected electronics products
- Reviews: 20,000 customer reviews (10-20 reviews per product)
- Date Range: 2003-2023 (20 years of review data)
- Categories: Comprehensive electronics categories with hierarchical structure
- Average reviews per product: 20
- Review rating distribution: 4.2/5.0 average
- Most active day: Tuesday (3,068 reviews)
- Most active month: January (2,283 reviews)
- Recent activity: 37.8% of reviews from 2020 onwards
- Embedding Model: GTE-large (1024 dimensions) for superior semantic search
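For reference, here is a minimal sketch of how GTE-large embeddings can be generated and stored in ChromaDB (the Hugging Face checkpoint `thenlper/gte-large` and the collection name are assumptions for illustration, not taken from this repository):

```python
# Minimal sketch: embed product text with GTE-large and store it in ChromaDB.
# Model checkpoint, persist directory, and collection name are assumptions.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("thenlper/gte-large")  # 1024-dimensional embeddings
client = chromadb.PersistentClient(path="data/chroma_db")
collection = client.get_or_create_collection("electronics_products")

docs = ["USB-C fast charger, 30W, compact design"]
collection.add(
    ids=["prod-001"],
    documents=docs,
    embeddings=model.encode(docs).tolist(),
    metadatas=[{"type": "product"}],
)
```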
- Python 3.12+
- uv package manager
- Docker (optional, for containerized deployment)
- Ollama (optional, for local LLM models)
1. Clone the repository

   ```bash
   git clone <repository-url>
   cd AI-Powered-Amazon-Product-Assistant
   ```

2. Install dependencies

   ```bash
   uv sync
   ```

3. Configure environment variables

   ```bash
   # Create .env file with your API keys
   cp .env.example .env  # if available, or create manually

   # Required for chatbot functionality
   echo "OPENAI_API_KEY=your_openai_key" >> .env
   echo "GROQ_API_KEY=your_groq_key" >> .env
   echo "GOOGLE_API_KEY=your_google_key" >> .env

   # Optional for Weave tracing
   echo "WANDB_API_KEY=your_wandb_key" >> .env

   # Optional for Ollama (local LLMs)
   echo "OLLAMA_BASE_URL=http://localhost:11434" >> .env
   ```

4. Set up Jupyter kernel

   ```bash
   uv run python -m ipykernel install --user --name ai-product-assistant
   ```

5. Run data processing (if needed)

   ```bash
   uv run jupyter notebook notebooks/data_preprocessing.ipynb
   ```

6. Launch applications

   ```bash
   # Visualization dashboard
   uv run jupyter notebook notebooks/data_visualization.ipynb

   # Enhanced Streamlit chatbot interface with tab-based UI and RAG
   uv run streamlit run src/chatbot-ui/streamlit_app.py
   # OR use Make
   make run-streamlit

   # Run FastAPI server with agent endpoint (required for agent mode)
   make run-api

   # Optional: Run with PostgreSQL for conversation persistence
   docker-compose -f docker-compose.postgres.yml up -d
   make run-api  # Will automatically detect and use PostgreSQL

   # Run Weave-native evaluation (RECOMMENDED - follows official best practices)
   uv run python scripts/eval/run_weave_native_evaluation.py --dataset-path "data/evaluation/rag_evaluation_dataset.json" --openai-api-key YOUR_KEY

   # Run model comparison with native evaluation
   uv run python scripts/eval/run_weave_native_evaluation.py --mode comparison --dataset-path "data/evaluation/rag_evaluation_dataset.json"

   # Run enhanced Weave-RAGAS evaluation (ensures all metrics visible in Weave UI)
   uv run python scripts/eval/run_enhanced_evaluation.py --single-query "What are iPhone charger features?" --wandb-api-key YOUR_KEY

   # Run full evaluation with complete metric tracking
   uv run python scripts/eval/run_enhanced_evaluation.py --dataset-path "data/evaluation/rag_evaluation_dataset.json" --wandb-api-key YOUR_KEY

   # Alternative: Standard RAGAS evaluation
   uv run python scripts/eval/run_ragas_evaluation.py --single-query "What are iPhone charger features?" --ground-truth "iPhone chargers typically feature Lightning connector, fast charging support, USB-C power adapter compatibility, and MFi certification"

   # Generate ragas test dataset (Note: If you get entity extraction errors, see CLAUDE.md)
   uv run python scripts/eval/generate_ragas_dataset.py --test-size 50

   # Alternative: Generate simple synthetic dataset
   uv run python scripts/eval/generate_simple_ragas_dataset.py --synthetic-only
   ```
Note: ChromaDB is an API service and doesn't have a web interface. To interact with your data, use the Streamlit app at http://localhost:8501.
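Alternatively, the database can be inspected programmatically with the standard `chromadb` HTTP client; a minimal sketch (the collection name is an assumption for illustration):

```python
# Minimal sketch: inspect the Dockerized ChromaDB service from Python.
import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)
print(client.heartbeat())         # returns a timestamp if the service is up
print(client.list_collections())  # available collections

# Query a collection by name (name is an assumption for illustration)
collection = client.get_collection("electronics_products")
results = collection.query(query_texts=["iphone charger"], n_results=3)
print(results["documents"])
```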
```bash
# Build the containers
make build-docker-streamlit

# Run both Streamlit app and ChromaDB service
make run-docker-streamlit

# View logs
make logs-docker-streamlit

# Stop services
make stop-docker-streamlit

# Restart services
make restart-docker-streamlit
```
Docker Services:
- Streamlit App: http://localhost:8501 (Enhanced tab-based interface)
- ChromaDB API: http://localhost:8000 (API service - no web UI)
  - Health check: `curl http://localhost:8000/api/v2/heartbeat`
  - Collections: `curl http://localhost:8000/api/v2/collections`
- Persistent Storage: Vector data persisted in Docker volume
The application features a professional tab-based interface designed for optimal user experience:
🔧 Configuration Tab:
- System Status: Real-time monitoring of Weave tracing and RAG system initialization
- Model Selection: Choose from OpenAI (GPT-4o, GPT-4o-mini), Groq (Llama-3.3-70b), or Google (Gemini-2.0-flash)
- Parameter Controls: Fine-tune temperature, max tokens, top-p, and top-k with provider-specific support
- RAG Configuration: Enable/disable RAG with customizable product and review limits
💬 Query Tab:
- Smart Examples: 12+ categorized example queries across 6 use cases (Product Info, Reviews, Comparisons, Complaints, Recommendations, Use Cases)
- Query History: Access and reuse your last 10 queries with one click
- Auto-Suggestions: Get intelligent query completions based on partial input (3+ characters)
- Quick Filters: Filter by query type, product category, and price range
- Enhanced Input: Dynamic placeholders and integrated filter display
📊 Monitoring Tab:
- Session Statistics: Track message counts, query history, and usage patterns
- Real-Time Performance: View RAG vs LLM processing times with percentage breakdown
- RAG Analytics: Monitor retrieved products/reviews and query type detection
- System Health: Check API configurations and system component status
- Weave Integration: Direct links to W&B dashboard for detailed trace analysis
The application includes comprehensive Weave tracing for end-to-end AI pipeline monitoring and performance analysis.
1. Get W&B API Key
   - Sign up at wandb.ai
   - Get your API key from User Settings

2. Configure Tracing

   ```bash
   # Add to your .env file
   echo "WANDB_API_KEY=your_wandb_api_key" >> .env
   ```

3. Enhanced Features Tracked
   - Optimized Initialization: Single-session setup with session state management
   - RAG Pipeline Tracing: Query analysis, context building, and retrieval metrics
   - LLM Provider Tracking: Detailed request/response metadata for OpenAI, Groq, and Google
   - Performance Analytics: Sub-operation timing, character counts, and success rates
   - Error Classification: Structured error handling with types and fallback strategies
   - Real-Time UI Feedback: Processing times and operation status in sidebar
   - Context Quality Metrics: Query type detection, extracted terms, and retrieval effectiveness
   - Trace Optimization: Eliminated redundant calls and duplicate initialization

4. Optimized Operation Monitoring
   - Session-Based Initialization: Single setup per session via `@st.cache_resource` (see the sketch after this list)
   - Consolidated Tracing: Primary trace points at key pipeline stages
   - RAG Enhancement Metrics: Query processing timing and context quality
   - LLM Provider Analytics: Request/response data with performance breakdown
   - End-to-End Pipeline: Complete timing analysis from query to response
   - Zero-Redundancy Design: Eliminated multiple trace calls for same operations

5. Production-Ready Monitoring
   - Optimized Trace Volume: Meaningful traces without duplication
   - Session State Management: Prevents repeated initialization calls
   - Clean Dashboard Data: Visit your W&B dashboard for organized traces
   - Performance Insights: Navigate to the "Bootcamp" project for analytics
   - Error Tracking: Structured error handling with fallback strategies
   - Real-Time Feedback: Processing times displayed in Streamlit sidebar
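A minimal sketch of that session-based initialization pattern, assuming `weave` is installed (the project name comes from the notes above; the RAG factory import mirrors the usage examples later in this README):

```python
# Minimal sketch: initialize Weave tracing and the RAG processor once per
# Streamlit session so reruns don't create duplicate traces.
import streamlit as st
import weave


@st.cache_resource  # executed once per process; reruns reuse the cached result
def init_tracing_and_rag():
    weave.init("Bootcamp")  # project name taken from the monitoring notes above
    from src.rag.query_processor import create_rag_processor
    return create_rag_processor()


processor = init_tracing_and_rag()  # subsequent reruns reuse the cached objects
```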
The project includes scripts for managing and reinitializing the ChromaDB vector database:
```bash
# Check current database status
uv run python scripts/check_vector_db.py

# Reinitialize with your own JSONL data (simple)
uv run python scripts/reinit_vector_db_simple.py your_data.jsonl --clear

# Reinitialize with advanced options
uv run python scripts/reinit_vector_db.py \
    --jsonl-path your_data.jsonl \
    --batch-size 50 \
    --persist-dir custom_db \
    --collection-name my_collection

# Append new data without clearing
uv run python scripts/reinit_vector_db.py \
    --jsonl-path additional_data.jsonl \
    --no-clear-existing
```
Supported JSONL formats:
- Standard RAG format: `{"id": "...", "text": "...", "type": "product|review", "metadata": {...}}`
- Amazon format: `{"asin": "...", "title": "...", "description": "...", "reviewText": "..."}`
- Generic format: `{"content": "...", "category": "...", "source": "..."}`

For detailed documentation, see `scripts/README_vector_db.md`.
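As a quick illustration, a small script that emits documents in the standard RAG format might look like this (all field values below are illustrative):

```python
# Minimal sketch: write documents in the standard RAG JSONL format accepted
# by the reinitialization scripts. Field values here are illustrative only.
import json

docs = [
    {
        "id": "prod-001",
        "text": "USB-C fast charger, 30W, compact design",
        "type": "product",
        "metadata": {"category": "chargers", "price": 19.99},
    },
    {
        "id": "rev-001",
        "text": "Charges my phone in under an hour. Great value.",
        "type": "review",
        "metadata": {"rating": 5, "parent_id": "prod-001"},
    },
]

with open("your_data.jsonl", "w") as f:
    for doc in docs:
        f.write(json.dumps(doc) + "\n")
```

The resulting file can then be loaded with `uv run python scripts/reinit_vector_db_simple.py your_data.jsonl --clear`.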
The application supports local LLMs through Ollama via LiteLLM:
```bash
# Install Ollama (visit https://ollama.com for instructions)

# Pull and run a model
ollama pull llama3.2
ollama run llama3.2

# The app will automatically detect Ollama at http://localhost:11434
# Select "Ollama" as the provider in the Streamlit configuration tab
```
Note for Docker users: When running the Streamlit app in Docker, Ollama running on your host machine is accessible via `host.docker.internal:11434`. This is automatically configured in the `docker-compose.yml` file.
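Outside the Streamlit app, the same local model can be reached through LiteLLM's unified API; a minimal sketch (the model name and prompt are illustrative):

```python
# Minimal sketch: call a local Ollama model through LiteLLM's unified API.
from litellm import completion

response = completion(
    model="ollama/llama3.2",            # provider/model naming per LiteLLM
    messages=[{"role": "user", "content": "Summarize USB-C charger reviews."}],
    api_base="http://localhost:11434",  # default Ollama endpoint
)
print(response.choices[0].message.content)
```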
```
AI-Powered-Amazon-Product-Assistant/
├── 📁 data/
│ ├── Electronics.jsonl # Raw review data (25GB)
│ ├── meta_Electronics.jsonl # Raw product metadata (4.9GB)
│ ├── 📁 processed/
│ │ ├── electronics_top1000_products.jsonl # 1,000 product records
│ │ ├── electronics_top1000_products_reviews.jsonl # 20,000 review records
│ │ ├── electronics_rag_documents.jsonl # 2,000 RAG-optimized documents
│ │ ├── dataset_summary.json # Processing metadata
│ │ └── README.md # Data documentation
│ └── 📁 chroma_db/ # Vector database storage (local)
├── 📁 notebooks/
│ ├── data_preprocessing.ipynb # High-performance data processing with Polars
│ ├── data_visualization.ipynb # Efficient data visualization with Polars
│ ├── verify_api_keys.ipynb # API configuration testing
│ └── README.md # Notebook documentation
├── 📁 src/
│ ├── 📁 chatbot-ui/
│ │ ├── 📁 core/
│ │ │ └── config.py # Multi-provider configuration
│ │ ├── streamlit_app.py # Main chatbot interface with RAG
│ │ └── session_manager.py # Session management for agent conversations
│ ├── 📁 core/ # Core modules
│ │ ├── __init__.py # Core module initialization
│ │ ├── base_classes.py # Base abstract classes
│ │ ├── config_improved.py # Enhanced configuration (Pydantic V2)
│ │ ├── decorators.py # Utility decorators (retry, cache, timing)
│ │ ├── exceptions.py # Custom exception hierarchy
│ │ ├── implementations.py # Concrete implementations
│ │ ├── llm_providers.py # LLM provider management
│ │ ├── llm_service.py # LLM service interface
│ │ ├── logging_config.py # Logging configuration
│ │ ├── performance.py # Performance optimization utilities
│ │ └── structured_outputs.py # Pydantic models for structured LLM responses
│ ├── 📁 agents/ # LangGraph agent implementation (Sprint 3)
│ │ ├── __init__.py # Agent module initialization
│ │ ├── state.py # Agent state TypedDict definitions
│ │ ├── nodes.py # ReAct pattern nodes (reasoning, action, observation)
│ │ ├── graph.py # LangGraph workflow and routing
│ │ ├── react_agent.py # Main ReactAgent implementation
│ │ ├── 📁 tools/ # Agent tools
│ │ │ ├── __init__.py # Tools initialization
│ │ │ └── vector_search_tool.py # Vector search tool wrapping RAG
│ │ └── 📁 persistence/ # State persistence
│ │ ├── __init__.py # Persistence initialization
│ │ ├── models.py # SQLAlchemy models for state storage
│ │ └── postgres_checkpointer.py # PostgreSQL checkpointer for conversations
│ ├── 📁 api/ # FastAPI implementation (Sprint 2)
│ │ ├── __init__.py # API module initialization
│ │ ├── app.py # Main FastAPI application
│ │ ├── dependencies.py # Dependency injection
│ │ ├── models.py # Request/response models
│ │ ├── 📁 middleware/ # API middleware
│ │ │ ├── __init__.py # Middleware initialization
│ │ │ ├── rate_limiting.py # Rate limiting middleware
│ │ │ ├── cors.py # CORS configuration
│ │ │ ├── authentication.py # API key authentication
│ │ │ └── error_handling.py # Global error handling
│ │ └── 📁 routers/ # API route handlers
│ │ ├── __init__.py # Routers initialization
│ │ ├── health.py # Health check endpoints
│ │ └── rag.py # RAG and agent endpoints
│ ├── 📁 monitoring/ # Monitoring and observability
│ │ └── integration.py # Monitoring system integration
│ ├── 📁 prompts/ # Prompt management (Sprint 2)
│ │ ├── __init__.py # Prompts module initialization
│ │ ├── registry.py # Prompt template registry
│ │ ├── filters.py # Custom Jinja2 filters
│ │ └── templates/ # Jinja2 templates for all query types
│ ├── 📁 rag/
│ │ ├── vector_db.py # ChromaDB vector database (local, GTE-large)
│ │ ├── vector_db_docker.py # ChromaDB vector database (Docker, optimized)
│ │ ├── query_processor.py # RAG query processing (auto-selects implementation)
│ │ ├── hybrid_retrieval.py # BM25 and hybrid search implementation (Sprint 2)
│ │ └── 📁 experimental/ # Experimental implementations for reference
│ │ ├── vector_db_improved.py # Best practices reference implementation
│ │ ├── vector_db_migrated.py # Factory pattern implementation
│ │ └── vector_db_optimized.py # Performance optimization reference
│ ├── 📁 evaluation/
│ │ ├── __init__.py # Evaluation module interface
│ │ ├── rag_adapter.py # RAG system adapter for ragas framework
│ │ ├── ragas_evaluator.py # Main RAG evaluator using ragas
│ │ ├── ragas_reporter.py # HTML report generation for ragas results
│ │ ├── weave_ragas_evaluator.py # Basic Weave-RAGAS integration
│ │ ├── enhanced_weave_ragas.py # Enhanced Weave-RAGAS with full metric visibility
│ │ ├── weave_native_evaluation.py # Weave-native evaluation (best practices)
│ │ ├── dataset.py # Evaluation dataset creation and management
│ │ └── synthetic_data_generator.py # Advanced synthetic test data generation
│ └── 📁 tracing/
│ ├── business_intelligence.py # Business intelligence tracking
│ └── trace_utils.py # Tracing utilities and helpers
├── 📁 tests/ # Test suite
│ ├── __init__.py # Test module initialization
│ ├── conftest.py # Pytest configuration and fixtures
│ ├── test_basic.py # Basic test suite functionality
│ ├── test_infrastructure.py # Infrastructure test verification
│ ├── 📁 unit/ # Unit tests
│ │ ├── test_config.py # Configuration tests (Pydantic V2)
│ │ ├── test_decorators.py # Decorator functionality tests
│ │ ├── test_error_handling.py # Error handling tests
│ │ ├── test_litellm_service.py # LiteLLM service tests
│ │ ├── test_query_processor.py # Query processor tests
│ │ ├── test_vector_db.py # Vector database tests
│ │ └── test_vector_db_migrated.py # Migrated vector DB tests
│ ├── 📁 integration/ # Integration tests
│ │ ├── test_chatbot_e2e.py # End-to-end chatbot tests
│ │ ├── test_monitoring_integration.py # Monitoring integration tests
│ │ └── test_rag_pipeline.py # RAG pipeline tests
│ └── 📁 fixtures/ # Test fixtures and mock data
├── 📁 scripts/ # UV scripts and utilities
│ ├── 📁 eval/ # Evaluation runner scripts
│ │ ├── run_enhanced_evaluation.py # Enhanced Weave-RAGAS evaluation (RECOMMENDED)
│ │ ├── run_ragas_evaluation.py # Standard RAGAS evaluation
│ │ ├── run_weave_native_evaluation.py # Weave-native evaluation
│ │ ├── run_weave_ragas_evaluation.py # Basic Weave-RAGAS (DEPRECATED)
│ │ ├── generate_ragas_dataset.py # Generate ragas test datasets
│ │ └── generate_simple_ragas_dataset.py # Simplified dataset generation
│ ├── validate_config.py # Configuration validation script
│ ├── run_streamlit.py # UV script to run Streamlit app
│ ├── run_api_server.py # UV script to run FastAPI server
│ ├── lint.py # UV script for code linting
│ ├── format.py # UV script for code formatting
│ ├── clean_notebooks.py # UV script to clean notebook outputs
│ ├── list_scripts.py # List all available UV scripts
│ ├── check_vector_db.py # Check vector database status
│ ├── reinit_vector_db.py # Reinitialize vector database
│ ├── reinit_vector_db_simple.py # Simple vector database reinitialization
│ ├── init-postgres.sql # PostgreSQL schema initialization (Sprint 3)
│ └── test_agent_simple.py # Simple agent testing script
├── 📁 examples/
│ └── synthetic_data_examples.py # Synthetic data usage demonstrations
├── 📁 docs/ # Technical documentation
│ ├── 📁 architecture/ # System design documents
│ │ ├── CHROMA.md # ChromaDB integration guide
│ │ ├── LOCAL_VS_DOCKER.md # Local vs Docker implementation comparison
│ │ └── DASHBOARD_METRICS.md # Dashboard metrics interpretation
│ ├── 📁 guides/ # How-to guides
│ │ ├── WEAVE_TRACING_GUIDE.md # LLM tracing & monitoring guide
│ │ ├── EVALUATIONS.md # RAG evaluation framework documentation
│ │ ├── SYNTHETIC_DATA.md # Synthetic test data generation guide
│ │ ├── GEMINI_MESSAGE_HANDLING.md # Google Gemini integration guide
│ │ ├── DOCKER_TTY_FIXES.md # Container deployment fixes
│ │ ├── MONITORING_GUIDE.md # System monitoring setup
│ │ ├── PERFORMANCE_OPTIMIZATIONS.md # Performance optimization guide
│ │ └── ENHANCED_WEAVE_RAGAS_GUIDE.md # Enhanced Weave-RAGAS integration guide
│ ├── 📁 sprints/ # Sprint documentation
│ │ ├── SPRINT_0.md # Sprint 0 foundation summary
│ │ ├── SPRINT_1.md # Sprint 1 RAG implementation summary
│ │ ├── SPRINT_2.md # Sprint 2 production readiness summary
│ │ ├── SPRINT_3.md # Sprint 3 LangGraph agent summary
│ │ └── SPRINT_3_IMPLEMENTATION.md # Sprint 3 detailed implementation guide
│ ├── 📁 testing/ # Testing documentation
│ ├── 📁 development/ # Development process docs
│ └── 📁 planning/ # Vision and planning docs
├── 📄 pyproject.toml # uv dependencies & config
├── 📄 docker-compose.yml # Multi-service container setup
├── 📄 docker-compose.postgres.yml # Extended Docker config with PostgreSQL (Sprint 3)
├── 📄 Dockerfile # Container deployment
├── 📄 docker-entrypoint.sh # Container initialization script
├── 📄 Makefile # Build automation (Docker & shell commands)
├── 📄 PROJECT_CANVAS.md # Project roadmap & tasks
├── 📄 CLAUDE.md # AI assistant development log
└── 📄 README.md # Project documentation
```
The project includes a comprehensive data processing pipeline:
- Raw Data Ingestion: Processes large JSONL files from Amazon Reviews 2023
- Product Selection: Intelligently selects top 1000 products based on review volume and quality
- Review Sampling: Extracts representative reviews for each product
- Data Cleaning: Handles missing values, validates data integrity
- RAG Optimization: Formats data for retrieval-augmented generation systems
- Vector Database Creation: Automatic ingestion into ChromaDB with embeddings and metadata
- Query Processing: Intelligent context retrieval based on query type and intent
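To make the query type and intent step concrete, a simplified keyword-based classifier in the spirit of the query processor might look like the sketch below (the keyword lists are assumptions; the actual `query_processor.py` logic may differ):

```python
# Minimal sketch: keyword-based query type detection, in the spirit of the
# RAG query processor. Keyword lists are assumptions for illustration.
QUERY_PATTERNS = {
    "comparison": ["compare", "vs", "versus", "better than"],
    "complaint": ["complaint", "problem", "issue", "broken"],
    "recommendation": ["recommend", "best", "suggest", "should i buy"],
    "review": ["review", "what do people say", "opinions"],
}


def detect_query_type(query: str) -> str:
    q = query.lower()
    for query_type, keywords in QUERY_PATTERNS.items():
        if any(kw in q for kw in keywords):
            return query_type
    return "product_info"  # default when no pattern matches


print(detect_query_type("Compare iPhone and Android chargers"))  # comparison
```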
The visualization notebook provides comprehensive insights:
- Review Distribution Analysis: Product popularity and rating patterns
- Price Analysis: Price ranges and correlation with ratings
- Category Analysis: Hierarchical category exploration
- Store & Brand Analysis: Top performers and market distribution
- Temporal Analysis: Review trends over time (2003-2023)
- Text Analysis: Review length and content characteristics
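As a self-contained example of the temporal analysis, yearly review volume can be derived directly from the processed reviews file (a sketch; the millisecond-epoch `timestamp` field follows the Amazon Reviews 2023 convention and is assumed here):

```python
# Minimal sketch: plot review volume per year from the processed reviews file.
# Assumes each line carries a millisecond-epoch "timestamp" field, as in the
# Amazon Reviews 2023 dataset.
import json

import matplotlib.pyplot as plt
import pandas as pd

with open("data/processed/electronics_top1000_products_reviews.jsonl") as f:
    reviews = [json.loads(line) for line in f]

df = pd.DataFrame(reviews)
df["year"] = pd.to_datetime(df["timestamp"], unit="ms").dt.year

df["year"].value_counts().sort_index().plot(kind="bar", title="Reviews per year")
plt.show()
```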
- Data Processing: pandas, numpy, json, Polars (high-performance alternative)
- Visualization: matplotlib, seaborn, plotly
- Vector Database: Dual-architecture ChromaDB system (local: GTE-large, Docker: optimized)
- Embedding Models: GTE-large (development) and ChromaDB default (production) with automatic selection
- RAG Implementation: Custom query processing with intelligent context retrieval and environment detection
- Agent Framework: LangGraph for ReAct pattern agent with tool use (Sprint 3)
- State Persistence: PostgreSQL with SQLAlchemy for conversation management (Sprint 3)
- API Framework: FastAPI with middleware, routers, and dependency injection (Sprint 2)
- Structured Outputs: Instructor library with Pydantic models (Sprint 2)
- Prompt Management: Jinja2 templating system with registry (Sprint 2)
- Notebook Environment: Jupyter, IPython, Marimo (reactive notebooks)
- Package Management: uv (modern Python package manager)
- Web Interface: Professional Streamlit UI with tab-based architecture, smart query suggestions, and real-time monitoring
- LLM Providers: OpenAI GPT-4o, Groq Llama, Google Gemini 2.0, Ollama (100+ via LiteLLM)
- Monitoring: Optimized Weave tracing via Weights & Biases with session state management
- Configuration: Pydantic V2 settings with environment variables
- Testing: Pytest with 108+ tests, 91% coverage
- Containerization: Docker with non-root security, Docker Compose for multi-service deployment
1. Start Required Services:

   ```bash
   # Start API server (required for agent mode)
   make run-api

   # In another terminal, start Streamlit
   make run-streamlit
   ```

2. Enable Agent Mode:
   - Go to the Configuration tab
   - Enable "Enable RAG (Product Search)"
   - Toggle "🤖 Enable Agent Mode (ReAct)"

3. Ask Questions:
   - The agent will process queries with reasoning steps
   - View the reasoning trace in the expandable "🤔 Agent Reasoning Steps" section
   - Session info is displayed in the sidebar

4. Example Queries:
   - "What are the main complaints about laptop backpacks?"
   - "Compare iPhone and Android chargers"
   - "Find budget tablets under $200 with good reviews"
```python
# Load processed data
import pandas as pd
import json

# Load products
products = []
with open('data/processed/electronics_top1000_products.jsonl', 'r') as f:
    for line in f:
        products.append(json.loads(line.strip()))

df_products = pd.DataFrame(products)
print(f"Loaded {len(df_products)} products")
```
```python
# Test RAG system
from src.rag.query_processor import create_rag_processor

# Initialize processor
processor = create_rag_processor()

# Process a query
result = processor.process_query("What do people say about iPhone charger cables?")
print(f"Found {result['metadata']['num_products']} products and {result['metadata']['num_reviews']} reviews")
```
```python
# Run enhanced evaluation with full Weave visibility
from src.evaluation.enhanced_weave_ragas import create_enhanced_evaluator
import asyncio

# Create enhanced evaluator
model, evaluator = create_enhanced_evaluator(
    project_name="my-rag-evaluation",
    openai_api_key="your_key"
)

# Run single evaluation
async def evaluate():
    result = await evaluator.evaluate_example(
        model=model,
        question="What are iPhone charger features?",
        ground_truth="iPhone cables feature Lightning connectors..."
    )
    print(f"Overall Score: {result['overall_score']:.3f}")
    print(f"Metrics: {result['metrics']}")

asyncio.run(evaluate())
```
```python
# Use Weave's native evaluation framework
from src.evaluation.weave_native_evaluation import create_rag_model, create_native_evaluator
import asyncio

# Create model and evaluator
model = create_rag_model(model_name="rag-v1", temperature=0.7)
evaluator = create_native_evaluator(project_name="rag-eval")

# Run evaluation
async def native_evaluate():
    # Create dataset
    dataset = evaluator.create_dataset(
        [{"query": "What are iPhone features?", "expected_answer": "..."}],
        name="test_dataset"
    )
    # Run evaluation
    results = await evaluator.evaluate_model(
        model=model,
        dataset=dataset,
        evaluation_name="Baseline Test"
    )

asyncio.run(native_evaluate())
```
```python
# Generate synthetic evaluation data
from src.evaluation.synthetic_data_generator import create_synthetic_dataset, SyntheticDataConfig

# Custom configuration
config = SyntheticDataConfig(
    num_examples_per_category=5,
    difficulty_distribution={"easy": 0.3, "medium": 0.5, "hard": 0.2},
    variation_techniques=["rephrase", "specificity", "context"]
)

# Generate synthetic examples
synthetic_examples = create_synthetic_dataset(config, num_examples=30)
print(f"Generated {len(synthetic_examples)} synthetic test cases")

# Create mixed dataset (original + synthetic)
from src.evaluation.synthetic_data_generator import create_mixed_dataset
from src.evaluation.dataset import create_evaluation_dataset  # assumed location of this helper

original_examples = create_evaluation_dataset()
mixed_dataset = create_mixed_dataset(original_examples, synthetic_ratio=0.5)
```
```python
# Generate temporal analysis
from notebooks.data_visualization import temporal_analysis

temporal_analysis(df_reviews)
```
For detailed solutions to common issues, see docs/TROUBLESHOOTING.md.
- Ragas Entity Extraction Error: Use the simple generator: `uv run python scripts/eval/generate_simple_ragas_dataset.py --synthetic-only`
- Docker Ollama Connection: Already configured with `host.docker.internal` in `docker-compose.yml`
- Import Errors: Run `uv sync` to ensure all dependencies are installed
- Vector DB Hanging: Skip initialization during development: `SKIP_VECTOR_DB_INGESTION=true uv run streamlit run src/chatbot-ui/streamlit_app.py`
- Multiple Weave Traces: Fixed with session state management
- New Feature: Professional tab-based interface architecture
- Smart Query Features: Auto-suggestions, query history, and intelligent filters
- Real-Time Monitoring: Performance metrics, RAG analytics, and system health dashboard
- Enhanced Response Display: Context cards, structured information, and query analysis
- Improved UX: Organized configuration, categorized examples, and responsive design
- Issue Resolved: Eliminated multiple/redundant Weave trace calls
- Root Cause: Improper interaction between Streamlit caching and Weave decorators
- Solution: Session state initialization + consolidated trace entry points
- Result: Clean, meaningful traces with zero redundancy
- TOP PRIORITY Achievement: All RAGAS evaluation metrics now fully visible in Weave UI
- Enhanced Implementation: Created `enhanced_weave_ragas.py` with comprehensive metric tracking
- Three Evaluation Modes: Single query, full dataset, and comparison evaluations
- Complete Metric Visibility: All 8 RAGAS metrics (faithfulness, relevancy, precision, recall, etc.) tracked individually
- Performance Monitoring: Latency tracking for retrieval and generation phases
- Drill-Down Capabilities: Click any example in Weave UI to see full details and scores
- Comparison Views: Easy A/B testing with automatic leaderboard creation
- Documentation: Complete guide in `docs/guides/ENHANCED_WEAVE_RAGAS_GUIDE.md`
- Ollama Import Error: Removed direct ollama import, using LiteLLM's built-in support instead
- RAG Processor Initialization: Fixed query_patterns initialization when using existing vector database
- LLM Service Interface: Updated the Streamlit app to use the `chat()` method instead of `generate()` for proper message handling
- Weave Tracing: Removed the redundant `@weave.op()` decorator from the generate method to prevent argument mismatch errors
- Result: Seamless LiteLLM integration with support for 100+ providers including Ollama
- ReAct Pattern Agent: Fully functional reasoning-action-observation loop with LangGraph
- Tool Integration: Vector search wrapped as agent tool maintaining RAG capabilities
- Conversation Persistence: PostgreSQL-backed state management for multi-turn conversations
- Session Management: UUID-based session and thread tracking with UI integration
- Reasoning Transparency: Expandable reasoning traces in Streamlit interface
- API Enhancement: New `/api/v1/agent/query` endpoint with full agent capabilities
- Backward Compatibility: Existing RAG endpoints preserved; agent mode is an optional toggle
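For orientation, here is a minimal sketch of the ReAct wiring described above, using LangGraph's prebuilt agent (the model string, tool body, and Postgres URL are illustrative assumptions; the project's `react_agent.py` builds its graph explicitly):

```python
# Minimal sketch: a ReAct agent with a vector-search tool and Postgres-backed
# conversation persistence, in the spirit of the Sprint 3 design. Model string,
# tool body, and connection URL are illustrative assumptions.
from langchain_core.tools import tool
from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.prebuilt import create_react_agent


@tool
def vector_search(query: str) -> str:
    """Search the product and review vector database."""
    from src.rag.query_processor import create_rag_processor
    return str(create_rag_processor().process_query(query))


# Connection URL is an assumption; setup() creates the checkpoint tables once.
with PostgresSaver.from_conn_string("postgresql://user:pass@localhost:5432/agent") as saver:
    saver.setup()
    agent = create_react_agent("openai:gpt-4o-mini", [vector_search], checkpointer=saver)
    result = agent.invoke(
        {"messages": [("user", "Find budget tablets under $200 with good reviews")]},
        config={"configurable": {"thread_id": "demo-thread"}},  # keyed multi-turn session
    )
    print(result["messages"][-1].content)
```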
This project includes comprehensive documentation to help you understand and work with the system:
Project roadmap and task tracking
- Complete project overview and goals
- Sprint 0 and Sprint 1 deliverables with detailed task breakdowns
- EDA findings and dataset analysis summary
- Configuration features and tracing implementation status
- Success criteria and architecture decisions
Sprint 0 foundation summary
- Foundational components completed (June 28, 2025)
- Data processing pipeline, LLM configuration, monitoring setup
- Project setup, environment configuration, and architecture planning
- Technical achievements and development infrastructure
- Foundation established for RAG implementation
Sprint 1 RAG prototype implementation
- Complete RAG system implementation following course requirements
- Vector database setup, basic RAG pipeline, instrumentation, and evaluation
- All 4 instructor-specified tasks completed (Lessons 3-6)
- Advanced features beyond scope: query intelligence, dual-architecture, synthetic data
- W&B integration with comprehensive evaluation framework
Sprint 2 production readiness
- Complete production implementation with FastAPI REST API
- Hybrid retrieval with BM25 and Reciprocal Rank Fusion
- Structured outputs using Instructor library with Pydantic models
- Jinja2 prompt management system with template registry
- 108+ tests with 91% coverage and 60-96% performance improvements
Sprint 3 LangGraph agent
- Transformed RAG system into intelligent conversational agent using LangGraph
- Implemented ReAct pattern with reasoning, action, and observation nodes
- Created vector search tool wrapping existing RAG functionality
- Added PostgreSQL persistence for multi-turn conversation support
- Session management with UUID-based tracking
- Agent mode toggle in Streamlit UI with reasoning trace visibility
- New `/api/v1/agent/query` endpoint for agent interactions
Complete ChromaDB integration guide
- GTE-large embedding model implementation details
- Data loading process and timeline details
- Search capabilities and metadata schema
- Performance monitoring and logging
- Troubleshooting guide and best practices
- API reference and usage examples
Local development vs Docker production comparison
- Dual-architecture approach explanation (vector_db.py vs vector_db_docker.py)
- Embedding strategy differences (GTE-large vs ChromaDB default)
- Connection architecture and storage configuration details
- Performance comparison and resource usage analysis
- Use case guidelines and migration considerations
- Troubleshooting and best practices for both environments
Comprehensive LLM tracing and monitoring guide
- Complete Weave integration implementation details
- Configuration parameter tracking (temperature, max_tokens, top_p, top_k)
- W&B dashboard setup and trace analysis
- Provider-specific handling and error resilience
- Performance monitoring and debugging techniques
- Troubleshooting guide for common tracing issues
RAG evaluation framework documentation
- Industry-standard RAGAS evaluation framework with enhanced Weave integration
- Core RAGAS metrics: Faithfulness, Answer Relevancy, Context Precision, Context Recall
- Additional metrics: Context Utilization, Answer Correctness, Similarity, Completeness
- Complete visibility of all metrics in Weave UI with drill-down capabilities
- Command-line interface for single query, dataset, and comparison evaluations
- Performance tracking with retrieval and generation latency monitoring
Synthetic test data generation guide
- Advanced synthetic data generation with template-based queries and variation techniques
- Configurable generation parameters: difficulty distribution, query types, and variation methods
- Quality validation tools: uniqueness analysis, length distribution, and topic coverage
- Weave integration for full traceability and performance monitoring
- Mixed dataset creation combining original and synthetic data for robust testing
- Best practices implementation and troubleshooting guide
Enhanced Weave-RAGAS integration guide
- Complete guide for using enhanced evaluation with full metric visibility in Weave UI
- Three evaluation modes: single query, dataset, and comparison evaluations
- All 8 RAGAS metrics tracked individually with proper attribution
- Performance monitoring for retrieval and generation phases
- Step-by-step instructions and command examples
- Troubleshooting and best practices for production use
Dashboard metrics interpretation & implementation guide
- Comprehensive documentation of all monitoring dashboard metrics
- Session statistics including conversation balance logic and message ratio handling
- Performance monitoring with provider-specific tracking and comparison
- RAG system metrics including vector performance and context quality
- Business intelligence integration with user journey analytics
- Configuration status monitoring and system health indicators
- Implementation details and troubleshooting guidelines
Google Gemini integration guide
- Complete Google GenAI client message formatting requirements
- Role conversion and content validation for Gemini compatibility
- Error resolution for INVALID_ARGUMENT and empty message parts
- Performance monitoring and provider-specific baselines
- Integration with Enhanced Tracing v2.0 system
- Troubleshooting guide and best practices
Containerized deployment compatibility guide
- Docker TTY issues and solutions for production deployment
- Non-root user configuration and security best practices
- Streamlit headless configuration for container environments
- Weave tracing compatibility in containerized setups
- Complete verification steps and troubleshooting
Weave-native evaluation using official best practices
- Proper Model and Dataset versioning with Weave
- Built-in metric aggregation and comparison views
- Custom scorer implementation patterns
- Native UI integration for evaluation results
- Migration guide from custom implementations
AI assistant development log
- Detailed record of changes and improvements made by the AI assistant
- Implementation decisions and technical explanations
- Feature development timeline and reasoning
- Code modifications and their rationale
These documents provide in-depth technical guidance beyond the quick start instructions in this README, covering advanced topics like monitoring, containerization, and project management.
This project uses data from the Amazon Reviews 2023 dataset:
```bibtex
@article{hou2024bridging,
  title={Bridging Language and Items for Retrieval and Recommendation},
  author={Hou, Yupeng and Li, Jiacheng and He, Zhankui and Yan, An and Chen, Xiusi and McAuley, Julian},
  journal={arXiv preprint arXiv:2403.03952},
  year={2024}
}
```
This is a capstone project for educational purposes. Feel free to explore, learn, and adapt the code for your own projects.
This project is licensed under the terms specified in the LICENSE file.