This template allows you to build custom review apps on Databricks Apps for collecting labels from Subject Matter Experts. Customize how MLflow traces are displayed: show specific spans, filter certain data, or present information differently than the standard Databricks review app. Create tailored labeling interfaces for your domain, collect targeted feedback for model improvement, and leverage integrated AI analysis for experiment optimization and result summarization.
Watch a complete walkthrough of building a custom MLflow Review App using Claude Code's /review-app command:
Watch the Full Tutorial - See how to customize, deploy, and use the review app from start to finish
The setup script automatically handles all dependencies - just run it!
/review-app
Claude handles everything automatically:
- Environment setup
- Experiment analysis
- Custom schema creation
- Session configuration
- Deployment
./setup.sh # Interactive setup wizard - handles all configuration
./watch.sh # Start dev servers (frontend:5173, backend:8000)
The renderer system lets you completely customize how subject matter experts view and evaluate traces. Renderers are stored as MLflow tags (mlflow.evaluation.renderer) on labeling sessions and can be changed dynamically.
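For orientation, here is a minimal sketch of how the client could resolve a session's renderer from that tag. The tag key comes from this template; the string-map tag shape and the "default" fallback name are illustrative assumptions, not the template's exact API.

// Sketch only: resolve which registered renderer a session should use.
// Assumes the session's MLflow tags are available as a string map;
// adjust to the template's actual types.
const RENDERER_TAG = "mlflow.evaluation.renderer";

export function resolveRendererName(
  tags: Record<string, string> | undefined
): string {
  // Fall back to the built-in default renderer when the tag is unset.
  return tags?.[RENDERER_TAG] ?? "default";
}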
Step 1: Create Your Renderer Component
Create a new file in client/src/components/session-renderer/renderers/:
// AnalyticsRenderer.tsx
import React from "react";
import { Button } from "@/components/ui/button";
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
import { ItemRendererProps } from "@/types/renderers";
import { LabelSchemaForm } from "./schemas/LabelSchemaForm";
import { BarChart3, CheckCircle } from "lucide-react";
export function AnalyticsRenderer({
item,
traceData,
reviewApp,
session,
currentIndex,
totalItems,
assessments,
onUpdateItem,
onNavigateToIndex,
isSubmitting,
schemaAssessments,
}: ItemRendererProps) {
// Extract analytics-relevant data from trace spans
const spans = traceData?.spans || [];
const querySpans = spans.filter(s => s.name?.includes("query"));
  // Duration of the root span; falls back to 0 when the trace has no spans
  const totalDuration = (spans[0]?.end_time ?? 0) - (spans[0]?.start_time ?? 0);
// Separate feedback and expectation schemas
const feedbackSchemas = schemaAssessments?.feedback || [];
const expectationSchemas = schemaAssessments?.expectations || [];
return (
<div className="grid grid-cols-1 lg:grid-cols-3 gap-6">
{/* Left: Query Analytics */}
<div className="lg:col-span-2 space-y-4">
<Card>
<CardHeader>
<CardTitle className="flex items-center gap-2">
<BarChart3 className="h-5 w-5" />
Query Performance
</CardTitle>
</CardHeader>
<CardContent>
<div className="space-y-4">
<div className="grid grid-cols-2 gap-4">
<div>
<p className="text-sm text-muted-foreground">Total Duration</p>
<p className="text-2xl font-bold">{totalDuration.toFixed(2)}ms</p>
</div>
<div>
<p className="text-sm text-muted-foreground">Query Count</p>
<p className="text-2xl font-bold">{querySpans.length}</p>
</div>
</div>
{/* Show individual query details */}
{querySpans.map((span, idx) => (
<div key={idx} className="border-l-2 border-blue-500 pl-4">
<p className="font-medium">{span.name}</p>
<pre className="text-xs text-muted-foreground mt-1">
                    {JSON.stringify(span.inputs ?? {}, null, 2).slice(0, 200)}...
</pre>
</div>
))}
</div>
</CardContent>
</Card>
</div>
{/* Right: Evaluation Forms */}
<div className="space-y-6">
{feedbackSchemas.length > 0 && (
<div>
<h3 className="text-sm font-semibold mb-3">Performance Assessment</h3>
<LabelSchemaForm
schemas={feedbackSchemas.map(item => item.schema)}
assessments={new Map(
feedbackSchemas
.filter(item => item.assessment)
.map(item => [item.schema.name, item.assessment!])
)}
traceId={item.source?.trace_id || ""}
readOnly={false}
reviewAppId={reviewApp.review_app_id}
sessionId={session.labeling_session_id}
/>
</div>
)}
{/* Action buttons */}
<div className="flex gap-2">
<Button
variant="outline"
onClick={() => onUpdateItem(item.item_id, { state: "SKIPPED" })}
>
Skip
</Button>
<Button
onClick={() => onUpdateItem(item.item_id, { state: "COMPLETED" })}
disabled={isSubmitting || assessments.size === 0}
>
<CheckCircle className="h-4 w-4 mr-2" />
Submit
</Button>
</div>
</div>
</div>
);
}
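Two design notes on this example: Submit stays disabled until at least one assessment has been recorded (assessments.size === 0), and both Skip and Submit go through the same onUpdateItem callback, which auto-saves the item state back to the session.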
Step 2: Register Your Renderer
Add to client/src/components/session-renderer/renderers/index.ts:
import { AnalyticsRenderer } from "./AnalyticsRenderer";
// In the constructor, add:
this.registerRenderer({
name: "analytics", // Internal name (used in MLflow tag)
displayName: "Analytics View", // Display name in UI
description: "Performance metrics and query analysis view",
component: AnalyticsRenderer, // Your component
});
Step 3: Use Your Renderer
Open a labeling session in the UI and pick your renderer from the renderer dropdown; the selection is stored on the session as the mlflow.evaluation.renderer tag, using the name you registered.
Your renderer receives these props:
interface ItemRendererProps {
// Core Data
item: LabelingItem; // Current item (id, state, comment)
traceData: TraceData; // Full trace with spans
reviewApp: ReviewApp; // Review app with schemas
session: LabelingSession; // Current session details
// Navigation
currentIndex: number; // Current position (0-based)
totalItems: number; // Total items count
  onNavigateToIndex: (index: number) => void; // Jump to item
// Assessment Management
assessments: Map<string, Assessment>; // Current values
  onUpdateItem: (                        // Auto-save function
    itemId: string,
    updates: {
      state?: "COMPLETED" | "SKIPPED";
      assessments?: Map<string, Assessment>;
      comment?: string;
    }
  ) => Promise<void>;
// UI State
isLoading?: boolean; // Data loading
isSubmitting?: boolean; // Save in progress
}
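As a quick illustration, a minimal renderer only needs a handful of these props. The sketch below assumes the same ItemRendererProps import path as Step 1 and skips all styling.

// MinimalRenderer.tsx - bare-bones renderer sketch using the props above
import React from "react";
import { ItemRendererProps } from "@/types/renderers";

export function MinimalRenderer({
  item,
  currentIndex,
  totalItems,
  onUpdateItem,
  isSubmitting,
}: ItemRendererProps) {
  return (
    <div>
      {/* Show where the reviewer is in the session */}
      <p>
        Item {currentIndex + 1} of {totalItems}
      </p>
      {/* Mark the current item as completed via the auto-save callback */}
      <button
        disabled={isSubmitting}
        onClick={() => onUpdateItem(item.item_id, { state: "COMPLETED" })}
      >
        Mark reviewed
      </button>
    </div>
  );
}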
- Default Renderer: Full conversation view with detailed labeling forms
- Tool Renderer: Enhanced view showing tool execution details
./deploy.sh # Deploy to existing Databricks App
./deploy.sh --create # Create new app and deploy
./deploy.sh --verbose # Deploy with detailed logging
Always verify deployment success:
# Check app status
uv run python app_status.py --format json
# Monitor logs (replace with your app URL)
uv run python dba_logz.py https://your-app.databricksapps.com --search "INFO"
# Test health endpoint
uv run python dba_client.py https://your-app.databricksapps.com /health
The MLflow CLI provides 38 tools organized into categories for complete review app management:
Key Categories:
- LABELING - Create/manage schemas and sessions
- TRACES - Search and link MLflow traces
- ANALYTICS - AI-powered analysis and reporting
- MLFLOW - Experiment and permissions management
Essential Commands:
./mlflow-cli list # List all 38 tools
./mlflow-cli list labeling # List labeling tools
./mlflow-cli search <query> # Search for specific tools
./mlflow-cli help <tool> # Get detailed help
./mlflow-cli run <tool> [args] # Execute a tool
Common Operations:
# Create evaluation schemas
./mlflow-cli run create_labeling_schemas --preset quality-helpfulness
# Set up review session
./mlflow-cli run create_labeling_session --name "Review Session"
# Link traces to session
./mlflow-cli run link_traces_to_session --session-name "Review Session" --limit 100
# Analyze results
./mlflow-cli run ai_analyze_session --session-name "Review Session"
├── server/                    # FastAPI backend
│   ├── routers/               # API endpoints
│   ├── models/                # Data models
│   └── utils/                 # Utilities
│
├── client/                    # React frontend
│   └── src/
│       ├── pages/             # Main UI pages
│       └── components/        # Reusable components
│           └── session-renderer/  # Custom trace renderers
│
├── tools/                     # MLflow CLI tools
├── scripts/                   # Python build scripts
├── setup.sh                   # Environment setup
├── watch.sh                   # Start dev servers
├── deploy.sh                  # Deploy to Databricks
└── mlflow-cli                 # CLI wrapper
| Issue | Solution |
|---|---|
| Dev server issues | pkill -f watch.sh && ./watch.sh |
| Auth errors | Re-run ./setup.sh |
| Missing TypeScript client | uv run python scripts/make_fastapi_client.py |
| Deployment failures | Check logs with dba_logz.py |
| API 404 errors | Verify router registration in server/app.py |
- Databricks Apps Documentation
- MLflow Documentation
- FastAPI Documentation
- React Documentation
- Claude Code - For AI-powered development
Ready to evaluate your AI agents? Start with ./setup.sh or /review-app in Claude Code.