letta-ai
diff --git a/examples/letta-agent-rubric-grader/README.md b/examples/letta-agent-rubric-grader/README.md
Lines changed: 83 additions & 11 deletions
diff --git a/examples/letta-agent-rubric-grader/rubric.txt b/examples/letta-agent-rubric-grader/custom_judge_rubric.txt
diff --git a/examples/letta-agent-rubric-grader/suite.yaml b/examples/letta-agent-rubric-grader/custom_judge_suite.yaml
Lines changed: 3 additions & 3 deletions
diff --git a/examples/letta-agent-rubric-grader/judge.af b/examples/letta-agent-rubric-grader/custom_web_search_judge.af
diff --git a/examples/letta-agent-rubric-grader/default_judge_rubric.txt b/examples/letta-agent-rubric-grader/default_judge_rubric.txt
Lines changed: 104 additions & 0 deletions
diff --git a/examples/letta-agent-rubric-grader/default_judge_suite.yaml b/examples/letta-agent-rubric-grader/default_judge_suite.yaml
Lines changed: 16 additions & 0 deletions
@@ -69,36 +69,108 @@ Then run the evaluation as above.
 
 ### Suite Configuration
 
+This example provides two configuration approaches:
+
+#### Option 1: Default Letta Judge (Recommended for Most Use Cases)
+
+**File:** `default_judge_suite.yaml`
+
+The simplest configuration uses the built-in default judge agent with pre-fetched webpage content:
+
+```yaml
+name: fetch-webpage-default-judge-test
+dataset: dataset.csv
+target:
+  kind: agent
+  agent_file: test-fetch-webpage-simple-agent.af
+graders:
+  agent_judge:
+    kind: letta_judge                      # Use letta_judge kind
+    prompt_path: default_judge_rubric.txt  # Rubric with hardcoded webpage content
+    extractor: last_assistant
+gate:
+  metric_key: agent_judge
+  op: gte
+  value: 0.7
+```
+
+**How it works:**
+- Uses the default Letta judge agent (no custom agent_file needed)
+- Rubric includes the **hardcoded webpage content** for grading
+- Judge reads the content directly from the rubric prompt
+- Simpler, faster, and more reliable (no live web requests)
+
+**When to use:** Most evaluation scenarios where you know the expected content ahead of time.
+
+#### Option 2: Custom Letta Judge with Live Web Search (Advanced)
+
+**File:** `custom_judge_suite.yaml`
+
+For advanced scenarios where the judge needs to dynamically verify information:
+
 ```yaml
 name: fetch-webpage-agent-judge-test
-description: Test agent responses using a Letta agent as judge with rubric grading
 dataset: dataset.csv
 target:
   kind: agent
   agent_file: test-fetch-webpage-simple-agent.af
-  base_url: http://localhost:8283
 graders:
   agent_judge:
-    kind: rubric
-    agent_file: judge.af              # Judge agent with submit_grade tool
-    prompt_path: rubric.txt           # Rubric criteria for evaluation
-    judge_tool_name: submit_grade     # Tool the judge uses to submit scores
-    extractor: last_assistant         # Extract agent's final response
+    kind: letta_judge
+    agent_file: custom_web_search_judge.af  # Custom judge with web tools
+    prompt_path: custom_judge_rubric.txt    # Rubric instructs live fetching
+    judge_tool_name: submit_grade
+    extractor: last_assistant
 gate:
   metric_key: agent_judge
   op: gte
-  value: 0.75                         # Pass if avg score ≥ 0.75
+  value: 0.7
 ```
 
+**How it works:**
+- Uses a **custom judge agent** with `fetch_webpage` tool capabilities
+- Rubric instructs the judge to **fetch the webpage live** during grading
+- Judge performs real-time web requests to verify agent answers
+- More dynamic but slower and depends on network availability
+
+**When to use:** When evaluating against dynamic content, testing web-fetching capabilities, or when ground truth can't be pre-determined.
+
+**Key Differences:**
+
+| Aspect | Default Judge | Custom Judge |
+|--------|--------------|--------------|
+| **Agent** | Built-in default | Custom with web tools |
+| **Rubric** | Hardcoded content | Instructions to fetch live |
+| **Speed** | Faster (no web requests) | Slower (live fetching) |
+| **Reliability** | Higher (offline) | Lower (network dependent) |
+| **Use Case** | Static evaluation | Dynamic verification |
+| **Config Complexity** | Minimal (2 required fields) | Higher (4+ fields) |
+
 **Key Configuration Options:**
-- `agent_file`: Path to `.af` file containing the judge agent
+- `kind`: Must be `letta_judge` for agent-based judges
+- `agent_file`: (Optional) Path to a custom `.af` judge agent. If omitted, the default judge is used
 - `prompt_path`: Path to file containing rubric text (can also use `prompt` for inline rubric)
-- `judge_tool_name`: Name of the tool the judge calls to submit scores (default: `submit_grade`)
+- `judge_tool_name`: (Optional) Name of the tool the judge calls to submit scores. Only allowed when a custom `agent_file` is set
 - `extractor`: How to extract the submission from agent trajectory
 
 ### Judge Agent Requirements & Gotchas
 
-#### ✅ Checklist: Will Your Judge Agent Work?
+**Recommendation: Use the Default Judge**
+
+We **highly recommend** using the default Letta judge (Option 1) for most use cases. Configuring a custom judge agent is complex and error-prone, with several potential footguns:
+- Tool schema must exactly match expected parameters
+- Tool name must be correctly specified
+- Agent must not have conflicting tools that confuse evaluation
+- Additional complexity in debugging when things go wrong
+
+**Only create a custom judge if you have a specific need**, such as:
+- Judge needs to fetch live web content for verification
+- Judge requires access to custom tools (databases, APIs, etc.)
+- Special evaluation logic that can't be expressed in the rubric alone
+
+If you do need a custom judge, use this checklist:
+
+#### Checklist: Will Your Judge Agent Work?
 
 Use this checklist to verify your judge agent is properly configured:
 
 
@@ -7,9 +7,9 @@ target:
   base_url: http://localhost:8283
 graders:
   agent_judge:
-    kind: rubric
-    agent_file: judge.af
-    prompt_path: rubric.txt
+    kind: letta_judge
+    agent_file: custom_web_search_judge.af
+    prompt_path: custom_judge_rubric.txt
     judge_tool_name: submit_grade
     extractor: last_assistant
 gate:
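
The grader fields changed in this hunk follow two rules stated in the README: `kind` must be `letta_judge`, and `judge_tool_name` is only allowed together with a custom `agent_file`. A minimal Python sketch of a pre-flight check for those two rules follows. The function name `validate_judge_grader` and the plain-dict input are hypothetical and are not part of the Letta evals API; the real loader may enforce these rules differently.

```python
from typing import Any


def validate_judge_grader(grader: dict[str, Any]) -> list[str]:
    """Return the problems found in one grader entry (empty list if none).

    Hypothetical helper: it only mirrors the two rules the README states.
    """
    problems: list[str] = []
    # Rule 1: agent-based judges must use the letta_judge kind.
    if grader.get("kind") != "letta_judge":
        problems.append("kind must be 'letta_judge' for agent-based judges")
    # Rule 2: judge_tool_name is only allowed with a custom agent_file.
    if "judge_tool_name" in grader and "agent_file" not in grader:
        problems.append("'judge_tool_name' requires a custom 'agent_file'")
    return problems


# The grader entry after this change (custom judge).
custom = {
    "kind": "letta_judge",
    "agent_file": "custom_web_search_judge.af",
    "prompt_path": "custom_judge_rubric.txt",
    "judge_tool_name": "submit_grade",
    "extractor": "last_assistant",
}
# An illustrative bad entry: old kind, tool name without an agent file.
bad = {
    "kind": "rubric",
    "prompt_path": "rubric.txt",
    "judge_tool_name": "submit_grade",
}

print(validate_judge_grader(custom))
print(validate_judge_grader(bad))
```

Running a check like this before an evaluation surfaces configuration mistakes early, which is cheaper than debugging a judge agent mid-run.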
 
@@ -0,0 +1,104 @@
+Evaluate the agent's response based on the following criteria:
+
+1. **Correctness (0.6 weight)**: Does the response contain accurate information from the webpage? Check if the answer matches what was requested in the input.
+
+2. **Format (0.2 weight)**: Is the response formatted correctly? The input often requests answers in a specific format (e.g., in brackets like {Answer}).
+
+3. **Completeness (0.2 weight)**: Does the response fully address the question without unnecessary information?
+
+Scoring Guidelines:
+- 1.0: Perfect response - correct, properly formatted, and complete
+- 0.75-0.99: Good response - minor formatting or completeness issues
+- 0.5-0.74: Adequate response - correct information but format/completeness problems
+- 0.25-0.49: Poor response - partially correct or missing key information
+- 0.0-0.24: Failed response - incorrect or no relevant information
+
+Below is the content of the webpage that the agent fetched in order to answer the question. Please review this content to grade correctness:
+
+```
+Title: webpage1
+
+URL Source: https://www.york.ac.uk/teaching/cws/wws/webpage1.html
+
+Markdown Content:
+STARTING . . .
+--------------
+
+There are lots of ways to create web pages using already coded programmes. These lessons will teach you how to use the underlying HyperText Markup Language - HTML.
+
+HTML isn't computer code, but is a language that uses US English to enable texts (words, images, sounds) to be inserted and formatting such as colo(u)r and centre/ering to be written in. The process is fairly simple; the main difficulties often lie in small mistakes - if you slip up while word processing your reader may pick up your typos, but the page will still be legible. However, if your HTML is inaccurate the page may not appear - writing web pages is, at the least, very good practice for proof reading!
+
+Learning HTML will enable you to:
+
+*   create your own simple pages
+*   read and appreciate pages created by others
+*   develop an understanding of the creative and literary implications of web-texts
+*   have the confidence to branch out into more complex web design
+
+A HTML web page is made up of tags. Tags are placed in brackets like this **< tag >**. A tag tells the browser how to display information. Most tags need to be opened < tag > and closed < /tag >.
+
+To make a simple web page you need to know only four tags:
+
+*   < HTML > tells the browser your page is written in HTML format
+*   < HEAD > this is a kind of preface of vital information that doesn't appear on the screen.
+*   < TITLE >Write the title of the web page here - this is the information that viewers see on the upper bar of their screen. (I've given this page the title 'webpage1').
+*   < BODY >This is where you put the content of your page, the words and pictures that people read on the screen.
+
+All these tags need to be closed.
+
+#### EXERCISE
+
+Write a simple web page.
+
+Copy out exactly the HTML below, using a WP program such as Notepad.
+
+ Information in _italics_ indicates where you can insert your own text, other information is HTML and needs to be exact. However, make sure there are no spaces between the tag brackets and the text inside.
+
+ (Find Notepad by going to the START menu\ PROGRAMS\ ACCESSORIES\ NOTEPAD).
+
+< HTML >
+
+< HEAD >
+
+< TITLE >_title of page_< /TITLE >
+
+< /HEAD >
+
+< BODY>
+
+_write what you like here: 'my first web page', or a piece about what you are reading, or a few thoughts on the course, or copy out a few words from a book or cornflake packet. Just type in your words using no extras such as bold, or italics, as these have special HTML tags, although you may use upper and lower case letters and single spaces._
+
+< /BODY >
+
+< /HTML >
+
+Save the file as 'first.html' (ie. call the file anything at all) It's useful if you start a folder - just as you would for word-processing - and call it something like WEBPAGES, and put your first.html file in the folder.
+
+NOW - open your browser.
+
+ On Netscape the process is:
+
+ Top menu; FILE\ OPEN PAGE\ CHOOSE FILE
+
+ Click on your WEBPAGES folder\ FIRST file
+
+ Click 'open' and your page should appear.
+
+On Internet Explorer:
+
+ Top menu; FILE\ OPEN\ BROWSE
+
+ Click on your WEBPAGES folder\ FIRST file
+
+ Click 'open' and your page should appear.
+
+If the page doesn't open, go back over your notepad typing and make sure that all the HTML tags are correct. Check there are no spaces between tags and internal text; check that all tags are closed; check that you haven't written < HTLM > or < BDDY >. Your page will work eventually.
+
+Make another page. Call it somethingdifferent.html and place it in the same WEBPAGES folder as detailed above.
+
+start formatting in [lesson two](https://www.york.ac.uk/teaching/cws/wws/webpage2.html)
+
+[back to wws index](https://www.york.ac.uk/teaching/cws/wws/col3.html)
+```
+
+Use the submit_grade tool to submit your evaluation with a score between 0.0 and 1.0.
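
The rubric above weights correctness, format, and completeness at 0.6, 0.2, and 0.2 and maps the final score onto five bands. The sketch below shows one way to read that arithmetic, assuming the judge combines per-criterion sub-scores as a weighted sum. That combination rule is an assumption: the rubric gives the weights but leaves the exact computation to the judge. The names `weighted_score` and `band` are hypothetical.

```python
# Weights taken from the rubric text.
WEIGHTS = {"correctness": 0.6, "format": 0.2, "completeness": 0.2}


def weighted_score(subscores: dict[str, float]) -> float:
    """Combine per-criterion sub-scores (each in [0, 1]) into one score."""
    total = 0.0
    for criterion, weight in WEIGHTS.items():
        value = subscores[criterion]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{criterion} sub-score must be in [0, 1], got {value}")
        total += weight * value
    # Round to absorb floating-point noise in the sum.
    return round(total, 4)


def band(score: float) -> str:
    """Map a score to the rubric's band label (lower bounds from the rubric)."""
    if score >= 1.0:
        return "Perfect"
    if score >= 0.75:
        return "Good"
    if score >= 0.5:
        return "Adequate"
    if score >= 0.25:
        return "Poor"
    return "Failed"


# Correct and complete answer whose format is only half right.
example = weighted_score({"correctness": 1.0, "format": 0.5, "completeness": 1.0})
print(example, band(example))
```

Under this reading, a fully correct answer with a formatting slip still lands in the "Good" band, which matches the rubric's description of that band as "minor formatting or completeness issues".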
@@ -0,0 +1,16 @@
+name: fetch-webpage-default-judge-test
+description: Test agent responses using the default Letta judge with rubric grading
+dataset: dataset.csv
+target:
+  kind: agent
+  agent_file: test-fetch-webpage-simple-agent.af
+  base_url: http://localhost:8283
+graders:
+  agent_judge:
+    kind: letta_judge
+    prompt_path: default_judge_rubric.txt
+    extractor: last_assistant
+gate:
+  metric_key: agent_judge
+  op: gte
+  value: 0.7
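
The `gate` block above passes the suite when the `agent_judge` metric is at least 0.7. The following sketch illustrates that check, assuming the metric is the average of the per-sample judge scores, as the comment removed from the old README ("Pass if avg score ≥ 0.75") suggests. The function `gate_passes` is hypothetical, and only the `gte` operator shown in this PR is modeled.

```python
import operator
from statistics import mean

# Only "gte" appears in this PR; other operators are deliberately not modeled.
OPS = {"gte": operator.ge}


def gate_passes(scores: list[float], op: str, value: float) -> bool:
    """Return True if the average score satisfies the gate condition."""
    if not scores:
        raise ValueError("at least one score is required")
    if op not in OPS:
        raise ValueError(f"unsupported gate op: {op}")
    return OPS[op](mean(scores), value)


# Average is 0.75, which clears the 0.7 threshold.
print(gate_passes([1.0, 0.75, 0.5], "gte", 0.7))
# Average is 0.625, which does not.
print(gate_passes([0.5, 0.75], "gte", 0.7))
```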