Defining custom evaluators #1098
akshatshah-21 asked this question in Q&A (unanswered)
Is there a good way to define custom evaluators for evals, and plug them in easily with the current evaluation framework?
Context: I need something like the TrajectoryEvaluator, but the way it compares tool calls needs to be flexible. Consider a generic internet-search tool used as part of handling a query. To include this tool call in an eval, I would have to explicitly specify the search query argument, and TrajectoryEvaluator._are_tool_calls_equal does a strict equality comparison between actual and expected tool calls:

adk-python/src/google/adk/evaluation/trajectory_evaluator.py, lines 84 to 96 in 1773cda
But I cannot pre-determine exactly what the search query string will be, since the LLM is free to rephrase it, reformat it, add hyphens, and so on.
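To illustrate what I mean: rather than strict string equality, I'd want something like normalized or fuzzy matching for free-form arguments such as a search query. A minimal sketch (the function names here are my own, not ADK APIs):

```python
import re


def normalize_query(q: str) -> str:
    """Lowercase, strip punctuation/hyphens, and collapse whitespace so
    paraphrase-level differences in a search query don't fail the eval."""
    q = q.lower()
    q = re.sub(r"[-_,.!?\"']", " ", q)
    return " ".join(q.split())


def queries_equivalent(actual: str, expected: str) -> bool:
    """Compare two query strings after normalization instead of exact match."""
    return normalize_query(actual) == normalize_query(expected)
```

With this, `queries_equivalent("Machine-Learning basics!", "machine learning basics")` would pass, whereas the current strict comparison would not.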
In some cases, I would perhaps like to skip checking the tool args entirely, or only check a subset of them.
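The kind of comparator I'd want to plug in might look like the sketch below. Note the dict keys (`tool_name`, `tool_input`) are placeholders for whatever schema ADK actually uses for tool calls, and `tool_calls_match` is a hypothetical helper, not an existing ADK function:

```python
def tool_calls_match(
    actual: dict,
    expected: dict,
    ignore_args: frozenset = frozenset(),
) -> bool:
    """Compare two tool calls by name and args, but skip any argument
    names listed in ignore_args (e.g. free-form search queries)."""
    if actual.get("tool_name") != expected.get("tool_name"):
        return False
    # Drop the ignored args from both sides before comparing.
    actual_args = {
        k: v for k, v in actual.get("tool_input", {}).items() if k not in ignore_args
    }
    expected_args = {
        k: v for k, v in expected.get("tool_input", {}).items() if k not in ignore_args
    }
    return actual_args == expected_args
```

Then an eval config could pass something like `ignore_args={"query"}` per tool, so the search tool is still required to be called with the right `limit`, but the exact query wording is not checked.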