Commit eaf493a

Spell check fixes

Author: Salma Elshafey
1 parent 9bc900b commit eaf493a

2 files changed: +3 -3 lines changed

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_tool_call_accuracy/_tool_call_accuracy.py

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ class ToolCallAccuracyEvaluator(PromptyEvaluatorBase[Union[str, float]]):
  The evaluator uses a scoring rubric of 1 to 5:
  - Score 1: The tool calls are irrelevant
  - Score 2: The tool calls are partially relevant, but not enough tools were called or the parameters were not correctly passed
- - Score 3: The tool calls are relevant, but there were unncessary, excessive tool calls made
+ - Score 3: The tool calls are relevant, but there were unnecessary, excessive tool calls made
  - Score 4: The tool calls are relevant, but some tools returned errors and agent retried calling them again and succeeded
  - Score 5: The tool calls are relevant, and all parameters were correctly passed

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_tool_call_accuracy/tool_call_accuracy.prompty

Lines changed: 2 additions & 2 deletions
@@ -43,7 +43,7 @@ user:
  # Ratings
  ## [Tool Call Accuracy: 1] (Irrelevant)
  **Definition:**
- Tool calls were not relevant to the user's query, resulting in anirrelevant or unhelpful final output.
+ Tool calls were not relevant to the user's query, resulting in an irrelevant or unhelpful final output.
  This level is a 'fail'.

  **Example:**
@@ -122,7 +122,7 @@ TOOL DEFINITION: {{tool_definition}}
  Your output should consist only of a JSON object, as provided in the examples, that has the following keys:
  - chain_of_thought: a string that explains your thought process to decide on the tool call accuracy level. Start this string with 'Let's think step by step:', and think deeply and precisely about which level should be chosen based on the agent's tool calls and how they were able to address the user's query.
  - tool_calls_success_level: a integer value between 1 and 5 that represents the level of tool call success, based on the level definitions mentioned before. You need to be very precise when deciding on this level. Ensure you are correctly following the rating system based on the description of each level.
- - tool_calls_sucess_result: 'pass' or 'fail' based on the evaluation level of the tool call accuracy. Levels 1 and 2 are a 'fail', levels 3, 4 and 5 are a 'pass'.
+ - tool_calls_success_result: 'pass' or 'fail' based on the evaluation level of the tool call accuracy. Levels 1 and 2 are a 'fail', levels 3, 4 and 5 are a 'pass'.
  - additional_details: a dictionary that contains the following keys:
  - tool_calls_made_by_agent: total number of tool calls made by the agent
  - correct_tool_calls_made_by_agent: total number of correct tool calls made by the agent
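
Because this hunk renames a key in the JSON the model is asked to emit, a reviewer may want to confirm that any consumer of that output expects the corrected spelling. The following is a minimal sketch of such a check, not the SDK's own parsing code: `REQUIRED_KEYS`, `expected_result`, and `check_output` are hypothetical names introduced for illustration, and the only rules encoded are the ones stated in the prompt text above (four top-level keys, a level from 1 to 5, levels 1 and 2 are a 'fail', levels 3 to 5 are a 'pass').

```python
import json

# Top-level keys named in the prompt's output instructions (post-fix spelling).
# Hypothetical constant for illustration; not taken from the SDK.
REQUIRED_KEYS = {
    "chain_of_thought",
    "tool_calls_success_level",
    "tool_calls_success_result",
    "additional_details",
}


def expected_result(level: int) -> str:
    """Map a level to 'pass'/'fail' as the prompt describes:
    levels 1 and 2 are a 'fail'; levels 3, 4 and 5 are a 'pass'."""
    if not 1 <= level <= 5:
        raise ValueError(f"level must be between 1 and 5, got {level}")
    return "fail" if level <= 2 else "pass"


def check_output(raw: str) -> dict:
    """Parse a model response and verify it matches the documented shape."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise TypeError("expected a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise KeyError(f"missing keys: {sorted(missing)}")
    level = data["tool_calls_success_level"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(level, bool) or not isinstance(level, int):
        raise TypeError("tool_calls_success_level must be an integer")
    if data["tool_calls_success_result"] != expected_result(level):
        raise ValueError("pass/fail result does not match the level")
    return data
```

With this sketch, a response that still uses the old `tool_calls_sucess_result` spelling is rejected as missing `tool_calls_success_result`, which is the kind of mismatch worth looking for elsewhere in the codebase after a key rename.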
