Add Tool Call Accuracy Evaluator Bugbash Notebook #42121

Open
wants to merge 44 commits into
base: users/singankit/agent_evaluators_v2_bug_bash
Commits
44 commits
41de91a  support 5 levels, evaluate all tools at once (Jun 22, 2025)
6a1e2b3  Update sample notebook and change log (Jun 23, 2025)
0dad199  Add missing import (Jun 23, 2025)
e4b1a37  Modify test cases to match the new output format (Jun 23, 2025)
a40c91b  Modify other test file to match the new output format (Jun 23, 2025)
ed0ecf9  Fixed parsing of results (Jun 24, 2025)
9bc900b  Change key name in output (Jun 24, 2025)
eaf493a  Spell check fixes (Jun 24, 2025)
1965639  Minor prompt update (Jun 24, 2025)
8865240  Update result key to tool_call_accuracy (Jun 25, 2025)
fcd1cb8  Delete test_new_evaluator.ipynb (salma-elshafey, Jun 25, 2025)
67fc87d  Added field names and messages as constants (Jun 25, 2025)
080f941  Merge branch 'selshafey/improve_tool_call_accuracy' of https://github… (Jun 25, 2025)
fd2429f  Additional note in prompt (Jun 29, 2025)
6c9e342  Re-add the temperature to the prompty file (Jun 30, 2025)
d0f637e  Removed 'applicable' field and print statement (Jun 30, 2025)
4c27dff  Move excess/missing tool calls fields under additional details (Jul 1, 2025)
3fa14f0  Typo fix and removal of redundant field in the prompt (Jul 2, 2025)
2c3ce50  Modify per_tool_call_details field's name to details (Jul 7, 2025)
9d7aea0  Made response_format option type public and update docstr (#41991) (howieleung, Jul 14, 2025)
e646d9e  Increment package version after release of azure-cosmos (#42006) (azure-sdk, Jul 14, 2025)
d76b8ba  [evaluation] Fix lazy loading for optional dependency imports to avoi… (Copilot, Jul 14, 2025)
79ccfa6  [Identity] Update VisualStudioCodeCredential to be broker-based (#41822) (pvaneck, Jul 15, 2025)
08b1156  [AutoRelease] t2-dashboard-2025-06-10-06558(can only be merged by SDK… (azure-sdk, Jul 15, 2025)
b926f21  [AutoRelease] t2-containerservice-2025-07-02-17859(can only be merged… (azure-sdk, Jul 15, 2025)
81df908  [Identity Broker] Allow default account usage in WSL (#42005) (pvaneck, Jul 15, 2025)
4268dd3  [AutoRelease] t2-hardwaresecuritymodules-2025-07-02-39085(can only be… (azure-sdk, Jul 15, 2025)
0c44507  [evaluation] add retry logic for high concurrency scenarios in Advers… (slister1001, Jul 15, 2025)
56db2d7  Fix red team scan output_path issue - prevent interim evaluation over… (Copilot, Jul 15, 2025)
3b2723b  Use paramtype for keyword params (#42008) (howieleung, Jul 15, 2025)
7f1ed21  Updates Deid tests (#42029) (jovinson-ms, Jul 15, 2025)
ede1da3  [AutoRelease] t2-servicebus-2025-07-15-30222(can only be merged by SD… (azure-sdk, Jul 16, 2025)
3f6b6be  [AutoRelease] t2-eventhub-2025-07-15-35586(can only be merged by SDK … (azure-sdk, Jul 16, 2025)
9d3f4aa  code and test (#41686) (azure-sdk, Jul 16, 2025)
5265599  fix runs.create_and_process (#42040) (howieleung, Jul 16, 2025)
6525a6f  Revert "Modify per_tool_call_details field's name to details" (Jul 16, 2025)
ff8ab23  [AI Agents] add mcp streaming sample (#42046) (glharper, Jul 16, 2025)
e72b084  Revert 'Merge branch 'main' into selshafey/improve_tool_call_accuracy' (Jul 16, 2025)
3d4f2cc  Merge branch 'main' into selshafey/improve_tool_call_accuracy (Jul 16, 2025)
a79b3a1  Black reformat (Jul 16, 2025)
440b6c1  Reformat with black (Jul 16, 2025)
e690217  To re-trigger build pipelines (Jul 17, 2025)
35c4a55  Add notebook for bugbash (Jul 20, 2025)
061c7bc  modify bugbash notebook (Jul 21, 2025)
9 changes: 8 additions & 1 deletion sdk/ai/azure-ai-agents/CHANGELOG.md
@@ -2,6 +2,13 @@

# Release History

+## 1.1.0b5 (Unreleased)
+
+### Bugs Fixed
+
+- `AgentsResponseFormatOption`, `MessageInputContent`, `MessageAttachmentToolDefinition`, `AgentsToolChoiceOption` are now public.
+- Fixed issues where the `runs.create_and_process` API call did not correctly handle the `AzureAISearchTool`, `FileSearchTool`, and `CodeInterpreterTool` when specified in the toolset parameter.
+
## 1.1.0b4 (2025-07-11)

### Features Added
@@ -12,7 +19,7 @@ the thread. Default value is None.

### Bugs Fixed

-- `_AgentsClientOperationsMixin` now it is private.
+- `_AgentsClientOperationsMixin` is now private.

### Sample updates

166 changes: 83 additions & 83 deletions sdk/ai/azure-ai-agents/azure/ai/agents/_patch.py

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion sdk/ai/azure-ai-agents/azure/ai/agents/_version.py
@@ -6,4 +6,4 @@
# Changes may cause incorrect behavior and will be lost if the code is regenerated.
# --------------------------------------------------------------------------

-VERSION = "1.1.0b4"
+VERSION = "1.1.0b5"
168 changes: 83 additions & 85 deletions sdk/ai/azure-ai-agents/azure/ai/agents/aio/_patch.py

Large diffs are not rendered by default.

44 changes: 28 additions & 16 deletions sdk/ai/azure-ai-agents/azure/ai/agents/aio/operations/_patch.py
@@ -49,7 +49,7 @@

if TYPE_CHECKING:
# pylint: disable=unused-import,ungrouped-imports
-from ... import _types
+from ... import types as _types

JSON = MutableMapping[str, Any] # pylint: disable=unsubscriptable-object
_Unset: Any = object()
@@ -193,8 +193,10 @@ async def create( # pylint: disable=arguments-differ
:keyword response_format: Specifies the format that the model must output. Is one of the
following types: str, Union[str, "_models.AgentsApiResponseFormatMode"],
AgentsApiResponseFormat Default value is None.
-:paramtype response_format: str or str or ~azure.ai.agents.models.AgentsApiResponseFormatMode
-or ~azure.ai.agents.models.AgentsApiResponseFormat
+:paramtype response_format: Optional[Union[str,
+~azure.ai.agents.models.AgentsApiResponseFormatMode,
+~azure.ai.agents.models.AgentsApiResponseFormat,
+~azure.ai.agents.models.ResponseFormatJsonSchemaType]]
:keyword parallel_tool_calls: If ``true`` functions will run in parallel during tool use.
Default value is None.
:paramtype parallel_tool_calls: bool
@@ -356,8 +358,10 @@ async def create(
:keyword response_format: Specifies the format that the model must output. Is one of the
following types: str, Union[str, "_models.AgentsApiResponseFormatMode"],
AgentsApiResponseFormat Default value is None.
-:paramtype response_format: str or str or ~azure.ai.agents.models.AgentsApiResponseFormatMode
-or ~azure.ai.agents.models.AgentsApiResponseFormat
+:paramtype response_format: Optional[Union[str,
+~azure.ai.agents.models.AgentsApiResponseFormatMode,
+~azure.ai.agents.models.AgentsApiResponseFormat,
+~azure.ai.agents.models.ResponseFormatJsonSchemaType]]
:keyword parallel_tool_calls: If ``true`` functions will run in parallel during tool use.
Default value is None.
:paramtype parallel_tool_calls: bool
@@ -497,9 +501,10 @@ async def create_and_process(
:keyword response_format: Specifies the format that the model must output. Is one of the
following types: str, Union[str, "_models.AgentsApiResponseFormatMode"],
AgentsApiResponseFormat Default value is None.
-:paramtype response_format: str or str or
-~azure.ai.agents.models.AgentsApiResponseFormatMode or
-~azure.ai.agents.models.AgentsApiResponseFormat
+:paramtype response_format: Optional[Union[str,
+~azure.ai.agents.models.AgentsApiResponseFormatMode,
+~azure.ai.agents.models.AgentsApiResponseFormat,
+~azure.ai.agents.models.ResponseFormatJsonSchemaType]]
:keyword parallel_tool_calls: If ``true`` functions will run in parallel during tool use.
Default value is None.
:paramtype parallel_tool_calls: bool
@@ -525,6 +530,7 @@ async def create_and_process(
additional_instructions=additional_instructions,
additional_messages=additional_messages,
tools=toolset.definitions if toolset else None,
+tool_resources=toolset.resources if toolset else None,
temperature=temperature,
top_p=top_p,
max_prompt_tokens=max_prompt_tokens,
@@ -673,8 +679,10 @@ async def stream(
:keyword response_format: Specifies the format that the model must output. Is one of the
following types: str, Union[str, "_models.AgentsApiResponseFormatMode"],
AgentsApiResponseFormat Default value is None.
-:paramtype response_format: str or str or ~azure.ai.agents.models.AgentsApiResponseFormatMode
-or ~azure.ai.agents.models.AgentsApiResponseFormat
+:paramtype response_format: Optional[Union[str,
+~azure.ai.agents.models.AgentsApiResponseFormatMode,
+~azure.ai.agents.models.AgentsApiResponseFormat,
+~azure.ai.agents.models.ResponseFormatJsonSchemaType]]
:keyword parallel_tool_calls: If ``true`` functions will run in parallel during tool use.
Default value is None.
:paramtype parallel_tool_calls: bool
@@ -782,8 +790,10 @@ async def stream(
:keyword response_format: Specifies the format that the model must output. Is one of the
following types: str, Union[str, "_models.AgentsApiResponseFormatMode"],
AgentsApiResponseFormat Default value is None.
-:paramtype response_format: str or str or ~azure.ai.agents.models.AgentsApiResponseFormatMode
-or ~azure.ai.agents.models.AgentsApiResponseFormat
+:paramtype response_format: Optional[Union[str,
+~azure.ai.agents.models.AgentsApiResponseFormatMode,
+~azure.ai.agents.models.AgentsApiResponseFormat,
+~azure.ai.agents.models.ResponseFormatJsonSchemaType]]
:keyword parallel_tool_calls: If ``true`` functions will run in parallel during tool use.
Default value is None.
:paramtype parallel_tool_calls: bool
@@ -961,8 +971,10 @@ async def stream( # pyright: ignore[reportInconsistentOverload]
:keyword response_format: Specifies the format that the model must output. Is one of the
following types: str, Union[str, "_models.AgentsApiResponseFormatMode"],
AgentsApiResponseFormat Default value is None.
-:paramtype response_format: str or str or ~azure.ai.agents.models.AgentsApiResponseFormatMode
-or ~azure.ai.agents.models.AgentsApiResponseFormat
+:paramtype response_format: Optional[Union[str,
+~azure.ai.agents.models.AgentsApiResponseFormatMode,
+~azure.ai.agents.models.AgentsApiResponseFormat,
+~azure.ai.agents.models.ResponseFormatJsonSchemaType]]
:keyword parallel_tool_calls: If ``true`` functions will run in parallel during tool use.
Default value is None.
:paramtype parallel_tool_calls: bool
@@ -1339,7 +1351,7 @@ async def upload( # pylint: disable=arguments-differ
"""Uploads a file for use by other operations.

:keyword file_path: The path to the file to upload.
-:type file_path: str
+:paramtype file_path: str
:keyword purpose: The intended purpose of the uploaded file. Known values are: "assistants", "assistants_output", and "vision".
:paramtype purpose: str or ~azure.ai.agents.models.FilePurpose
:return: FileInfo. The FileInfo is compatible with MutableMapping
@@ -1500,7 +1512,7 @@ async def upload_and_poll(
"""Uploads a file for use by other operations.

:keyword file_path: The path to the file to upload.
-:type file_path: str
+:paramtype file_path: str
:keyword purpose: Known values are: "assistants", "assistants_output", and "vision".
:paramtype purpose: str or ~azure.ai.agents.models.FilePurpose
:keyword polling_interval: Time to wait before polling for the status of the uploaded file. Default value
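The last two hunks above change `file_path` from `:type` to `:paramtype`, in line with the commit "Use paramtype for keyword params (#42008)". A minimal sketch of that docstring convention follows: positional parameters use `:param` / `:type`, keyword-only ones use `:keyword` / `:paramtype`. Both `upload_stub` and `mistyped_keywords` are hypothetical names invented for this illustration and are not part of the SDK.

```python
import inspect


def upload_stub(body=None, *, file_path=None, purpose=None):
    """Hypothetical stand-in for an upload call; not the SDK method.

    :param body: Positional payload, documented with ``:param`` / ``:type``.
    :type body: dict
    :keyword file_path: The path to the file to upload.
    :paramtype file_path: str
    :keyword purpose: The intended purpose of the uploaded file.
    :paramtype purpose: str
    """
    return {"body": body, "file_path": file_path, "purpose": purpose}


def mistyped_keywords(func):
    """Return keyword-only parameter names documented with ``:type`` instead of ``:paramtype``."""
    doc = inspect.getdoc(func) or ""
    keyword_only = [
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.kind is inspect.Parameter.KEYWORD_ONLY
    ]
    # ":paramtype name:" never contains the substring ":type name:", so this only
    # flags keyword-only parameters that still use the positional-style field.
    return [name for name in keyword_only if f":type {name}:" in doc]
```

A check along these lines could flag the pre-change docstrings in this diff, where the keyword-only `file_path` was still documented with `:type`.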
2 changes: 1 addition & 1 deletion sdk/ai/azure-ai-agents/azure/ai/agents/models/_patch.py
@@ -92,7 +92,7 @@
from ._models import ThreadMessage as ThreadMessageGenerated
from ._models import MessageAttachment as MessageAttachmentGenerated

-from .. import _types
+from .. import types as _types


logger = logging.getLogger(__name__)
44 changes: 28 additions & 16 deletions sdk/ai/azure-ai-agents/azure/ai/agents/operations/_patch.py
@@ -48,7 +48,7 @@

if TYPE_CHECKING:
# pylint: disable=unused-import,ungrouped-imports
-from .. import _types
+from .. import types as _types

JSON = MutableMapping[str, Any] # pylint: disable=unsubscriptable-object
_Unset: Any = object()
@@ -196,8 +196,10 @@ def create( # pylint: disable=arguments-differ
:keyword response_format: Specifies the format that the model must output. Is one of the
following types: str, Union[str, "_models.AgentsApiResponseFormatMode"],
AgentsApiResponseFormat Default value is None.
-:paramtype response_format: str or str or ~azure.ai.agents.models.AgentsApiResponseFormatMode
-or ~azure.ai.agents.models.AgentsApiResponseFormat
+:paramtype response_format: Optional[Union[str,
+~azure.ai.agents.models.AgentsApiResponseFormatMode,
+~azure.ai.agents.models.AgentsApiResponseFormat,
+~azure.ai.agents.models.ResponseFormatJsonSchemaType]]
:keyword parallel_tool_calls: If ``true`` functions will run in parallel during tool use.
Default value is None.
:paramtype parallel_tool_calls: bool
@@ -363,8 +365,10 @@ def create(
:keyword response_format: Specifies the format that the model must output. Is one of the
following types: str, Union[str, "_models.AgentsApiResponseFormatMode"],
AgentsApiResponseFormat Default value is None.
-:paramtype response_format: str or str or ~azure.ai.agents.models.AgentsApiResponseFormatMode
-or ~azure.ai.agents.models.AgentsApiResponseFormat
+:paramtype response_format: Optional[Union[str,
+~azure.ai.agents.models.AgentsApiResponseFormatMode,
+~azure.ai.agents.models.AgentsApiResponseFormat,
+~azure.ai.agents.models.ResponseFormatJsonSchemaType]]
:keyword parallel_tool_calls: If ``true`` functions will run in parallel during tool use.
Default value is None.
:paramtype parallel_tool_calls: bool
@@ -505,9 +509,10 @@ def create_and_process(
:keyword response_format: Specifies the format that the model must output. Is one of the
following types: str, Union[str, "_models.AgentsApiResponseFormatMode"],
AgentsApiResponseFormat Default value is None.
-:paramtype response_format: str or str or
-~azure.ai.agents.models.AgentsApiResponseFormatMode or
-~azure.ai.agents.models.AgentsApiResponseFormat
+:paramtype response_format: Optional[Union[str,
+~azure.ai.agents.models.AgentsApiResponseFormatMode,
+~azure.ai.agents.models.AgentsApiResponseFormat,
+~azure.ai.agents.models.ResponseFormatJsonSchemaType]]
:keyword parallel_tool_calls: If ``true`` functions will run in parallel during tool use.
Default value is None.
:paramtype parallel_tool_calls: bool
@@ -533,6 +538,7 @@ def create_and_process(
additional_instructions=additional_instructions,
additional_messages=additional_messages,
tools=toolset.definitions if toolset else None,
+tool_resources=toolset.resources if toolset else None,
temperature=temperature,
top_p=top_p,
max_prompt_tokens=max_prompt_tokens,
@@ -685,8 +691,10 @@ def stream(
:keyword response_format: Specifies the format that the model must output. Is one of the
following types: str, Union[str, "_models.AgentsApiResponseFormatMode"],
AgentsApiResponseFormat Default value is None.
-:paramtype response_format: str or str or ~azure.ai.agents.models.AgentsApiResponseFormatMode
-or ~azure.ai.agents.models.AgentsApiResponseFormat
+:paramtype response_format: Optional[Union[str,
+~azure.ai.agents.models.AgentsApiResponseFormatMode,
+~azure.ai.agents.models.AgentsApiResponseFormat,
+~azure.ai.agents.models.ResponseFormatJsonSchemaType]]
:keyword parallel_tool_calls: If ``true`` functions will run in parallel during tool use.
Default value is None.
:paramtype parallel_tool_calls: bool
@@ -798,8 +806,10 @@ def stream(
:keyword response_format: Specifies the format that the model must output. Is one of the
following types: str, Union[str, "_models.AgentsApiResponseFormatMode"],
AgentsApiResponseFormat Default value is None.
-:paramtype response_format: str or str or ~azure.ai.agents.models.AgentsApiResponseFormatMode
-or ~azure.ai.agents.models.AgentsApiResponseFormat
+:paramtype response_format: Optional[Union[str,
+~azure.ai.agents.models.AgentsApiResponseFormatMode,
+~azure.ai.agents.models.AgentsApiResponseFormat,
+~azure.ai.agents.models.ResponseFormatJsonSchemaType]]
:keyword parallel_tool_calls: If ``true`` functions will run in parallel during tool use.
Default value is None.
:paramtype parallel_tool_calls: bool
@@ -982,8 +992,10 @@ def stream( # pyright: ignore[reportInconsistentOverload]
:keyword response_format: Specifies the format that the model must output. Is one of the
following types: str, Union[str, "_models.AgentsApiResponseFormatMode"],
AgentsApiResponseFormat Default value is None.
-:paramtype response_format: str or str or ~azure.ai.agents.models.AgentsApiResponseFormatMode
-or ~azure.ai.agents.models.AgentsApiResponseFormat
+:paramtype response_format: Optional[Union[str,
+~azure.ai.agents.models.AgentsApiResponseFormatMode,
+~azure.ai.agents.models.AgentsApiResponseFormat,
+~azure.ai.agents.models.ResponseFormatJsonSchemaType]]
:keyword parallel_tool_calls: If ``true`` functions will run in parallel during tool use.
Default value is None.
:paramtype parallel_tool_calls: bool
@@ -1372,7 +1384,7 @@ def upload( # pylint: disable=arguments-differ
"""Uploads a file for use by other operations.

:keyword file_path: The path to the file to upload.
-:type file_path: str
+:paramtype file_path: str
:keyword purpose: Known values are: "assistants", "assistants_output", and "vision".
:paramtype purpose: str or ~azure.ai.agents.models.FilePurpose
:return: FileInfo. The FileInfo is compatible with MutableMapping
@@ -1533,7 +1545,7 @@ def upload_and_poll(
"""Uploads a file for use by other operations.

:keyword file_path: The path to the file on the local filesystem to upload.
-:type file_path: str
+:paramtype file_path: str
:keyword purpose: Known values are: "assistants", "assistants_output", and "vision".
:paramtype purpose: str or ~azure.ai.agents.models.FilePurpose
:keyword polling_interval: Time to wait before polling for the status of the uploaded file. Default value
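The `create_and_process` hunk above adds `tool_resources=toolset.resources if toolset else None`, so whatever resources the toolset holds are now passed along with its tool definitions. The sketch below illustrates only that forwarding rule. `FakeToolSet` and `build_run_kwargs` are hypothetical stand-ins invented here, and the dictionary shapes in the usage lines are assumptions rather than the SDK's actual models.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FakeToolSet:
    """Hypothetical stand-in exposing the two attributes the diff reads."""

    definitions: List[Dict[str, Any]] = field(default_factory=list)
    resources: Dict[str, Any] = field(default_factory=dict)


def build_run_kwargs(toolset: Optional[FakeToolSet]) -> Dict[str, Any]:
    """Mirror the forwarding in the hunk: both values come from the toolset, or both are None."""
    return {
        "tools": toolset.definitions if toolset else None,
        "tool_resources": toolset.resources if toolset else None,
    }


# Usage with assumed, illustrative payload shapes.
toolset = FakeToolSet(
    definitions=[{"type": "file_search"}],
    resources={"file_search": {"vector_store_ids": ["vs_example"]}},
)
with_toolset = build_run_kwargs(toolset)
without_toolset = build_run_kwargs(None)
```

Before this change only `tools` was forwarded, which fits the changelog entry about `AzureAISearchTool`, `FileSearchTool`, and `CodeInterpreterTool` not being handled correctly when supplied through the toolset parameter.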
19 changes: 19 additions & 0 deletions sdk/ai/azure-ai-agents/azure/ai/agents/types/__init__.py
@@ -0,0 +1,19 @@
+# pylint: disable=line-too-long,useless-suppression,too-many-lines
+# ------------------------------------
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+# ------------------------------------
+
+from azure.ai.agents._types import (
+    AgentsResponseFormatOption,
+    MessageInputContent,
+    MessageAttachmentToolDefinition,
+    AgentsToolChoiceOption,
+)
+
+__all__ = [
+    "AgentsResponseFormatOption",
+    "MessageInputContent",
+    "MessageAttachmentToolDefinition",
+    "AgentsToolChoiceOption",
+]
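With the aliases re-exported from `azure.ai.agents.types`, callers can reference them in their own annotations. The following is a minimal sketch, assuming the package at this version is available to the type checker. The import sits under `TYPE_CHECKING` and the annotation is a string, so the snippet also runs where the package is not installed. `describe_response_format` is a hypothetical helper, not an SDK function.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Public module added by this diff; only evaluated by type checkers.
    from azure.ai.agents.types import AgentsResponseFormatOption


def describe_response_format(
    response_format: Optional["AgentsResponseFormatOption"] = None,
) -> str:
    """Hypothetical helper: summarize which kind of value was passed."""
    if response_format is None:
        return "default"
    if isinstance(response_format, str):
        return f"mode: {response_format}"
    return f"object: {type(response_format).__name__}"
```

The three branches loosely follow the updated `:paramtype response_format:` docstrings, which list `None`, plain strings, and model objects as accepted values.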