Add Tool Call Accuracy Evaluator Bugbash Notebook #42121

salma-elshafey · 2025-07-21T12:37:59Z

Description

Please add an informative description that covers the changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new swagger spec, a link to the pull request containing these swagger spec changes has been included above.

All SDK Contribution checklist:

The pull request does not introduce [breaking changes]
CHANGELOG is updated for new features, bug fixes or other significant changes.
I have read the contribution guidelines.

General Guidelines and Best Practices

Title of the pull request is clear and informative.
There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

Pull request includes test coverage for the included changes.


@singankit

…d noisy messages (Azure#41852)

* Initial plan
* Fix lazy loading for optional dependency imports to avoid noisy messages
  Co-authored-by: singankit <30610298+singankit@users.noreply.github.com>
* Refactor lazy import mechanism to use generalized function
  - Replace separate _try_import_aiagentconverter and _try_import_skagentconverter functions with a single _create_lazy_import function
  - The new function accepts class_name, module_path, and dependency_name as parameters
  - Reduces code duplication and makes the pattern more maintainable
  - Maintains exact same functionality and error messages
  - Updates corresponding unit tests to reflect the new implementation
  Addresses feedback from @singankit to generalize the lazy import pattern.
  Co-authored-by: singankit <30610298+singankit@users.noreply.github.com>
* Remove [INFO] prefix from error messages in lazy imports
  Co-authored-by: singankit <30610298+singankit@users.noreply.github.com>
* Simplify lazy import error handling to let ImportError propagate naturally
  Co-authored-by: singankit <30610298+singankit@users.noreply.github.com>
* Implement lazy loading for red_team module to avoid noisy import messages
  Co-authored-by: singankit <30610298+singankit@users.noreply.github.com>
* Revert red_team module to traditional import pattern as requested
  Co-authored-by: singankit <30610298+singankit@users.noreply.github.com>
* Change red_team module to raise ImportError instead of print statement
  Co-authored-by: singankit <30610298+singankit@users.noreply.github.com>
* Apply black formatting to fix code style issues
  Co-authored-by: singankit <30610298+singankit@users.noreply.github.com>
* Remove AIAgentConverter test since azure-ai-projects is always in dev requirements
  Co-authored-by: singankit <30610298+singankit@users.noreply.github.com>
* Skip lazy import tests when semantic-kernel is installed
  Co-authored-by: singankit <30610298+singankit@users.noreply.github.com>

---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: singankit <30610298+singankit@users.noreply.github.com>


… owner) (Azure#41496)

* code and test
* update-tasecase
* Update CHANGELOG.md

---------
Co-authored-by: azure-sdk <PythonSdkPipelines>
Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com>
Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com>

… by SDK owner) (Azure#41859)

* code and test
* update changelog
* Update CHANGELOG.md

---------
Co-authored-by: azure-sdk <PythonSdkPipelines>
Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com>
Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com>

* [Identity Broker] Allow default account usage in WSL
  Signed-off-by: Paul Van Eck <paulvaneck@microsoft.com>
* Apply suggestions from code review
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------
Signed-off-by: Paul Van Eck <paulvaneck@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

… merged by SDK owner) (Azure#41861)

* code and test
* update changelog
* Update CHANGELOG.md

---------
Co-authored-by: azure-sdk <PythonSdkPipelines>
Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com>
Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com>

…arailSimulator, _SafetyEvaluation (Azure#41978)

* Add retry logic for high concurrency scenarios
* Update sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/simulator/_model_tools/_proxy_completion_model.py
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* black fixes

---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

…writes (Azure#42024)

* Initial plan
* Fix red team scan output_path issue - prevent interim evaluation overwrites
  Co-authored-by: slister1001 <103153180+slister1001@users.noreply.github.com>
* Update CHANGELOG.md to document red team scan output_path bug fix
  Co-authored-by: slister1001 <103153180+slister1001@users.noreply.github.com>

---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: slister1001 <103153180+slister1001@users.noreply.github.com>

github-actions · 2025-07-21T12:38:37Z

Thank you for your contribution @salma-elshafey! We will review the pull request and get back to you soon.

Salma Elshafey and others added 30 commits June 22, 2025 22:40

support 5 levels, evaluate all tools at once

41de91a

Update sample notebook and change log

6a1e2b3

Add missing import

0dad199

Modify test cases to match the new output format

e4b1a37

Modify other test file to match the new output format

a40c91b

Fixed parsing of results

ed0ecf9

Change key name in output

9bc900b

Spell check fixes

eaf493a

Minor prompt update

1965639

Update result key to tool_call_accuracy

8865240

Delete test_new_evaluator.ipynb

fcd1cb8

Added field names and messages as constants

67fc87d

Merge branch 'selshafey/improve_tool_call_accuracy' of https://github.com/salma-elshafey/azure-sdk-for-python into selshafey/improve_tool_call_accuracy

080f941

Additional note in prompt

fd2429f

Re-add the temperature to the prompty file

6c9e342

Removed 'applicable' field and print statement

d0f637e

Move excess/missing tool calls fields under additional details

4c27dff

Typo fix and removal of redundant field in the prompt

3fa14f0

Modify per_tool_call_details field's name to details

2c3ce50

Made response_format option type public and update docstr (Azure#41991)

9d7aea0

* Made response_format option type public and update docstr * resolved comment * update * Update CHANGELOG.md

Increment package version after release of azure-cosmos (Azure#42006)

e646d9e

[Identity] Update VisualStudioCodeCredential to be broker-based (Azure#41822)

79ccfa6

Signed-off-by: Paul Van Eck <paulvaneck@microsoft.com>

Use paramtype for keyword params (Azure#42008)

3b2723b

salma-elshafey requested review from msyyc, alexathomases, a team, dargilco, jhakulin, trangevi, glharper, nick863, howieleung, pvaneck and xiangyan99 as code owners July 21, 2025 12:38

salma-elshafey requested review from maorleger, christothes, KarishmaGhiya, chlowell and minhanh-phan July 21, 2025 12:38

github-project-automation bot added this to CosmosDB Python Eco-System and Azure Identity SDK Improvements Jul 21, 2025

github-project-automation bot moved this to Untriaged in Azure Identity SDK Improvements Jul 21, 2025
