Fix red team scan output_path issue - prevent interim evaluation overwrites (#42024)

Copilot · slister1001 · web-flow · commit 56db2d798b8c · 2025-07-15T14:00:27.000-04:00
* Initial plan

* Fix red team scan output_path issue - prevent interim evaluation overwrites

Co-authored-by: slister1001 <103153180+slister1001@users.noreply.github.com>

* Update CHANGELOG.md to document red team scan output_path bug fix

Co-authored-by: slister1001 <103153180+slister1001@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: slister1001 <103153180+slister1001@users.noreply.github.com>
diff --git a/sdk/evaluation/azure-ai-evaluation/CHANGELOG.md b/sdk/evaluation/azure-ai-evaluation/CHANGELOG.md
@@ -9,6 +9,7 @@
 
 ### Bugs Fixed
 
+- Fixed red team scan `output_path` issue where individual evaluation results were overwriting each other instead of being preserved as separate files. Individual evaluations now create unique files while the user's `output_path` is reserved for final aggregated results.
 - Significant improvements to TaskAdherence evaluator. New version has less variance, is much faster and consumes fewer tokens.
 - Significant improvements to Relevance evaluator. New version has more concrete rubrics and has less variance, is much faster and consumes fewer tokens.
 
diff --git a/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_red_team.py b/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_red_team.py
@@ -2642,7 +2642,7 @@ async def _process_attack(
                     strategy=strategy,
                     _skip_evals=_skip_evals,
                     data_path=data_path,
-                    output_path=output_path,
+                    output_path=None,  # Fix: Do not pass output_path to individual evaluations
                 )
             except Exception as e:
                 log_error(self.logger, f"Error during evaluation for {strategy_name}/{risk_category.value}", e)
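To make the effect of this one-line change concrete, here is a minimal Python sketch of the pattern the CHANGELOG entry describes: each interim evaluation writes to its own uniquely named file, and only the final aggregated result is written to the caller's `output_path`. The functions `run_single_evaluation` and `run_scan`, the `scratch_dir` parameter, and the `{strategy}_{risk_category}.json` naming scheme are hypothetical illustrations, not the `azure-ai-evaluation` API; the diff does not show how the SDK actually names its interim files.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def run_single_evaluation(
    strategy_name: str,
    risk_category: str,
    scratch_dir: Path,
    output_path: Optional[Path] = None,
) -> Path:
    """Hypothetical stand-in for one strategy/risk-category evaluation."""
    result = {"strategy": strategy_name, "risk_category": risk_category}
    # With output_path=None, derive a per-evaluation file name so results
    # from different strategy/risk pairs cannot overwrite each other.
    target = output_path or scratch_dir / f"{strategy_name}_{risk_category}.json"
    target.write_text(json.dumps(result))
    return target


def run_scan(
    strategies: List[str],
    risk_categories: List[str],
    output_path: Path,
    scratch_dir: Path,
) -> Dict[str, list]:
    """Hypothetical scan loop: interim files first, aggregate last."""
    interim_files = []
    for strategy in strategies:
        for risk in risk_categories:
            # Mirrors the fix: individual evaluations get output_path=None.
            interim_files.append(
                run_single_evaluation(strategy, risk, scratch_dir, output_path=None)
            )
    aggregated = {"results": [json.loads(p.read_text()) for p in interim_files]}
    # Only the final aggregated result is written to the user's output_path.
    output_path.write_text(json.dumps(aggregated))
    return aggregated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        summary = run_scan(
            ["baseline", "base64"], ["violence", "sexual"], tmp_dir / "scan.json", tmp_dir
        )
        print(len(summary["results"]))  # 4
```

If each call instead received the user's `output_path` (the pre-fix behavior), every iteration would write to the same file, and only the last strategy/risk pair's result would survive on disk.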
