Update evaluation packages and evaluators #46072
Conversation
This LGTM @gewarren
| Metric | Description | Evaluator type |
|--------|-------------|----------------|
| Fluency | Evaluates grammatical accuracy, vocabulary range, sentence complexity, and overall readability | <xref:Microsoft.Extensions.AI.Evaluation.Quality.FluencyEvaluator> |
| Coherence | Evaluates the logical and orderly presentation of ideas | <xref:Microsoft.Extensions.AI.Evaluation.Quality.CoherenceEvaluator> |
| Equivalence | Evaluates the similarity between the generated text and its ground truth with respect to a query | <xref:Microsoft.Extensions.AI.Evaluation.Quality.EquivalenceEvaluator> |
| Groundedness | Evaluates how well a generated response aligns with the given context | <xref:Microsoft.Extensions.AI.Evaluation.Quality.GroundednessEvaluator><br />`GroundednessProEvaluator` |
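For context, here is a minimal sketch of how one of these quality evaluators might be invoked. The `ChatConfiguration` usage and the `EvaluateAsync` parameter order are assumed from the package's core abstractions and may differ slightly from the shipped API; you'd also need to supply a real `IChatClient`.

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.Quality;

// Supply an IChatClient for the LLM that produces the response and performs the grading.
// How you construct it (Azure OpenAI, OpenAI, etc.) is up to your app; this is a placeholder.
IChatClient chatClient = /* your IChatClient */ null!;

// Quality evaluators use an LLM to perform the evaluation; it's passed in via ChatConfiguration.
ChatConfiguration chatConfiguration = new(chatClient);

// The conversation to evaluate: the user's question and the model's answer.
var messages = new[] { new ChatMessage(ChatRole.User, "What is dependency injection in .NET?") };
ChatResponse response = await chatClient.GetResponseAsync(messages);

// Evaluate the response's coherence; the result contains one or more named metrics.
IEvaluator evaluator = new CoherenceEvaluator();
EvaluationResult result = await evaluator.EvaluateAsync(messages, response, chatConfiguration);
```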
Could you please separate this into 2 tables? The first one for Quality evaluators and the second for Safety evaluators.

It would be great to include a sentence about each set at the top of each table to clarify that:

- Quality evaluators measure response quality for the following metrics, and they use an LLM to perform the evaluation.
- Safety evaluators check for the presence of harmful, inappropriate, or unsafe content in responses, and they rely on the Azure AI Foundry Evaluation service (which uses a fine-tuned model behind the scenes), as sketched below.
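To illustrate that split, here is a rough sketch of how the two kinds of evaluators are configured differently. It reuses the `chatClient` from the earlier sketch; the `ContentSafetyServiceConfiguration` type, its parameter names, and the `ToChatConfiguration()` call are assumptions about the Safety package made for illustration only.

```csharp
using Azure.Identity;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.Quality;
using Microsoft.Extensions.AI.Evaluation.Safety;

// Quality evaluators: the grading LLM is one you supply through ChatConfiguration.
ChatConfiguration qualityConfig = new(chatClient);
IEvaluator coherence = new CoherenceEvaluator();

// Safety evaluators: evaluation is performed by the Azure AI Foundry Evaluation service,
// so they're configured with Azure AI Foundry project details rather than your own LLM.
// The type and member names below are assumptions for illustration.
var safetyServiceConfig = new ContentSafetyServiceConfiguration(
    credential: new DefaultAzureCredential(),
    subscriptionId: "<subscription-id>",
    resourceGroupName: "<resource-group>",
    projectName: "<ai-foundry-project>");
ChatConfiguration safetyConfig = safetyServiceConfig.ToChatConfiguration();

// A safety evaluator such as GroundednessProEvaluator would then use safetyConfig.
IEvaluator groundednessPro = new GroundednessProEvaluator();
```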
@shyamnamboodiripad Yes, will do in a follow-up PR. Thanks for reviewing!
The evaluation libraries were built in collaboration with data science researchers from Microsoft and GitHub, and were tested on popular Microsoft Copilot experiences. The following table shows the built-in evaluators.

| Metric | Description | Evaluator type |
|------------------------------------|----------------------------------------------|----------------|
| Relevance, truth, and completeness | How effectively a response addresses a query | <xref:Microsoft.Extensions.AI.Evaluation.Quality.RelevanceTruthAndCompletenessEvaluator> |
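As a rough illustration of how several built-in evaluators might be combined into one evaluation pass (reusing `messages`, `response`, and `chatConfiguration` from the earlier sketch; treat `CompositeEvaluator` and the exact call shape as assumptions):

```csharp
// Aggregate multiple built-in evaluators so a single call yields all of their metrics.
IEvaluator combined = new CompositeEvaluator(
    new FluencyEvaluator(),
    new CoherenceEvaluator(),
    new RelevanceTruthAndCompletenessEvaluator()); // marked [Experimental]; see the discussion below

EvaluationResult allMetrics = await combined.EvaluateAsync(messages, response, chatConfiguration);
```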
RTCEvaluator is still being shipped, so I think we should continue to include it in the table - perhaps we can move it to the end of the table? Also, since it's now marked as [Experimental], it would be great to call this out somehow.
Note that the metrics returned from RTCEvaluator are named 'Relevance (RTC)', 'Truth (RTC)', and 'Completeness (RTC)' so that they won't conflict with the newly introduced dedicated 'Relevance' and 'Completeness' metrics.
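For example, reading those RTC-suffixed metrics from an evaluation result might look like the following (reusing `messages`, `response`, and `chatConfiguration` from the earlier sketch; the `Get<NumericMetric>` accessor is assumed from the core abstractions):

```csharp
EvaluationResult rtcResult = await new RelevanceTruthAndCompletenessEvaluator()
    .EvaluateAsync(messages, response, chatConfiguration);

// The "(RTC)" suffix keeps these names from colliding with the metrics produced by
// the dedicated Relevance and Completeness evaluators.
NumericMetric relevance = rtcResult.Get<NumericMetric>("Relevance (RTC)");
NumericMetric truth = rtcResult.Get<NumericMetric>("Truth (RTC)");
NumericMetric completeness = rtcResult.Get<NumericMetric>("Completeness (RTC)");

Console.WriteLine($"Relevance (RTC): {relevance.Value}");
```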
Fixes #45078 (remove Semantic Kernel eval tutorial)
Contributes to #46071 (update packages and evaluators)