From 68f760f520d725535248aca361e5fdec5531604b Mon Sep 17 00:00:00 2001
From: Genevieve Warren <24882762+gewarren@users.noreply.github.com>
Date: Mon, 12 May 2025 12:32:47 -0700
Subject: [PATCH 1/6] break up evaluators table

---
 docs/ai/conceptual/evaluation-libraries.md | 25 ++++++++++++++----
 docs/ai/get-started-app-chat-template.md | 30 +++++++++++-----------
 2 files changed, 35 insertions(+), 20 deletions(-)

diff --git a/docs/ai/conceptual/evaluation-libraries.md b/docs/ai/conceptual/evaluation-libraries.md
index bf81c80ed2b8a..9af6219b0da0e 100644
--- a/docs/ai/conceptual/evaluation-libraries.md
+++ b/docs/ai/conceptual/evaluation-libraries.md
@@ -23,7 +23,13 @@ The libraries are designed to integrate smoothly with existing .NET apps, allowi
 
 ## Comprehensive evaluation metrics
 
-The evaluation libraries were built in collaboration with data science researchers from Microsoft and GitHub, and were tested on popular Microsoft Copilot experiences. The following table shows the built-in evaluators.
+The evaluation libraries were built in collaboration with data science researchers from Microsoft and GitHub, and were tested on popular Microsoft Copilot experiences. The following sections show the built-in [quality](#quality-evaluators) and [safety](#safety-evaluators) evaluators and the metrics they measure.
+
+You can also customize to add your own evaluations by implementing the interface or extending the base classes such as and .
+
+### Quality evaluators
+
+Quality evaluators measure response quality. They use an LLM to perform the evaluation.
| Metric | Description | Evaluator type | |--------------|--------------------------------------------------------|----------------| @@ -33,8 +39,19 @@ The evaluation libraries were built in collaboration with data science researche | Fluency | Evaluates grammatical accuracy, vocabulary range, sentence complexity, and overall readability| | | Coherence | Evaluates the logical and orderly presentation of ideas | | | Equivalence | Evaluates the similarity between the generated text and its ground truth with respect to a query | | -| Groundedness | Evaluates how well a generated response aligns with the given context |
`GroundednessProEvaluator` | -| Protected material | Evaluates response for the presence of protected material | `ProtectedMaterialEvaluator` | +| Groundedness | Evaluates how well a generated response aligns with the given context | | +| Relevance (RTC), Truth (RTC), and Completeness (RTC) | Evaluates how relevant, truthful, and complete a response is | † | + +† This evaluator is marked [experimental](../../fundamentals/syslib-diagnostics/experimental-overview.md). + +### Safety evaluators + +Safety evaluators check for presence of harmful, inappropriate, or unsafe content in a response. They rely on the Azure AI Foundry Evaluation service, which uses a model that's fine tuned to perform evaluations. + +| Metric | Description | Evaluator type | +|--------------------|-----------------------------------------------------------------------|------------------------------| +| Groundedness | Evaluates how well a generated response aligns with the given context | `GroundednessProEvaluator` | +| Protected material | Evaluates response for the presence of protected material | `ProtectedMaterialEvaluator` | | Ungrounded human attributes | Evaluates a response for the presence of content that indicates ungrounded inference of human attributes | `UngroundedAttributesEvaluator` | | Hate content | Evaluates a response for the presence of content that's hateful or unfair | `HateAndUnfairnessEvaluator`† | | Self-harm content | Evaluates a response for the presence of content that indicates self harm | `SelfHarmEvaluator`† | @@ -45,8 +62,6 @@ The evaluation libraries were built in collaboration with data science researche † In addition, the `ContentHarmEvaluator` provides single-shot evaluation for the four metrics supported by `HateAndUnfairnessEvaluator`, `SelfHarmEvaluator`, `ViolenceEvaluator`, and `SexualEvaluator`. -You can also customize to add your own evaluations by implementing the interface or extending the base classes such as and . 
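The customization hook mentioned above (implementing the evaluator interface to add your own evaluations) can be sketched roughly as follows. This is a hypothetical sketch only: the class shape and `EvaluateAsync` signature shown here are assumptions for illustration and may not match the shipped `Microsoft.Extensions.AI.Evaluation` API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of a custom evaluator. The shape below is an
// assumption for illustration; consult the library's actual evaluator
// interface before implementing one.
public class ResponseLengthEvaluator
{
    // A human-readable name for the metric this evaluator produces.
    public string MetricName => "Response Length";

    // Assumed shape: score a model response and return a value in [0, 1].
    public Task<double> EvaluateAsync(
        string modelResponse, CancellationToken cancellationToken = default)
    {
        // Toy metric: normalized response length, capped at 1.0.
        double score = Math.Min(modelResponse.Length / 1000.0, 1.0);
        return Task.FromResult(score);
    }
}
```

A custom evaluator along these lines would run alongside the built-in ones so that its metric appears in the same evaluation report.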
- ## Cached responses The library uses *response caching* functionality, which means responses from the AI model are persisted in a cache. In subsequent runs, if the request parameters (prompt and model) are unchanged, responses are then served from the cache to enable faster execution and lower cost. diff --git a/docs/ai/get-started-app-chat-template.md b/docs/ai/get-started-app-chat-template.md index 3a5b0007e2019..a87672fa7b5ce 100644 --- a/docs/ai/get-started-app-chat-template.md +++ b/docs/ai/get-started-app-chat-template.md @@ -15,9 +15,9 @@ This article shows you how to deploy and run the [Chat with your own data sample By following the instructions in this article, you will: -- Deploy a chat app to Azure. -- Get answers about employee benefits. -- Change settings to change behavior of responses. +* Deploy a chat app to Azure. +* Get answers about employee benefits. +* Change settings to change behavior of responses. Once you complete this procedure, you can start modifying the new project with your custom code. 
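The *response caching* behavior described above (under "Cached responses") keys persisted model responses on the request parameters, that is, the prompt and the model. The following is a conceptual illustration of that lookup only; it is not the library's implementation, whose cache is configured through the library itself rather than hand-rolled.

```csharp
using System.Collections.Generic;

// Conceptual illustration of response caching keyed on (prompt, model).
// Not the library's actual implementation.
public class ResponseCache
{
    private readonly Dictionary<(string Prompt, string Model), string> _cache = new();

    // Returns true and the stored response when both prompt and model
    // are unchanged from an earlier run.
    public bool TryGet(string prompt, string model, out string response) =>
        _cache.TryGetValue((prompt, model), out response!);

    public void Store(string prompt, string model, string response) =>
        _cache[(prompt, model)] = response;
}
```

On a later run with an identical prompt and model, `TryGet` returns the stored response instead of calling the model again, which is the faster and cheaper path the section describes.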
@@ -25,9 +25,9 @@ This article is part of a collection of articles that show you how to build a ch Other articles in the collection include: -- [Python](/azure/developer/python/get-started-app-chat-template) -- [JavaScript](/azure/developer/javascript/get-started-app-chat-template) -- [Java](/azure/developer/java/quickstarts/get-started-app-chat-template) +* [Python](/azure/developer/python/get-started-app-chat-template) +* [JavaScript](/azure/developer/javascript/get-started-app-chat-template) +* [Java](/azure/developer/java/quickstarts/get-started-app-chat-template) ## Architectural overview @@ -37,10 +37,10 @@ The architecture of the chat app is shown in the following diagram: :::image type="content" source="./media/get-started-app-chat-template/simple-architecture-diagram.png" lightbox="./media/get-started-app-chat-template/simple-architecture-diagram.png" alt-text="Diagram showing architecture from client to backend app."::: -- **User interface** - The application's chat interface is a [Blazor WebAssembly](/aspnet/core/blazor/) application. This interface is what accepts user queries, routes request to the application backend, and displays generated responses. -- **Backend** - The application backend is an [ASP.NET Core Minimal API](/aspnet/core/fundamentals/minimal-apis/overview). The backend hosts the Blazor static web application and is what orchestrates the interactions among the different services. Services used in this application include: - - [**Azure Cognitive Search**](/azure/search/search-what-is-azure-search) – Indexes documents from the data stored in an Azure Storage Account. This makes the documents searchable using [vector search](/azure/search/search-get-started-vector) capabilities. - - [**Azure OpenAI Service**](/azure/ai-services/openai/overview) – Provides the Large Language Models (LLM) to generate responses. 
[Semantic Kernel](/semantic-kernel/whatissk) is used in conjunction with the Azure OpenAI Service to orchestrate the more complex AI workflows. +* **User interface** - The application's chat interface is a [Blazor WebAssembly](/aspnet/core/blazor/) application. This interface is what accepts user queries, routes request to the application backend, and displays generated responses. +* **Backend** - The application backend is an [ASP.NET Core Minimal API](/aspnet/core/fundamentals/minimal-apis/overview). The backend hosts the Blazor static web application and is what orchestrates the interactions among the different services. Services used in this application include: + * [**Azure Cognitive Search**](/azure/search/search-what-is-azure-search) – Indexes documents from the data stored in an Azure Storage Account. This makes the documents searchable using [vector search](/azure/search/search-get-started-vector) capabilities. + * [**Azure OpenAI Service**](/azure/ai-services/openai/overview) – Provides the Large Language Models (LLM) to generate responses. [Semantic Kernel](/semantic-kernel/whatissk) is used in conjunction with the Azure OpenAI Service to orchestrate the more complex AI workflows. 
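The backend described in the bullets above is an ASP.NET Core Minimal API that accepts chat requests from the Blazor client and orchestrates the search and model services. The sketch below shows only the general shape of such a backend; the route and record types are illustrative assumptions, not the sample's actual API surface.

```csharp
// Illustrative Minimal API shape only; not code from the sample app.
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.MapPost("/chat", (ChatRequest request) =>
{
    // 1. Query the search index for documents relevant to the user query (omitted).
    // 2. Send the query plus retrieved context to the model to generate an answer (omitted).
    return Results.Ok(new ChatResponse($"Echo: {request.Message}"));
});

app.Run();

// Hypothetical request/response contracts for the endpoint above.
record ChatRequest(string Message);
record ChatResponse(string Answer);
```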
## Cost @@ -283,8 +283,8 @@ If your issue isn't addressed, log your issue to the repository's [Issues](https ## Next steps -- [Get the source code for the sample used in this article](https://github.com/Azure-Samples/azure-search-openai-demo-csharp) -- [Build a chat app with Azure OpenAI](https://aka.ms/azai/chat) best practice solution architecture -- [Access control in Generative AI Apps with Azure AI Search](https://techcommunity.microsoft.com/t5/azure-ai-services-blog/access-control-in-generative-ai-applications-with-azure/ba-p/3956408) -- [Build an Enterprise ready OpenAI solution with Azure API Management](https://techcommunity.microsoft.com/t5/apps-on-azure-blog/build-an-enterprise-ready-azure-openai-solution-with-azure-api/bc-p/3935407) -- [Outperforming vector search with hybrid retrieval and ranking capabilities](https://techcommunity.microsoft.com/t5/azure-ai-services-blog/azure-cognitive-search-outperforming-vector-search-with-hybrid/ba-p/3929167) +* [Get the source code for the sample used in this article](https://github.com/Azure-Samples/azure-search-openai-demo-csharp) +* [Build a chat app with Azure OpenAI](https://aka.ms/azai/chat) best practice solution architecture +* [Access control in Generative AI Apps with Azure AI Search](https://techcommunity.microsoft.com/t5/azure-ai-services-blog/access-control-in-generative-ai-applications-with-azure/ba-p/3956408) +* [Build an Enterprise ready OpenAI solution with Azure API Management](https://techcommunity.microsoft.com/t5/apps-on-azure-blog/build-an-enterprise-ready-azure-openai-solution-with-azure-api/bc-p/3935407) +* [Outperforming vector search with hybrid retrieval and ranking capabilities](https://techcommunity.microsoft.com/t5/azure-ai-services-blog/azure-cognitive-search-outperforming-vector-search-with-hybrid/ba-p/3929167) From ebbc353f8695e274f97075ff15f551f227b6d866 Mon Sep 17 00:00:00 2001 From: Genevieve Warren <24882762+gewarren@users.noreply.github.com> Date: Mon, 12 May 2025 12:34:57 
-0700 Subject: [PATCH 2/6] reset file --- docs/ai/get-started-app-chat-template.md | 30 ++++++++++++------------ 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/docs/ai/get-started-app-chat-template.md b/docs/ai/get-started-app-chat-template.md index a87672fa7b5ce..3a5b0007e2019 100644 --- a/docs/ai/get-started-app-chat-template.md +++ b/docs/ai/get-started-app-chat-template.md @@ -15,9 +15,9 @@ This article shows you how to deploy and run the [Chat with your own data sample By following the instructions in this article, you will: -* Deploy a chat app to Azure. -* Get answers about employee benefits. -* Change settings to change behavior of responses. +- Deploy a chat app to Azure. +- Get answers about employee benefits. +- Change settings to change behavior of responses. Once you complete this procedure, you can start modifying the new project with your custom code. @@ -25,9 +25,9 @@ This article is part of a collection of articles that show you how to build a ch Other articles in the collection include: -* [Python](/azure/developer/python/get-started-app-chat-template) -* [JavaScript](/azure/developer/javascript/get-started-app-chat-template) -* [Java](/azure/developer/java/quickstarts/get-started-app-chat-template) +- [Python](/azure/developer/python/get-started-app-chat-template) +- [JavaScript](/azure/developer/javascript/get-started-app-chat-template) +- [Java](/azure/developer/java/quickstarts/get-started-app-chat-template) ## Architectural overview @@ -37,10 +37,10 @@ The architecture of the chat app is shown in the following diagram: :::image type="content" source="./media/get-started-app-chat-template/simple-architecture-diagram.png" lightbox="./media/get-started-app-chat-template/simple-architecture-diagram.png" alt-text="Diagram showing architecture from client to backend app."::: -* **User interface** - The application's chat interface is a [Blazor WebAssembly](/aspnet/core/blazor/) application. 
This interface is what accepts user queries, routes request to the application backend, and displays generated responses. -* **Backend** - The application backend is an [ASP.NET Core Minimal API](/aspnet/core/fundamentals/minimal-apis/overview). The backend hosts the Blazor static web application and is what orchestrates the interactions among the different services. Services used in this application include: - * [**Azure Cognitive Search**](/azure/search/search-what-is-azure-search) – Indexes documents from the data stored in an Azure Storage Account. This makes the documents searchable using [vector search](/azure/search/search-get-started-vector) capabilities. - * [**Azure OpenAI Service**](/azure/ai-services/openai/overview) – Provides the Large Language Models (LLM) to generate responses. [Semantic Kernel](/semantic-kernel/whatissk) is used in conjunction with the Azure OpenAI Service to orchestrate the more complex AI workflows. +- **User interface** - The application's chat interface is a [Blazor WebAssembly](/aspnet/core/blazor/) application. This interface is what accepts user queries, routes request to the application backend, and displays generated responses. +- **Backend** - The application backend is an [ASP.NET Core Minimal API](/aspnet/core/fundamentals/minimal-apis/overview). The backend hosts the Blazor static web application and is what orchestrates the interactions among the different services. Services used in this application include: + - [**Azure Cognitive Search**](/azure/search/search-what-is-azure-search) – Indexes documents from the data stored in an Azure Storage Account. This makes the documents searchable using [vector search](/azure/search/search-get-started-vector) capabilities. + - [**Azure OpenAI Service**](/azure/ai-services/openai/overview) – Provides the Large Language Models (LLM) to generate responses. 
[Semantic Kernel](/semantic-kernel/whatissk) is used in conjunction with the Azure OpenAI Service to orchestrate the more complex AI workflows. ## Cost @@ -283,8 +283,8 @@ If your issue isn't addressed, log your issue to the repository's [Issues](https ## Next steps -* [Get the source code for the sample used in this article](https://github.com/Azure-Samples/azure-search-openai-demo-csharp) -* [Build a chat app with Azure OpenAI](https://aka.ms/azai/chat) best practice solution architecture -* [Access control in Generative AI Apps with Azure AI Search](https://techcommunity.microsoft.com/t5/azure-ai-services-blog/access-control-in-generative-ai-applications-with-azure/ba-p/3956408) -* [Build an Enterprise ready OpenAI solution with Azure API Management](https://techcommunity.microsoft.com/t5/apps-on-azure-blog/build-an-enterprise-ready-azure-openai-solution-with-azure-api/bc-p/3935407) -* [Outperforming vector search with hybrid retrieval and ranking capabilities](https://techcommunity.microsoft.com/t5/azure-ai-services-blog/azure-cognitive-search-outperforming-vector-search-with-hybrid/ba-p/3929167) +- [Get the source code for the sample used in this article](https://github.com/Azure-Samples/azure-search-openai-demo-csharp) +- [Build a chat app with Azure OpenAI](https://aka.ms/azai/chat) best practice solution architecture +- [Access control in Generative AI Apps with Azure AI Search](https://techcommunity.microsoft.com/t5/azure-ai-services-blog/access-control-in-generative-ai-applications-with-azure/ba-p/3956408) +- [Build an Enterprise ready OpenAI solution with Azure API Management](https://techcommunity.microsoft.com/t5/apps-on-azure-blog/build-an-enterprise-ready-azure-openai-solution-with-azure-api/bc-p/3935407) +- [Outperforming vector search with hybrid retrieval and ranking capabilities](https://techcommunity.microsoft.com/t5/azure-ai-services-blog/azure-cognitive-search-outperforming-vector-search-with-hybrid/ba-p/3929167) From 
fd52113cdd2f979edd6738a0a2c229d8894c3f8c Mon Sep 17 00:00:00 2001 From: Genevieve Warren <24882762+gewarren@users.noreply.github.com> Date: Mon, 12 May 2025 13:22:30 -0700 Subject: [PATCH 3/6] Apply suggestions from code review Co-authored-by: Shyam N --- docs/ai/conceptual/evaluation-libraries.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/ai/conceptual/evaluation-libraries.md b/docs/ai/conceptual/evaluation-libraries.md index 9af6219b0da0e..9b9e65afd14ff 100644 --- a/docs/ai/conceptual/evaluation-libraries.md +++ b/docs/ai/conceptual/evaluation-libraries.md @@ -29,7 +29,7 @@ You can also customize to add your own evaluations by implementing the Date: Mon, 12 May 2025 14:37:24 -0700 Subject: [PATCH 4/6] respond to feedback --- docs/ai/conceptual/evaluation-libraries.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/ai/conceptual/evaluation-libraries.md b/docs/ai/conceptual/evaluation-libraries.md index 9b9e65afd14ff..c7e78722b80c8 100644 --- a/docs/ai/conceptual/evaluation-libraries.md +++ b/docs/ai/conceptual/evaluation-libraries.md @@ -50,15 +50,15 @@ Safety evaluators check for presence of harmful, inappropriate, or unsafe conten | Metric | Description | Evaluator type | |--------------------|-----------------------------------------------------------------------|------------------------------| -| Groundedness Pro | Uses a fine-tuned model hosted behind the Azure AI Foundry Evaluation service to evaluate how well a generated response aligns with the given context | `GroundednessProEvaluator` | +| Groundedness Pro | Uses a fine-tuned model hosted behind the Azure AI Foundry Evaluation service to evaluate how well a generated response aligns with the given context | `GroundednessProEvaluator` | | Protected material | Evaluates response for the presence of protected material | `ProtectedMaterialEvaluator` | | Ungrounded attributes | Evaluates a response for the presence of content that 
indicates ungrounded inference of human attributes | `UngroundedAttributesEvaluator` | | Hate and unfairness | Evaluates a response for the presence of content that's hateful or unfair | `HateAndUnfairnessEvaluator`† | -| Self-harm content | Evaluates a response for the presence of content that indicates self harm | `SelfHarmEvaluator`† | -| Violent content | Evaluates a response for the presence of violent content | `ViolenceEvaluator`† | -| Sexual content | Evaluates a response for the presence of sexual content | `SexualEvaluator`† | -| Code vulnerability content | Evaluates a response for the presence of vulnerable code | `CodeVulnerabilityEvaluator` | -| Indirect attack content | Evaluates a response for the presence of indirect attacks, such as manipulated content, intrusion, and information gathering | `IndirectAttackEvaluator` | +| Self-harm | Evaluates a response for the presence of content that indicates self harm | `SelfHarmEvaluator`† | +| Violent | Evaluates a response for the presence of violent content | `ViolenceEvaluator`† | +| Sexual | Evaluates a response for the presence of sexual content | `SexualEvaluator`† | +| Code vulnerability | Evaluates a response for the presence of vulnerable code | `CodeVulnerabilityEvaluator` | +| Indirect attack | Evaluates a response for the presence of indirect attacks, such as manipulated content, intrusion, and information gathering | `IndirectAttackEvaluator` | † In addition, the `ContentHarmEvaluator` provides single-shot evaluation for the four metrics supported by `HateAndUnfairnessEvaluator`, `SelfHarmEvaluator`, `ViolenceEvaluator`, and `SexualEvaluator`. 
From 367707e1e916be888b3036f280f684423f1d0536 Mon Sep 17 00:00:00 2001 From: Genevieve Warren <24882762+gewarren@users.noreply.github.com> Date: Tue, 13 May 2025 09:46:17 -0700 Subject: [PATCH 5/6] use xrefs for new evaluators --- docs/ai/conceptual/evaluation-libraries.md | 30 +++++++++++----------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/docs/ai/conceptual/evaluation-libraries.md b/docs/ai/conceptual/evaluation-libraries.md index c7e78722b80c8..a06dec550454d 100644 --- a/docs/ai/conceptual/evaluation-libraries.md +++ b/docs/ai/conceptual/evaluation-libraries.md @@ -2,7 +2,7 @@ title: The Microsoft.Extensions.AI.Evaluation libraries description: Learn about the Microsoft.Extensions.AI.Evaluation libraries, which simplify the process of evaluating the quality and accuracy of responses generated by AI models in .NET intelligent apps. ms.topic: concept-article -ms.date: 05/09/2025 +ms.date: 05/13/2025 --- # The Microsoft.Extensions.AI.Evaluation libraries (Preview) @@ -33,9 +33,9 @@ Quality evaluators measure response quality. 
They use an LLM to perform the eval | Metric | Description | Evaluator type | |--------------|--------------------------------------------------------|----------------| -| Relevance | Evaluates how relevant a response is to a query | `RelevanceEvaluator` | -| Completeness | Evaluates how comprehensive and accurate a response is | `CompletenessEvaluator` | -| Retrieval | Evaluates performance in retrieving information for additional context | `RetrievalEvaluator` | +| Relevance | Evaluates how relevant a response is to a query | | +| Completeness | Evaluates how comprehensive and accurate a response is | | +| Retrieval | Evaluates performance in retrieving information for additional context | | | Fluency | Evaluates grammatical accuracy, vocabulary range, sentence complexity, and overall readability| | | Coherence | Evaluates the logical and orderly presentation of ideas | | | Equivalence | Evaluates the similarity between the generated text and its ground truth with respect to a query | | @@ -50,17 +50,17 @@ Safety evaluators check for presence of harmful, inappropriate, or unsafe conten | Metric | Description | Evaluator type | |--------------------|-----------------------------------------------------------------------|------------------------------| -| Groundedness Pro | Uses a fine-tuned model hosted behind the Azure AI Foundry Evaluation service to evaluate how well a generated response aligns with the given context | `GroundednessProEvaluator` | -| Protected material | Evaluates response for the presence of protected material | `ProtectedMaterialEvaluator` | -| Ungrounded attributes | Evaluates a response for the presence of content that indicates ungrounded inference of human attributes | `UngroundedAttributesEvaluator` | -| Hate and unfairness | Evaluates a response for the presence of content that's hateful or unfair | `HateAndUnfairnessEvaluator`† | -| Self-harm | Evaluates a response for the presence of content that indicates self harm | 
`SelfHarmEvaluator`† | -| Violent | Evaluates a response for the presence of violent content | `ViolenceEvaluator`† | -| Sexual | Evaluates a response for the presence of sexual content | `SexualEvaluator`† | -| Code vulnerability | Evaluates a response for the presence of vulnerable code | `CodeVulnerabilityEvaluator` | -| Indirect attack | Evaluates a response for the presence of indirect attacks, such as manipulated content, intrusion, and information gathering | `IndirectAttackEvaluator` | - -† In addition, the `ContentHarmEvaluator` provides single-shot evaluation for the four metrics supported by `HateAndUnfairnessEvaluator`, `SelfHarmEvaluator`, `ViolenceEvaluator`, and `SexualEvaluator`. +| Groundedness Pro | Uses a fine-tuned model hosted behind the Azure AI Foundry Evaluation service to evaluate how well a generated response aligns with the given context | | +| Protected material | Evaluates response for the presence of protected material | | +| Ungrounded attributes | Evaluates a response for the presence of content that indicates ungrounded inference of human attributes | | +| Hate and unfairness | Evaluates a response for the presence of content that's hateful or unfair | † | +| Self-harm | Evaluates a response for the presence of content that indicates self harm | † | +| Violent | Evaluates a response for the presence of violent content | † | +| Sexual | Evaluates a response for the presence of sexual content | † | +| Code vulnerability | Evaluates a response for the presence of vulnerable code | | +| Indirect attack | Evaluates a response for the presence of indirect attacks, such as manipulated content, intrusion, and information gathering | | + +† In addition, the provides single-shot evaluation for the four metrics supported by `HateAndUnfairnessEvaluator`, `SelfHarmEvaluator`, `ViolenceEvaluator`, and `SexualEvaluator`. 
## Cached responses From 896577c06b5d37de27f937080f38802c3f2a8009 Mon Sep 17 00:00:00 2001 From: Genevieve Warren <24882762+gewarren@users.noreply.github.com> Date: Tue, 13 May 2025 10:20:44 -0700 Subject: [PATCH 6/6] code fence metric names --- docs/ai/conceptual/evaluation-libraries.md | 40 +++++++++++----------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/docs/ai/conceptual/evaluation-libraries.md b/docs/ai/conceptual/evaluation-libraries.md index a06dec550454d..bb5be136cb82e 100644 --- a/docs/ai/conceptual/evaluation-libraries.md +++ b/docs/ai/conceptual/evaluation-libraries.md @@ -25,22 +25,22 @@ The libraries are designed to integrate smoothly with existing .NET apps, allowi The evaluation libraries were built in collaboration with data science researchers from Microsoft and GitHub, and were tested on popular Microsoft Copilot experiences. The following sections show the built-in [quality](#quality-evaluators) and [safety](#safety-evaluators) evaluators and the metrics they measure. -You can also customize to add your own evaluations by implementing the interface or extending the base classes such as and . +You can also customize to add your own evaluations by implementing the interface. ### Quality evaluators Quality evaluators measure response quality. They use an LLM to perform the evaluation. 
-| Metric | Description | Evaluator type | -|--------------|--------------------------------------------------------|----------------| -| Relevance | Evaluates how relevant a response is to a query | | -| Completeness | Evaluates how comprehensive and accurate a response is | | -| Retrieval | Evaluates performance in retrieving information for additional context | | -| Fluency | Evaluates grammatical accuracy, vocabulary range, sentence complexity, and overall readability| | -| Coherence | Evaluates the logical and orderly presentation of ideas | | -| Equivalence | Evaluates the similarity between the generated text and its ground truth with respect to a query | | -| Groundedness | Evaluates how well a generated response aligns with the given context | | -| Relevance (RTC), Truth (RTC), and Completeness (RTC) | Evaluates how relevant, truthful, and complete a response is | † | +| Metric | Description | Evaluator type | +|----------------|--------------------------------------------------------|----------------| +| `Relevance` | Evaluates how relevant a response is to a query | | +| `Completeness` | Evaluates how comprehensive and accurate a response is | | +| `Retrieval` | Evaluates performance in retrieving information for additional context | | +| `Fluency` | Evaluates grammatical accuracy, vocabulary range, sentence complexity, and overall readability| | +| `Coherence` | Evaluates the logical and orderly presentation of ideas | | +| `Equivalence` | Evaluates the similarity between the generated text and its ground truth with respect to a query | | +| `Groundedness` | Evaluates how well a generated response aligns with the given context | | +| `Relevance (RTC)`, `Truth (RTC)`, and `Completeness (RTC)` | Evaluates how relevant, truthful, and complete a response is | † | † This evaluator is marked [experimental](../../fundamentals/syslib-diagnostics/experimental-overview.md). 
@@ -50,15 +50,15 @@ Safety evaluators check for presence of harmful, inappropriate, or unsafe conten
 
 | Metric | Description | Evaluator type |
 |--------------------|-----------------------------------------------------------------------|------------------------------|
-| Groundedness Pro | Uses a fine-tuned model hosted behind the Azure AI Foundry Evaluation service to evaluate how well a generated response aligns with the given context | |
-| Protected material | Evaluates response for the presence of protected material | |
-| Ungrounded attributes | Evaluates a response for the presence of content that indicates ungrounded inference of human attributes | |
-| Hate and unfairness | Evaluates a response for the presence of content that's hateful or unfair | † |
-| Self-harm | Evaluates a response for the presence of content that indicates self harm | † |
-| Violent | Evaluates a response for the presence of violent content | † |
-| Sexual | Evaluates a response for the presence of sexual content | † |
-| Code vulnerability | Evaluates a response for the presence of vulnerable code | |
-| Indirect attack | Evaluates a response for the presence of indirect attacks, such as manipulated content, intrusion, and information gathering | |
+| `Groundedness Pro` | Uses a fine-tuned model hosted behind the Azure AI Foundry Evaluation service to evaluate how well a generated response aligns with the given context | |
+| `Protected Material` | Evaluates a response for the presence of protected material | |
+| `Ungrounded Attributes` | Evaluates a response for the presence of content that indicates ungrounded inference of human attributes | |
+| `Hate And Unfairness` | Evaluates a response for the presence of content that's hateful or unfair | † |
+| `Self Harm` | Evaluates a response for the presence of content that indicates self harm | † |
+| `Violence` | Evaluates a response for the presence of violent content | † |
+| `Sexual` | Evaluates a response for the presence of sexual content | † |
+| `Code Vulnerability` | Evaluates a response for the presence of vulnerable code | |
+| `Indirect Attack` | Evaluates a response for the presence of indirect attacks, such as manipulated content, intrusion, and information gathering | |
 
 † In addition, the `ContentHarmEvaluator` provides single-shot evaluation for the four metrics supported by `HateAndUnfairnessEvaluator`, `SelfHarmEvaluator`, `ViolenceEvaluator`, and `SexualEvaluator`.
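To make the † note concrete: `ContentHarmEvaluator` is a single-shot alternative to running the four harm evaluators separately. The sketch below illustrates only that idea; the record and method shown are stand-ins invented for this example, not the library's actual types.

```csharp
using System;

// Conceptual stand-ins, not the library's real API.
public record HarmScores(
    double HateAndUnfairness, double SelfHarm, double Violence, double Sexual);

public static class SingleShotSketch
{
    // One evaluation call yields all four harm metrics at once,
    // instead of invoking HateAndUnfairnessEvaluator, SelfHarmEvaluator,
    // ViolenceEvaluator, and SexualEvaluator separately.
    public static HarmScores EvaluateAll(string response)
    {
        // A real implementation would call the Azure AI Foundry
        // Evaluation service once and map its output to the metrics.
        throw new NotImplementedException("Illustrative sketch only.");
    }
}
```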