opensearch-project
diff --git a/.github/vale/styles/OpenSearch/SubstitutionsError.yml b/.github/vale/styles/OpenSearch/SubstitutionsError.yml
Lines changed: 1 addition & 0 deletions
diff --git a/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt
Lines changed: 1 addition & 0 deletions
diff --git a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt
Lines changed: 2 additions & 0 deletions
diff --git a/_search-plugins/index.md b/_search-plugins/index.md
Lines changed: 1 addition & 1 deletion
diff --git a/_search-plugins/search-relevance/compare-query-sets.md b/_search-plugins/search-relevance/compare-query-sets.md
Lines changed: 254 additions & 0 deletions
diff --git a/_search-plugins/search-relevance/compare-search-results.md b/_search-plugins/search-relevance/compare-search-results.md
Lines changed: 32 additions & 12 deletions
@@ -54,3 +54,4 @@ swap:
   'web site': website
   'whitespace': white space
   'user interface \(UI\)': UI
+  'judgement': judgment
@@ -96,6 +96,7 @@ RankLib
 RCF Summarize
 RPM Package Manager
 Ruby
+Search Relevance Workbench
 Simple Schema for Observability
 Tableau
 Textract
 
@@ -54,6 +54,7 @@ gibibyte
 [Ii]nstrumentations?
 [Ii]ntracluster
 [Jj]avadoc
+[Jj]accard
 k-NN
 [Kk]eystore
 kibibyte
@@ -85,6 +86,7 @@ p\d{2}
 [Pp]erformant
 [Pp]laintext
 [Pp]luggable
+[Pp]ointwise
 [Pp]reaggregate(s|d)?
 [Pp]recompute(s|d)?
 [Pp]reconfigure
 
@@ -74,7 +74,7 @@ OpenSearch offers several ways to improve search performance:
 
 OpenSearch provides the following search relevance features:
 
-- [Compare Search Results]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/compare-search-results/): A search comparison tool in OpenSearch Dashboards that you can use to compare results from two queries side by side. 
+- [Search Relevance Workbench]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/using-search-relevance-workbench/): A suite of tools that support search quality improvements through experimentation. 
 
 - [Querqy]({{site.url}}{{site.baseurl}}/search-plugins/querqy/): Offers query rewriting capability.
 
 
@@ -0,0 +1,254 @@
+---
+layout: default
+title: Comparing query sets
+nav_order: 12
+parent: Using Search Relevance Workbench
+grand_parent: Search relevance
+has_children: false
+has_toc: false
+---
+
+# Comparing query sets
+
+To compare the results of two different search configurations, you can run a pairwise experiment. To do so, you need two search configurations and a query set to run against both configurations.
+
+
+For more information about creating a query set, see [Query Sets]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/query-sets/).
+
+For more information about creating search configurations, see [Search Configurations]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/search-configurations/).
+
+## Creating a pairwise experiment
+
+An experiment is used to compare the metrics between two different search configurations. An experiment shows you the top N results for every query based on the specified search configurations. In the dashboard, you can view the returned documents from any of the queries in the query set and determine which search configuration returns more relevant results. Additionally, you can measure the similarity between the two returned search result lists using the provided similarity metrics.
+
+### Example
+
+To create a pairwise comparison experiment for the specified query set and search configurations, send the following request:
+
+```json
+PUT _plugins/_search_relevance/experiments
+{
+    "querySetId": "8368a359-146b-4690-b756-40591b2fcddb",
+    "searchConfigurationList": ["a5acc9f3-6ad7-43f4-9651-fe118c499bc6", "26c7255c-c36e-42fb-b5b2-633dbf8e53b6"],
+    "size": 10,
+    "type": "PAIRWISE_COMPARISON"
+}
+```
+{% include copy-curl.html %}
+
+### Request body fields
+
+The following table lists the available input parameters.
+
+Field | Data type | Description
+:--- | :--- | :---
+`querySetId` | String | The query set ID.
+`searchConfigurationList` | List | A list of search configuration IDs to use for comparison.
+`size` | Integer | The number of documents to return in the results.
+`type` | String | Defines the type of experiment to run. Valid values are `PAIRWISE_COMPARISON`, `HYBRID_OPTIMIZER`, or `POINTWISE_EVALUATION`. Depending on the experiment type, you must provide different body fields in the request. `PAIRWISE_COMPARISON` is for comparing two search configurations against a query set and is used [here]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/compare-query-sets/). `HYBRID_OPTIMIZER` is for combining results and is used [here]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/optimize-hybrid-search/). `POINTWISE_EVALUATION` is for evaluating a search configuration against judgments and is used [here]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/evaluate-search-quality/). 
+
+The response contains the experiment ID of the created experiment:
+
+```json
+{
+    "experiment_id": "cbd2c209-96d1-4012-aa73-e524b7a1b11a",
+    "experiment_result": "CREATED"
+}
+```
+
+## Interpreting the experiment results
+
+To interpret the experiment results, use the following operations.
+
+### Retrieving the experiment results
+
+Use the following API to retrieve all experiments or the results of a specific experiment.
+
+#### Endpoints
+
+```json
+GET _plugins/_search_relevance/experiments
+GET _plugins/_search_relevance/experiments/<experiment_id>
+```
+
+#### Path parameters
+
+The following table lists the available path parameters.
+
+| Parameter | Data type | Description |
+| :--- | :--- | :--- |
+| `experiment_id` | String | The ID of the experiment to retrieve. If omitted, all experiments are retrieved. |
+
+#### Example request
+
+```json
+GET _plugins/_search_relevance/experiments/cbd2c209-96d1-4012-aa73-e524b7a1b11a
+```
+
+#### Example response
+
+```json
+{
+  "took": 2,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": ".plugins-search-relevance-experiment",
+        "_id": "cbd2c209-96d1-4012-aa73-e524b7a1b11a",
+        "_score": 1,
+        "_source": {
+          "id": "cbd2c209-96d1-4012-aa73-e524b7a1b11a",
+          "timestamp": "2025-06-11T23:24:26.792Z",
+          "type": "PAIRWISE_COMPARISON",
+          "status": "PROCESSING",
+          "querySetId": "8368a359-146b-4690-b756-40591b2fcddb",
+          "searchConfigurationList": [
+            "a5acc9f3-6ad7-43f4-9651-fe118c499bc6",
+            "26c7255c-c36e-42fb-b5b2-633dbf8e53b6"
+          ],
+          "judgmentList": [],
+          "size": 10,
+          "results": {}
+        }
+      }
+    ]
+  }
+}
+```
+
+Once the experiment finishes running, the results are available:
+
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+
+```json
+{
+    "took": 34,
+    "timed_out": false,
+    "_shards": {
+        "total": 1,
+        "successful": 1,
+        "skipped": 0,
+        "failed": 0
+    },
+    "hits": {
+        "total": {
+            "value": 1,
+            "relation": "eq"
+        },
+        "max_score": 1.0,
+        "hits": [
+            {
+                "_index": ".plugins-search-relevance-experiment",
+                "_id": "cbd2c209-96d1-4012-aa73-e524b7a1b11a",
+                "_score": 1.0,
+                "_source": {
+                    "id": "cbd2c209-96d1-4012-aa73-e524b7a1b11a",
+                    "timestamp": "2025-06-12T04:18:37.284Z",
+                    "type": "PAIRWISE_COMPARISON",
+                    "status": "COMPLETED",
+                    "querySetId": "8368a359-146b-4690-b756-40591b2fcddb",
+                    "searchConfigurationList": [
+                        "a5acc9f3-6ad7-43f4-9651-fe118c499bc6",
+                        "26c7255c-c36e-42fb-b5b2-633dbf8e53b6"
+                    ],
+                    "judgmentList": [],
+                    "size": 10,
+                    "results": {
+                        "tv": {
+                            "26c7255c-c36e-42fb-b5b2-633dbf8e53b6": [
+                                "B07X3S9RTZ",
+                                "B07WVZFKLQ",
+                                "B00GXD4NWE",
+                                "B07ZKCV5K5",
+                                "B07ZKDVHFB",
+                                "B086VKT9R8",
+                                "B08XLM8YK1",
+                                "B07FPP6TB5",
+                                "B07N1TMNHB",
+                                "B09CDHM8W7"
+                            ],
+                            "pairwiseComparison": {
+                                "jaccard": 0.11,
+                                "rbo90": 0.16,
+                                "frequencyWeighted": 0.2,
+                                "rbo50": 0.07
+                            },
+                            "a5acc9f3-6ad7-43f4-9651-fe118c499bc6": [
+                                "B07Q7VGW4Q",
+                                "B00GXD4NWE",
+                                "B07VML1CY1",
+                                "B07THVCJK3",
+                                "B07RKSV7SW",
+                                "B010EAW8UK",
+                                "B07FPP6TB5",
+                                "B073G9ZD33",
+                                "B07VXRXRJX",
+                                "B07Q45SP9P"
+                            ]
+                        },
+                        "led tv": {
+                            "26c7255c-c36e-42fb-b5b2-633dbf8e53b6": [
+                                "B01M1D0KL1",
+                                "B07YSMD3Z9",
+                                "B07V4CY9GZ",
+                                "B074KFP426",
+                                "B07S8XNWWF",
+                                "B07XBJR7GY",
+                                "B075FDWSHT",
+                                "B01N2Z17MS",
+                                "B07F1T4JFB",
+                                "B07S658ZLH"
+                            ],
+                            "pairwiseComparison": {
+                                "jaccard": 0.11,
+                                "rbo90": 0.13,
+                                "frequencyWeighted": 0.2,
+                                "rbo50": 0.03
+                            },
+                            "a5acc9f3-6ad7-43f4-9651-fe118c499bc6": [
+                                "B07Q45SP9P",
+                                "B074KFP426",
+                                "B07JKVKZX8",
+                                "B07THVCJK3",
+                                "B0874XJYW8",
+                                "B08LVPWQQP",
+                                "B07V4CY9GZ",
+                                "B07X3BS3DF",
+                                "B074PDYLCZ",
+                                "B08CD9MKLZ"
+                            ]
+                        }
+                    }
+                }
+            }
+        ]
+    }
+}
+```
+
+</details>
+
+### Interpreting the results
+
+As shown in the preceding response, each search configuration returns its top N documents for every query in the query set. Here, N is 10 because `size` is set to 10 in the experiment request. In addition to the returned documents, the response includes metrics from the pairwise comparison for each query.
+
+### Response body fields
+
+Field | Description
+:--- | :---
+`jaccard` | The Jaccard similarity of the two returned document sets, calculated by dividing the size of their intersection by the size of their union.
+`rbo50`, `rbo90` | The Rank-Biased Overlap (RBO) metric compares the returned result lists at each ranking depth (for example, the top 1 document, the top 2 documents, and so on). It gives more weight to higher-ranked results, so agreement near the top of the lists contributes more to the score. The response reports this metric as `rbo50` and `rbo90`.
+`frequencyWeighted` | Similar to the Jaccard metric, the frequency-weighted metric calculates the ratio of the weighted intersection to the weighted union of two sets. However, unlike standard Jaccard, it gives more weight to documents with higher frequencies, skewing the result toward more frequently occurring items.
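A note on the metric descriptions in the hunk above: the following Python sketch illustrates how the three metrics could be computed for the `tv` query in the example response. It is not taken from the plugin source. The function names are hypothetical, and the sketch assumes that `rbo50` and `rbo90` use persistence values of 0.5 and 0.9 with the score normalized by `1 - p^k`, and that the frequency weight of a document is the number of result lists in which it appears. Under these assumptions, the sketch reproduces the rounded values shown for `tv`.

```python
from collections import Counter

# Document IDs returned for the query "tv" in the example response.
tv_a = ["B07X3S9RTZ", "B07WVZFKLQ", "B00GXD4NWE", "B07ZKCV5K5", "B07ZKDVHFB",
        "B086VKT9R8", "B08XLM8YK1", "B07FPP6TB5", "B07N1TMNHB", "B09CDHM8W7"]
tv_b = ["B07Q7VGW4Q", "B00GXD4NWE", "B07VML1CY1", "B07THVCJK3", "B07RKSV7SW",
        "B010EAW8UK", "B07FPP6TB5", "B073G9ZD33", "B07VXRXRJX", "B07Q45SP9P"]


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def rbo(a, b, p):
    """Truncated RBO with persistence p, normalized by 1 - p^k (assumption)."""
    k = min(len(a), len(b))
    if k == 0:
        return 0.0
    seen_a, seen_b = set(), set()
    total = 0.0
    for depth in range(1, k + 1):
        seen_a.add(a[depth - 1])
        seen_b.add(b[depth - 1])
        # Agreement at this depth, weighted so earlier depths count more.
        total += p ** (depth - 1) * len(seen_a & seen_b) / depth
    return (1 - p) * total / (1 - p ** k)


def frequency_weighted(a, b):
    """Weighted Jaccard; a document's weight is its count across both lists (assumption)."""
    freq = Counter(a) + Counter(b)
    set_a, set_b = set(a), set(b)
    weighted_union = sum(freq[doc] for doc in set_a | set_b)
    weighted_intersection = sum(freq[doc] for doc in set_a & set_b)
    return weighted_intersection / weighted_union if weighted_union else 0.0


print(round(jaccard(tv_a, tv_b), 2))             # 0.11
print(round(rbo(tv_a, tv_b, 0.9), 2))            # 0.16
print(round(rbo(tv_a, tv_b, 0.5), 2))            # 0.07
print(round(frequency_weighted(tv_a, tv_b), 2))  # 0.2
```

The two lists share only 2 of 18 distinct documents, and neither shared document appears in the top 2 positions of both lists, which is why `rbo50` (which concentrates weight on the first few ranks) is lower than `rbo90`.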
@@ -1,17 +1,16 @@
 ---
 layout: default
-title: Comparing search results
-nav_order: 55
-parent: Search relevance
+title: Comparing single queries
+nav_order: 11
+parent: Using Search Relevance Workbench
+grand_parent: Search relevance
 has_children: false
 has_toc: false
-redirect_from:
-  - /search-plugins/search-relevance/
 ---
 
-# Comparing search results
+# Comparing single queries
 
-With Compare Search Results in OpenSearch Dashboards, you can compare results from two queries side by side to determine whether one query produces better results than the other. Using this tool, you can evaluate search quality by experimenting with queries. 
+With Compare Search Results in OpenSearch Dashboards, you can compare results from two queries side by side to determine whether one query produces better results than the other. Using this tool, you can evaluate search quality by experimenting with queries.
 
 For example, you can see how results change when you apply one of the following query changes:
 
@@ -21,20 +20,20 @@ For example, you can see how results change when you apply one of the following
 
 ## Prerequisites
 
-Before you get started, you must index data in OpenSearch. To learn how to create a new index, see [Index data]({{site.url}}{{site.baseurl}}/opensearch/index-data/). 
+Before you get started, you must index data in OpenSearch. To learn how to create a new index, see [Index data]({{site.url}}{{site.baseurl}}/opensearch/index-data/).
 
 Alternatively, you can add sample data in OpenSearch Dashboards using the following steps:
 
 1. On the top menu bar, go to **OpenSearch Dashboards > Overview**.
 1. Select **View app directory**.
-1. Select **Add sample data**.  
+1. Select **Add sample data**.
 1. Choose one of the built-in datasets and select **Add data**.
 
 ## Using Compare Search Results in OpenSearch Dashboards
 
 To compare search results in OpenSearch Dashboards, perform the following steps.
 
-**Step 1:** On the top menu bar, go to **OpenSearch Plugins > Search Relevance**.  
+**Step 1:** On the top menu bar, go to **OpenSearch Plugins > Search Relevance**.
 
 **Step 2:** Enter the search text in the search bar.
 
@@ -74,7 +73,7 @@ The following example screen shows a search for the word "cup" in the `descripti
 
 <img src="{{site.url}}{{site.baseurl}}/images/search_relevance.png" alt="Compare search results"/>{: .img-fluid }
 
-If a result in Result 1 appears in Result 2, the `Up` and `Down` indicators below the result number signify how many places the result moved up or down compared to the same result in Result 2. In this example, the document with the ID 2 is `Up 1` place in Result 2 compared to Result 1 and `Down 1` place in Result 1 compared to Result 2. 
+If a result in Result 1 appears in Result 2, the `Up` and `Down` indicators below the result number signify how many positions the result moved up or down compared to the same result in Result 2. In this example, the document with the ID 2 is `Up 1` position in Result 2 compared to Result 1 and `Down 1` position in Result 1 compared to Result 2.
 
 ## Changing the number of results
 
@@ -98,6 +97,27 @@ Setting `size` to a high value (for example, larger than 250 documents) may degr
 You cannot save a given comparison for future use, so Compare Search Results is not suitable for systematic testing.
 {: .note}
 
+## Comparing OpenSearch search results using Search Relevance Workbench
+
+[Search Relevance Workbench]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/using-search-relevance-workbench/) provides richer visualization options for examining the difference between two queries.
+
+To use Search Relevance Workbench, follow steps 1--4. The displayed results and the options for viewing the differences are shown in the following image.
+
+<img src="{{site.url}}{{site.baseurl}}/images/search-relevance-workbench/comparing_search_results.png" alt="Compare search results"/>{: .img-fluid }
+
+The top section provides a summary of the results: how many of the retrieved results are unique to the query on the left, how many are unique to the query on the right, and how many are returned by both queries.
+
+What follows is a visual representation of the retrieved results. By default, the unique identifier field (`_id`) is shown. You can change this by selecting a different field in the **Display Field** dropdown list.
+In the side-by-side view, you can see the positional changes for all documents common to both result lists.
+Selecting one item shows all stored fields in the index to facilitate easier document identification.
+
+Lastly, Search Relevance Workbench allows you to choose among different visualization styles from a dropdown list:
+
+* **Default style**: Different colors distinguish the documents in the two result lists: results unique to the left list are yellow, results unique to the right list are purple, and common results are green.
+* **Ranking change color coding**: All unique documents are purple, and common results are green to focus on ranking changes.
+* **Ranking change color coding 2**: All unique documents are gray, and common results are green to focus on ranking changes.
+* **Venn diagram color coding**: All unique documents are purple, and common results are blue, matching the Venn diagram at the top of the two result lists.
+
 ## Comparing OpenSearch search results with reranked results
 
 One use case for Compare Search Results is the comparison of raw OpenSearch results with the same results processed by a reranking application. OpenSearch currently integrates with the following two rerankers:
@@ -115,7 +135,7 @@ To try Amazon Kendra Intelligent Ranking, you must first set up the Amazon Kendr
 
 To compare search results with reranked results in OpenSearch Dashboards, enter a query in **Query 1** and enter the same query using a reranker in **Query 2**. Then compare the OpenSearch results with the reranked results.
 
-The following example demonstrates searching for the text "snacking nuts" in the `abo` index. The documents in the index contain snack descriptions in the `bullet_point` array. 
+The following example demonstrates searching for the text "snacking nuts" in the `abo` index. The documents in the index contain snack descriptions in the `bullet_point` array.
 
 <img src="{{site.url}}{{site.baseurl}}/images/kendra_query.png" alt="OpenSearch Intelligent Ranking query"/>{: .img-fluid }