
Commit 1c90308

Merge pull request #281 from julianeagu/patch-1
Update llm_judge_evaluating_ai_search_engines_with_judges_library.ipynb
2 parents 52e7130 + a892854 · commit 1c90308

File tree

1 file changed: 1 addition & 1 deletion

notebooks/en/llm_judge_evaluating_ai_search_engines_with_judges_library.ipynb

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@
     "\n",
     "We use the [Natural Questions dataset](https://paperswithcode.com/dataset/natural-questions), an open-source collection of real Google queries and Wikipedia articles, to benchmark AI search engine quality.\n",
     "\n",
-    "1. Start with a [**100-datapoint subset of Natural Questions**](https://huggingface.co/datasets/quotientai/natural-qa-random-100-with-AI-search-answers), which only includes human evaluated answers and their corresponding queries for correctness, clarity, and completeness. We'll use these as the ground truth answers to the queries.\n",
+    "1. Start with a [**100-datapoint subset of Natural Questions**](https://huggingface.co/datasets/quotientai/labeled-natural-qa-random-100), which only includes human evaluated answers and their corresponding queries for correctness, clarity, and completeness. We'll use these as the ground truth answers to the queries.\n",
     "2. Use different **AI search engines** (Perplexity, Exa, and Gemini) to generate responses to the queries in the dataset.\n",
     "3. Use `judges` to evaluate the responses for **correctness** and **quality**.\n",
     "\n",
