
Commit 1c90308

Merge pull request #281 from julianeagu/patch-1
Update llm_judge_evaluating_ai_search_engines_with_judges_library.ipynb
2 parents 52e7130 + a892854 · commit 1c90308

File tree

1 file changed: 1 addition & 1 deletion

notebooks/en/llm_judge_evaluating_ai_search_engines_with_judges_library.ipynb

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@
     "\n",
     "We use the [Natural Questions dataset](https://paperswithcode.com/dataset/natural-questions), an open-source collection of real Google queries and Wikipedia articles, to benchmark AI search engine quality.\n",
     "\n",
-    "1. Start with a [**100-datapoint subset of Natural Questions**](https://huggingface.co/datasets/quotientai/natural-qa-random-100-with-AI-search-answers), which only includes human evaluated answers and their corresponding queries for correctness, clarity, and completeness. We'll use these as the ground truth answers to the queries.\n",
+    "1. Start with a [**100-datapoint subset of Natural Questions**](https://huggingface.co/datasets/quotientai/labeled-natural-qa-random-100), which only includes human evaluated answers and their corresponding queries for correctness, clarity, and completeness. We'll use these as the ground truth answers to the queries.\n",
     "2. Use different **AI search engines** (Perplexity, Exa, and Gemini) to generate responses to the queries in the dataset.\n",
     "3. Use `judges` to evaluate the responses for **correctness** and **quality**.\n",
     "\n",
