43 | 43 | },
44 | 44 | "outputs": [],
45 | 45 | "source": [
46 |    | - "!pip install -q torch transformers transformers langchain sentence-transformers tqdm openpyxl openai pandas datasets langchain-community ragatouille"
   | 46 | + "!pip install -q torch transformers langchain sentence-transformers tqdm openpyxl openai pandas datasets langchain-community ragatouille"
47 | 47 | ]
48 | 48 | },
49 | 49 | {
|
366 | 366 | "\n",
367 | 367 | "We thus build critique agents that will rate each question on several criteria, given in [this paper](https://huggingface.co/papers/2312.10003):\n",
368 | 368 | "- **Groundedness:** can the question be answered from the given context?\n",
369 |     | - "- **Relevance:** is the question relevant to users? For instance, `\"What is the date when transformers 4.29.1 was released?\"` is not relevant for ML practicioners.\n",
    | 369 | + "- **Relevance:** is the question relevant to users? For instance, `\"What is the date when transformers 4.29.1 was released?\"` is not relevant for ML practitioners.\n",
370 | 370 | "\n",
371 | 371 | "One last failure case we've noticed is when a function is tailored for the particular setting where the question was generated, but undecipherable by itself, like `\"What is the name of the function used in this guide?\"`.\n",
372 | 372 | "We also build a critique agent for this criteria:\n",
|
426 | 426 | "\n",
427 | 427 | "question_standalone_critique_prompt = \"\"\"\n",
428 | 428 | "You will be given a question.\n",
429 |     | - "Your task is to provide a 'total rating' representing how context-independant this question is.\n",
    | 429 | + "Your task is to provide a 'total rating' representing how context-independent this question is.\n",
430 | 430 | "Give your answer on a scale of 1 to 5, where 1 means that the question depends on additional information to be understood, and 5 means that the question makes sense by itself.\n",
431 | 431 | "For instance, if the question refers to a particular setting, like 'in the context' or 'in the document', the rating must be 1.\n",
432 | 432 | "The questions can contain obscure technical nouns or acronyms like Gradio, Hub, Hugging Face or Space and still be a 5: it must simply be clear to an operator with access to documentation what the question is about.\n",
433 | 433 | "\n",
434 |     | - "For instance, \"What is the name of the checkpoint from which the ViT model is imported?\" should receive a 1, since there is an implicit mention of a context, thus the question is not independant from the context.\n",
    | 434 | + "For instance, \"What is the name of the checkpoint from which the ViT model is imported?\" should receive a 1, since there is an implicit mention of a context, thus the question is not independent from the context.\n",
435 | 435 | "\n",
436 | 436 | "Provide your answer as follows:\n",
437 | 437 | "\n",
|
804 | 804 | "source": [
805 | 805 | "Now our synthetic evaluation dataset is complete! We can evaluate different RAG systems on this evaluation dataset.\n",
806 | 806 | "\n",
807 |     | - "We have generated only a few QA couples here to reduce time and cost. But let's kick start the next part by loading a pre-generated dataset:"
    | 807 | + "We have generated only a few QA couples here to reduce time and cost. But let's kickstart the next part by loading a pre-generated dataset:"
808 | 808 | ]
809 | 809 | },
810 | 810 | {
|
1123 | 1123 | "source": [
1124 | 1124 | "# 3. Benchmarking the RAG system\n",
1125 | 1125 | "\n",
1126 |      | - "The RAG system and the evaluation datasets are now ready. The last step is to judge the RAG system's output on this evlauation dataset.\n",
     | 1126 | + "The RAG system and the evaluation datasets are now ready. The last step is to judge the RAG system's output on this evaluation dataset.\n",
1127 | 1127 | "\n",
1128 | 1128 | "To this end, __we setup a judge agent__. ⚖️🤖\n",
1129 | 1129 | "\n",
|
1427 | 1427 | "## Example results\n",
1428 | 1428 | "\n",
1429 | 1429 | "Let us load the results that I obtained by tweaking the different options available in this notebook.\n",
1430 |      | - "For more detail on why these options could work on not, see the notebook on [advanced_RAG](advanced_rag).\n",
     | 1430 | + "For more detail on why these options could work or not, see the notebook on [advanced_RAG](advanced_rag).\n",
1431 | 1431 | "\n",
1432 | 1432 | "As you can see in the graph below, some tweaks do not bring any improvement, some give huge performance boosts.\n",
1433 | 1433 | "\n",