|
43 | 43 | }, |
44 | 44 | "outputs": [], |
45 | 45 | "source": [ |
46 | | - "!pip install -q torch transformers transformers langchain sentence-transformers tqdm openpyxl openai pandas datasets langchain-community ragatouille" |
| 46 | + "!pip install -q torch transformers langchain sentence-transformers tqdm openpyxl openai pandas datasets langchain-community ragatouille" |
47 | 47 | ] |
48 | 48 | }, |
49 | 49 | { |
|
366 | 366 | "\n", |
367 | 367 | "We thus build critique agents that will rate each question on several criteria, given in [this paper](https://huggingface.co/papers/2312.10003):\n", |
368 | 368 | "- **Groundedness:** can the question be answered from the given context?\n", |
369 | | - "- **Relevance:** is the question relevant to users? For instance, `\"What is the date when transformers 4.29.1 was released?\"` is not relevant for ML practicioners.\n", |
| 369 | + "- **Relevance:** is the question relevant to users? For instance, `\"What is the date when transformers 4.29.1 was released?\"` is not relevant for ML practitioners.\n", |
370 | 370 | "\n", |
371 | 371 | "One last failure case we've noticed is when a function is tailored for the particular setting where the question was generated, but undecipherable by itself, like `\"What is the name of the function used in this guide?\"`.\n", |
372 | 372 | "We also build a critique agent for this criteria:\n", |
|
426 | 426 | "\n", |
427 | 427 | "question_standalone_critique_prompt = \"\"\"\n", |
428 | 428 | "You will be given a question.\n", |
429 | | - "Your task is to provide a 'total rating' representing how context-independant this question is.\n", |
| 429 | + "Your task is to provide a 'total rating' representing how context-independent this question is.\n", |
430 | 430 | "Give your answer on a scale of 1 to 5, where 1 means that the question depends on additional information to be understood, and 5 means that the question makes sense by itself.\n", |
431 | 431 | "For instance, if the question refers to a particular setting, like 'in the context' or 'in the document', the rating must be 1.\n", |
432 | 432 | "The questions can contain obscure technical nouns or acronyms like Gradio, Hub, Hugging Face or Space and still be a 5: it must simply be clear to an operator with access to documentation what the question is about.\n", |
433 | 433 | "\n", |
434 | | - "For instance, \"What is the name of the checkpoint from which the ViT model is imported?\" should receive a 1, since there is an implicit mention of a context, thus the question is not independant from the context.\n", |
| 434 | + "For instance, \"What is the name of the checkpoint from which the ViT model is imported?\" should receive a 1, since there is an implicit mention of a context, thus the question is not independent from the context.\n", |
435 | 435 | "\n", |
436 | 436 | "Provide your answer as follows:\n", |
437 | 437 | "\n", |
|
804 | 804 | "source": [ |
805 | 805 | "Now our synthetic evaluation dataset is complete! We can evaluate different RAG systems on this evaluation dataset.\n", |
806 | 806 | "\n", |
807 | | - "We have generated only a few QA couples here to reduce time and cost. But let's kick start the next part by loading a pre-generated dataset:" |
| 807 | + "We have generated only a few QA couples here to reduce time and cost. But let's kickstart the next part by loading a pre-generated dataset:" |
808 | 808 | ] |
809 | 809 | }, |
810 | 810 | { |
|
1123 | 1123 | "source": [ |
1124 | 1124 | "# 3. Benchmarking the RAG system\n", |
1125 | 1125 | "\n", |
1126 | | - "The RAG system and the evaluation datasets are now ready. The last step is to judge the RAG system's output on this evlauation dataset.\n", |
| 1126 | + "The RAG system and the evaluation datasets are now ready. The last step is to judge the RAG system's output on this evaluation dataset.\n", |
1127 | 1127 | "\n", |
1128 | 1128 | "To this end, __we setup a judge agent__. ⚖️🤖\n", |
1129 | 1129 | "\n", |
|
1427 | 1427 | "## Example results\n", |
1428 | 1428 | "\n", |
1429 | 1429 | "Let us load the results that I obtained by tweaking the different options available in this notebook.\n", |
1430 | | - "For more detail on why these options could work on not, see the notebook on [advanced_RAG](advanced_rag).\n", |
| 1430 | + "For more detail on why these options could work or not, see the notebook on [advanced_RAG](advanced_rag).\n", |
1431 | 1431 | "\n", |
1432 | 1432 | "As you can see in the graph below, some tweaks do not bring any improvement, some give huge performance boosts.\n", |
1433 | 1433 | "\n", |
|