-
🤖 Hello @snassimr! It's good to see you again. I hope you're doing well! Let's see how we can assist you with this issue today. Give me a moment to look at your issue. I'll post a detailed answer shortly.
-
🤖 The `num=20` argument to `agenerate_dataset_from_nodes` sets the maximum number of question-answer pairs to generate from the nodes. If `num` is an integer, the function stops generating new question-answer pairs once it reaches that number. If `num` is `None`, the function generates question-answer pairs for all nodes. This is evident from the following snippet in the LlamaIndex codebase:

```python
for node in nodes:
    if num is not None and len(queries) >= num:
        break
```

Here, the function iterates over the nodes to generate question-answer pairs. Before each node, if `num` is not `None` and the `queries` dictionary (which stores the generated question-answer pairs) already holds `num` or more entries, the function breaks out of the loop, effectively limiting the number of generated question-answer pairs to `num`. In your code, you set `num_questions_per_chunk` to 10 and generate questions for 3 nodes (up to 30 pairs), but the `num=20` argument to `agenerate_dataset_from_nodes` caps the total at 20. You can find more details in the `llama_index/evaluation/dataset_generation.py` file in the LlamaIndex repository.
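To make the cap concrete, here is a minimal, self-contained simulation of the loop quoted above. It is a hypothetical sketch, not LlamaIndex code: `generate_pairs` and `fake_llm_questions` are made-up names, and the stub "LLM" is assumed to return exactly `num_questions_per_chunk` questions per node.

```python
from typing import Dict, List, Optional


def fake_llm_questions(node_text: str, num_questions_per_chunk: int) -> List[str]:
    """Hypothetical stand-in for the LLM call: returns exactly the requested count."""
    return [f"Question {i + 1} about {node_text!r}?" for i in range(num_questions_per_chunk)]


def generate_pairs(
    nodes: List[str], num_questions_per_chunk: int, num: Optional[int] = None
) -> Dict[str, str]:
    """Simplified model of the quoted loop (hypothetical, not the library code)."""
    queries: Dict[str, str] = {}
    for node in nodes:
        # The cap is checked once per node, before that node's questions are added.
        if num is not None and len(queries) >= num:
            break
        for question in fake_llm_questions(node, num_questions_per_chunk):
            # len(queries) grows on every insert, so each key is unique.
            queries[f"{node}-{len(queries)}"] = question
    return queries


nodes = ["node-1", "node-2", "node-3"]
print(len(generate_pairs(nodes, num_questions_per_chunk=10, num=20)))    # 20
print(len(generate_pairs(nodes, num_questions_per_chunk=10, num=None)))  # 30
print(len(generate_pairs(nodes, num_questions_per_chunk=10, num=25)))    # 30
```

With 3 nodes and 10 questions per node, `num=20` stops before the third node and yields 20 pairs, while `num=None` processes every node and yields 30. Because the check runs once per node rather than once per question, a cap that is not a multiple of the per-node count (for example `num=25`) still yields 30 in this sketch; whether the library trims the result afterwards is worth confirming in `dataset_generation.py`.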
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
-
🤖 In the code snippet you provided, the function will iterate over each node and generate question-answer pairs. If the `num` parameter is set to a specific integer, the function will stop generating new question-answer pairs once it reaches that number. If `num` is set to `None`, the function will generate question-answer pairs for all nodes. This is evident from the following code snippet:

```python
for node in nodes:
    if num is not None and len(queries) >= num:
        break
```

Here, the function breaks out of the loop if the length of the `queries` dictionary (which stores the generated question-answer pairs) is greater than or equal to `num`, effectively limiting the number of generated question-answer pairs to `num`. You can find more details in the source code. I hope this clarifies your question. If you have any more questions, feel free to ask.
-
Hey @snassimr! So what's happening is that for each node, there is an LLM that is prompted to generate
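The reply above describes one LLM call per node. A minimal sketch of that per-node step follows, assuming the requested count is written into the prompt text and the model's reply is split into one question per line. Both are assumptions: the template wording, `QUESTION_GEN_TEMPLATE`, `questions_for_node`, and `stub_llm` are hypothetical and not LlamaIndex's actual prompt or API.

```python
import re
from typing import Callable, List

# Hypothetical prompt wording; the real default template lives in LlamaIndex.
QUESTION_GEN_TEMPLATE = (
    "Context:\n{context}\n\n"
    "Write {num_questions_per_chunk} questions about the context above, one per line."
)


def questions_for_node(
    node_text: str,
    num_questions_per_chunk: int,
    llm: Callable[[str], str],
) -> List[str]:
    """Prompt the (hypothetical) LLM once for a single node and split its reply."""
    prompt = QUESTION_GEN_TEMPLATE.format(
        context=node_text, num_questions_per_chunk=num_questions_per_chunk
    )
    reply = llm(prompt)
    # Strip leading numbering such as "1." or "2)" and drop empty lines.
    lines = [re.sub(r"^\s*\d+[.)]\s*", "", line).strip() for line in reply.splitlines()]
    return [line for line in lines if line]


def stub_llm(prompt: str) -> str:
    """Fake LLM that always answers with three numbered questions."""
    return "1. What is X?\n2) Why does Y happen?\n\n3. How is Z used?"


print(questions_for_node("some chunk text", 10, stub_llm))
# ['What is X?', 'Why does Y happen?', 'How is Z used?']
```

Here the stub returns three questions even though ten were requested, so in this sketch the per-node count depends on what the model actually writes, and `num_questions_per_chunk` only enters through the prompt text.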
-
Question Validation
Question
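I want to ask about this piece of code:

```python
from llama_index.evaluation.dataset_generation import DatasetGenerator

for file in files:
    # Split each file's documents into nodes (one node per chunk).
    nodes = node_parser.get_nodes_from_documents(documents[file])
    # Only the first 3 nodes are used to generate questions.
    dataset_generator = DatasetGenerator(
        nodes[:3],
        service_context=service_context,
        show_progress=True,
        num_questions_per_chunk=10,
    )
```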
Each chunk (I have 77 of them) is mapped to a node.
The code should generate 30 Q/A pairs (3 nodes × `num_questions_per_chunk` of 10).
What is the meaning of `num=20` in this case?
Thanks