This section includes scripts for generating an AI-generated dataset, merging the AI-generated and human-generated datasets, and launching an evaluation interface for comparing human-generated and AI-generated responses, in order to determine whether the AI performs as well as humans.
- Python 3.x
- Gradio library
extract.py extracts the questions and their ground truths from the human-generated dataset (a parsing sketch is given after the format descriptions below):
python extract.py -i <input_file.json> -o <output_file.json>
Example:
python extract.py -i human_dataset.json -o question_groundTruth_dataset.json
noise_pipeline.py generates the AI-generated answers (the "noise") from the question/ground-truth dataset:
python noise_pipeline.py -i <input_file.json> -o <output_file.json>
Example:
python noise_pipeline.py -i question_groundTruth_dataset.json -o noise.json
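The internals of noise_pipeline.py are not documented here. As a rough, hypothetical sketch only (the generate_ai_answer placeholder and the ai_answer output field are assumptions, not the script's actual implementation), a pipeline of this kind could read the question/ground-truth pairs and attach a model-generated answer to each entry:

```python
import argparse
import json

def generate_ai_answer(question, ground_truth):
    # Stand-in for the real model call; returns a placeholder so the sketch runs end to end.
    return f"[AI-generated answer placeholder for: {question}]"

def main():
    parser = argparse.ArgumentParser(description="Sketch of an AI answer (noise) generation step.")
    parser.add_argument("-i", "--input", required=True, help="question/ground-truth JSON file")
    parser.add_argument("-o", "--output", required=True, help="output JSON file with AI answers added")
    args = parser.parse_args()

    with open(args.input, encoding="utf-8") as f:
        entries = json.load(f)

    # Attach an AI-generated answer to every question/ground-truth pair.
    for entry in entries:
        entry["ai_answer"] = generate_ai_answer(entry["question"], entry["ground_truth"])

    with open(args.output, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    main()
```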
merge_datasets.py merges the human-generated and AI-generated datasets into a single evaluation dataset (see the sketch after the merged dataset format below):
python merge_datasets.py <human_dataset.json> <noise.json> <merged_dataset.json>
Example:
python merge_datasets.py human_dataset.json noise.json evaluation_dataset.json
evaluation_interface.py launches the Gradio evaluation interface on a merged dataset:
python evaluation_interface.py <evaluation_dataset_filename.json>
Example:
python evaluation_interface.py evaluation_dataset.json
Input File Format for extract.py
The input text file should contain questions, their corresponding ground truths, and answers in the following format:
question: What are the main causes of climate change?
ground_truth: Climate change is primarily caused by human activities...
a0: The primary driver of climate change is human activity...
a1: The primary driver of climate change is human activity...
...
Output File Format for extract.py
The output JSON file will have the following structure:
[
{
"question": "What are the main causes of climate change?",
"ground_truth": "Climate change is primarily caused by human activities..."
},
{
"question": "How does photosynthesis work in plants?",
"ground_truth": "Photosynthesis is the process by which plants convert light energy..."
}
]
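As an illustration of the two formats above, a minimal parsing sketch (not the actual contents of extract.py) could look like this, hard-coding the example file names from the usage section:

```python
import json

def parse_questions(path):
    """Collect 'question:' / 'ground_truth:' lines into a list of dicts."""
    entries, current = [], {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line.startswith("question:"):
                if current:
                    entries.append(current)
                current = {"question": line[len("question:"):].strip()}
            elif line.startswith("ground_truth:"):
                current["ground_truth"] = line[len("ground_truth:"):].strip()
            # a0:, a1:, ... answer lines are skipped: only question and ground truth are kept
    if current:
        entries.append(current)
    return entries

if __name__ == "__main__":
    entries = parse_questions("human_dataset.json")
    with open("question_groundTruth_dataset.json", "w", encoding="utf-8") as out:
        json.dump(entries, out, indent=2, ensure_ascii=False)
```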
Merged Dataset Format
The merged dataset will have the following structure:
{
"questions":[
{
"id":1,
"question":"What was the Castlereagh–Canning duel?",
"ground_truth":"The Castlereagh–Canning duel was a pistol duel...",
"answers":{
"A0":{
"human":"The Castlereagh–Canning duel, fought on September 21, 1809...",
"ai":"Climate change is predominantly attributed to human actions..."
},
...
}
},
...
]
}
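For illustration only, a merge step producing this structure could look like the sketch below. The shapes of the two input files are assumptions (lists of entries carrying parallel human and AI answer lists), since only the merged output format is documented here:

```python
import json
import sys

def merge(human_path, noise_path, output_path):
    """Combine human and AI answers into the merged evaluation structure shown above."""
    with open(human_path, encoding="utf-8") as f:
        human = json.load(f)   # assumed: list of {"question", "ground_truth", "answers": [...]}
    with open(noise_path, encoding="utf-8") as f:
        noise = json.load(f)   # assumed: list of {"question", "answers": [...]} with AI answers

    merged = {"questions": []}
    for idx, (h, n) in enumerate(zip(human, noise), start=1):
        # Pair each human answer with the AI answer in the same slot (A0, A1, ...).
        answers = {
            f"A{i}": {"human": h_ans, "ai": ai_ans}
            for i, (h_ans, ai_ans) in enumerate(zip(h["answers"], n["answers"]))
        }
        merged["questions"].append({
            "id": idx,
            "question": h["question"],
            "ground_truth": h["ground_truth"],
            "answers": answers,
        })

    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    merge(sys.argv[1], sys.argv[2], sys.argv[3])
```

Called as in the usage example, this would read human_dataset.json and noise.json and write evaluation_dataset.json.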
You can test the evaluation interface directly with the dataset available in this repository using the following command:
python evaluation_interface.py evaluation_dataset.json
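The actual interface is implemented in evaluation_interface.py; the snippet below is only a rough sketch of how a Gradio interface over the merged dataset might be wired up, assuming a simple layout that displays the A0 human/AI answer pair and asks the evaluator which answer is better (the field names follow the merged dataset format above):

```python
import json
import sys

import gradio as gr

def build_interface(dataset_path):
    with open(dataset_path, encoding="utf-8") as f:
        questions = json.load(f)["questions"]

    def show(index):
        # Return the question, ground truth, and the A0 human/AI answers for the selected entry.
        q = questions[int(index)]
        a0 = q["answers"]["A0"]
        return q["question"], q["ground_truth"], a0["human"], a0["ai"]

    with gr.Blocks() as demo:
        idx = gr.Slider(0, len(questions) - 1, step=1, value=0, label="Question index")
        question = gr.Textbox(label="Question")
        ground_truth = gr.Textbox(label="Ground truth")
        human_answer = gr.Textbox(label="Human answer (A0)")
        ai_answer = gr.Textbox(label="AI answer (A0)")
        gr.Radio(["human", "ai", "tie"], label="Which answer is better?")  # recording the verdict is omitted in this sketch
        idx.change(show, inputs=idx, outputs=[question, ground_truth, human_answer, ai_answer])
    return demo

if __name__ == "__main__":
    build_interface(sys.argv[1]).launch()
```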