- Recipe dataset: https://eightportions.com/datasets/Recipes/#fn:1
- Ingredient CNF API: https://produits-sante.canada.ca/api/documentation/cnf-documentation-en.html#a6
- DataPrcoessing_1.ipynb: The main notebook for Recipe data cleaning + Ingredient Map + Data Processign Agent
- Helpers folder: All helper functions
- recipes_raw folder:
- All raw recipe datasets
- the Good_ingredient_List.csv: the ingredient list we extract from Food Basic webiste based on popularity
- recipes_raw_processed.json: Processed recipe that only contains ingredients in the Good ingredient list
- datasets folder:
- recipe_dataset_init_{}.json: Randomly selected small testing dataset from recipes_raw_processed.json. The number means number of items.
- ./Processed_Recipes/processed_recipes_init_{number of corresponding recipes before process}_ batch_: Initially processed recipe datasets. Batch size 50, batch files are indexed in asc order.
- CNF_API_food_code.json: Ingredient food code dataset from CNF
- emb folder: the processed embedding and faiss index for descriptions in the CNF_API_food_code.json
- testing folder:
- tuning_ingre_list.csv: 80% of the Good ingredient list
- test_ingre_list.csv: 20% of Good ingredient list
- ingre_nutrition_map: Where the map and the unit map is stored
Please read the DataPrcoessing_1.ipynb about how to use them. Following are some helper functions if you just want to run the process end to end.
See this example test function recipe_dataset_gen.test_get_testing_dataset
, and use the functions called inside
See main notebook get_processed_recipe_dataset()
See main notebook Ingredient-Nutrient Mapping section. The major functions are:
- get_food_code_for_ingredients()
- get_all_ingredient_mapping()
- food_nutrient_mapping_helpder.save_nut_map
- Processed recipe dataset:
./datasets/Processed_Recipes
- Ingredient-nutrient map:
./ingre_nutrition_map
- Health Score: Quantifies the nutrient balance of each recipe, (detailed in the Evaluation Section). Recipes with a health score below 3 are filtered out.
RAG_health_score_ver5.ipynb
: The notebook for calculating the health score and adding it as an attribution in the recipe JSON file.RAG_health_score.py
: The Python functions inplementing the health score algorithm.recipe_filter.ipynb
: The notebook for filtering recipes and merging balanced recipes in a JSON file.
Use the major function get_health_score_with_rag()
in RAG_health_score.py
.
Examples Usages in the same file:
- For a single recipes
- For multiple recipes stored in a JSON file.
For a detailed algorithm flow and output examples, refer to RAG_health_score_ver5.ipynb
.
Use filter()
in the recipe_filter.ipynb
notebook.
- Filtered Recipes:
./datasets/filtered_recipes_419.json
Contains 419 balanced recipes with a health score of 3 or higher.
-
Health Score: Measures the nutrient balance of a recipe based on the WHO Nutrient Intake Goals. Seven macronutrients are considered:
- Proteins
- Carbohydrates
- Sugars
- Sodium
- Fats
- Saturated Fats
- Fibres
Each macronutrient that falls within the recommended range scores 1 point. The total score (out of 7) is the recipe’s health score.
-
Relevance: Evaluates how well the recipe meets user requirements, including:
- Cooking Tools
- Cooking Time
- Ingredient Similarity
-
Consistency: Assesses the quality of the generated recipe based on:
- Instructional Clarity
- Measurement Consistency
- Logical Step Sequencing
RAG_health_score.py
: The Python function for health score calculation algorithm.recipe_relevance_ver3.py
: The Python function for relevance evaluation algorithm.
Use the major function get_health_score_with_rag()
in RAG_health_score.py
.
Use the major function relevance_evaluation()
in recipe_relevance_ver3.py
.
Refer to the example in the file for guidance on function usage.
-
get_health_score_with_rag()
: Returns the health score and a summary of points for the input recipe.Example Output:
{ "total_health_score": 3, "summary_of_points": { "Proteins": 0, "Carbohydrates": 0, "Sugars": 1, "Sodium": 0, "Fats": 0, "Saturated Fats": 1, "Fibers": 1 } }
-
relevance_evaluation()
: Returns the relevance evaluation results for the input recipe.Example Output:
{ "cooking_tools": "True", "cooking_time": 0, "ingredient_overlap_rate": 66.66666666666666 }
Explanation:
cooking_tools
:True
if the recipe meets the tool requirement; otherwise,False
.
cooking_time
:0
if the cooking time is within the user's limit.- Positive value indicating the exceeded time (in minutes).
ingredient_overlap_rate
:- The overlap rate of user inputs and recipe ingredients.
- 100% means all ingredients needed for the recipe can be found in user's inputs.