Data Processing

Data Processing Agent + Ingredient Nutrient map

Data sources

Recipe dataset: https://eightportions.com/datasets/Recipes/#fn:1
Ingredient CNF API: https://produits-sante.canada.ca/api/documentation/cnf-documentation-en.html#a6

Code Folders

DataPrcoessing_1.ipynb: The main notebook for recipe data cleaning, the ingredient map, and the Data Processing Agent
Helpers folder: All helper functions
recipes_raw folder:
- All raw recipe datasets
- Good_ingredient_List.csv: the ingredient list we extracted from the Food Basic website, selected by popularity
- recipes_raw_processed.json: Processed recipes that contain only ingredients in the Good ingredient list
datasets folder:
- recipe_dataset_init_{}.json: Small testing dataset randomly selected from recipes_raw_processed.json. The number in the file name is the number of items.
- ./Processed_Recipes/processed_recipes_init_{number of corresponding recipes before process}_ batch_: Initially processed recipe datasets. The batch size is 50, and batch files are indexed in ascending order.
- CNF_API_food_code.json: Ingredient food code dataset from CNF
- emb folder: the processed embeddings and FAISS index for the descriptions in CNF_API_food_code.json
- testing folder:
  - tuning_ingre_list.csv: 80% of the Good ingredient list
  - test_ingre_list.csv: 20% of Good ingredient list
ingre_nutrition_map: Where the ingredient-nutrient map and the unit map are stored
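
The 80%/20% tuning and test split of the Good ingredient list could be reproduced along these lines. This is a minimal sketch rather than the notebook's code: the function name `split_ingredient_list`, the fixed seed, and the shuffle-then-slice approach are all assumptions, and the ingredient names are toy data.

```python
import random

def split_ingredient_list(ingredients, tuning_fraction=0.8, seed=0):
    """Shuffle a copy of the list and split it into (tuning, test) parts."""
    shuffled = list(ingredients)           # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    cut = int(len(shuffled) * tuning_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy stand-in for the rows of Good_ingredient_List.csv.
ingredients = ["egg", "milk", "rice", "onion", "garlic",
               "butter", "tomato", "chicken", "flour", "salt"]
tuning, test = split_ingredient_list(ingredients)
print(len(tuning), len(test))
```

Seeding the shuffle keeps the two lists stable between runs, so tuning results stay comparable.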

Main Code Walkthrough

See DataPrcoessing_1.ipynb for how to use these files. The helper functions below are enough if you only want to run the process end to end.

To have a smaller dataset for testing

See the example test function recipe_dataset_gen.test_get_testing_dataset and use the functions it calls.
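
For orientation, random selection of a small testing dataset might look like the sketch below. It does not reproduce the functions in recipe_dataset_gen: the name `sample_recipes` is hypothetical, and the assumption that the processed recipes form a dict keyed by recipe id is not confirmed by this README.

```python
import json
import random

def sample_recipes(recipes, n, seed=0):
    """Return n randomly chosen recipes from a dict keyed by recipe id."""
    rng = random.Random(seed)
    chosen_ids = rng.sample(sorted(recipes), n)  # sorted() gives a stable id list
    return {rid: recipes[rid] for rid in chosen_ids}

# Toy stand-in for the contents of recipes_raw_processed.json.
recipes = {f"r{i}": {"title": f"Recipe {i}"} for i in range(20)}
subset = sample_recipes(recipes, n=5)

# File name following the recipe_dataset_init_{} pattern, with the item count filled in.
out_name = f"recipe_dataset_init_{len(subset)}.json"
payload = json.dumps(subset, indent=2)  # text that would be written to out_name
```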

To process the paragraph style recipes into structured labels

See get_processed_recipe_dataset() in the main notebook.
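
The processed recipes are stored in batches of 50, with batch files indexed in ascending order. The sketch below shows one way to produce such batches; `batch_recipes` is a hypothetical helper, and starting the index at 0 is an assumption.

```python
def batch_recipes(recipes, batch_size=50):
    """Yield (batch_index, batch) pairs, with indices in ascending order."""
    for start in range(0, len(recipes), batch_size):
        yield start // batch_size, recipes[start:start + batch_size]

# Toy input: 120 placeholder recipes give two full batches and one partial batch.
recipes = [{"id": i} for i in range(120)]
batches = list(batch_recipes(recipes))
sizes = [len(batch) for _, batch in batches]
indices = [index for index, _ in batches]
```

Processing in fixed-size batches means an interrupted run only loses the batch in progress, which is helpful when each recipe requires a model call.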

To get nutrient mapping for the processed recipes:

See the Ingredient-Nutrient Mapping section of the main notebook. The major functions are:

get_food_code_for_ingredients()
get_all_ingredient_mapping()
food_nutrient_mapping_helpder.save_nut_map
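
The emb folder holds embeddings and a FAISS index for the CNF food descriptions, which suggests that ingredients are matched to food codes by embedding similarity. Whether get_food_code_for_ingredients() works exactly this way is not stated here, so the following is only an illustration of the idea. It swaps FAISS for a brute-force cosine search in pure Python, and the codes and vectors are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest_food_code(query_vec, indexed):
    """Return the food code whose description embedding is closest to query_vec."""
    return max(indexed, key=lambda code: cosine(query_vec, indexed[code]))

# Made-up codes and 3-dimensional vectors; real embeddings come from a model.
indexed = {
    "code_A": [0.9, 0.1, 0.0],
    "code_B": [0.0, 0.8, 0.6],
}
best = nearest_food_code([0.1, 0.7, 0.7], indexed)
```

An inner-product index over length-normalised vectors would return the same ranking, only faster on a large set of descriptions.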

Dataset Filtering

Data source

Processed recipe dataset: ./datasets/Processed_Recipes
Ingredient-nutrient map: ./ingre_nutrition_map

Filtering criteria

Health Score: Quantifies the nutrient balance of each recipe (detailed in the Evaluation section). Recipes with a health score below 3 are filtered out.

Code

RAG_health_score_ver5.ipynb: The notebook for calculating the health score and adding it as an attribute in the recipe JSON file.
RAG_health_score.py: The Python functions implementing the health score algorithm.
recipe_filter.ipynb: The notebook for filtering recipes and merging the balanced recipes into a single JSON file.

Main Code Walkthrough

To calculate recipes' health score

Use the major function get_health_score_with_rag() in RAG_health_score.py.

Example usages in the same file:

For a single recipe
For multiple recipes stored in a JSON file

For a detailed algorithm flow and output examples, refer to RAG_health_score_ver5.ipynb.
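
Adding the score as an attribute of each recipe could be sketched as follows. The real signature of get_health_score_with_rag() is not shown in this README, so the sketch takes any scorer callable that returns a dict with a `total_health_score` key, as in the example output in the Evaluation section. The names `annotate_with_health_score`, `fake_scorer`, `health_score`, and `demo_score` are hypothetical.

```python
def annotate_with_health_score(recipes, scorer):
    """Return copies of the recipes with the scorer's total stored on each one."""
    annotated = []
    for recipe in recipes:
        result = scorer(recipe)
        # "health_score" is an assumed attribute name, not taken from the project.
        annotated.append({**recipe, "health_score": result["total_health_score"]})
    return annotated

def fake_scorer(recipe):
    # Stand-in for the real scoring function; it only echoes a preset value.
    return {"total_health_score": recipe["demo_score"], "summary_of_points": {}}

recipes = [
    {"title": "Lentil soup", "demo_score": 5},
    {"title": "Fried dough", "demo_score": 1},
]
scored = annotate_with_health_score(recipes, fake_scorer)
```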

To filter recipes

Use filter() in the recipe_filter.ipynb notebook.
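
The filtering rule itself (keep recipes with a health score of 3 or higher) is simple enough to sketch. This is not the notebook's filter() function: `keep_balanced` and the `health_score` key are assumptions for illustration.

```python
def keep_balanced(recipes, min_score=3, score_key="health_score"):
    """Keep recipes whose score is at least min_score; unscored recipes are dropped."""
    return [r for r in recipes if r.get(score_key, 0) >= min_score]

recipes = [
    {"title": "Lentil soup", "health_score": 5},
    {"title": "Fried dough", "health_score": 1},
    {"title": "Rice bowl", "health_score": 3},
    {"title": "Unscored dish"},
]
balanced = keep_balanced(recipes)
titles = [r["title"] for r in balanced]
```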

Filtered Dataset

Filtered Recipes: ./datasets/filtered_recipes_419.json
Contains 419 balanced recipes with a health score of 3 or higher.

Evaluation

Criteria

Health Score: Measures the nutrient balance of a recipe against the WHO Nutrient Intake Goals. Seven nutrients are considered:
- Proteins
- Carbohydrates
- Sugars
- Sodium
- Fats
- Saturated Fats
- Fibres
Each nutrient that falls within its recommended range scores 1 point. The total (out of 7) is the recipe's health score.
Relevance: Evaluates how well the recipe meets user requirements, including:
- Cooking Tools
- Cooking Time
- Ingredient Similarity
Consistency: Assesses the quality of the generated recipe based on:
- Instructional Clarity
- Measurement Consistency
- Logical Step Sequencing
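
The health score rule above (one point per nutrient inside its recommended range) can be sketched as below. The ranges are placeholders for illustration only: they are not the WHO goals or the thresholds used in RAG_health_score.py, and the function name `health_score_sketch` is hypothetical. Only three of the seven nutrients are shown to keep the example short.

```python
def health_score_sketch(nutrients, ranges):
    """Score 1 point for every nutrient whose value lies inside its range."""
    summary = {}
    for name, (low, high) in ranges.items():
        value = nutrients.get(name)
        # A missing nutrient scores 0 rather than raising an error.
        summary[name] = int(value is not None and low <= value <= high)
    return {
        "total_health_score": sum(summary.values()),
        "summary_of_points": summary,
    }

# Placeholder ranges and values; the units are deliberately left unspecified.
ranges = {
    "Proteins": (10, 15),
    "Sodium": (0, 2),
    "Fibers": (25, float("inf")),
}
nutrients = {"Proteins": 12, "Sodium": 3.0, "Fibers": 30}
result = health_score_sketch(nutrients, ranges)
```

Returning the per-nutrient points alongside the total mirrors the output shape shown under Evaluation Function Output.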

Code

RAG_health_score.py: The Python functions for the health score calculation algorithm.
recipe_relevance_ver3.py: The Python functions for the relevance evaluation algorithm.

Main Code Walkthrough

To calculate recipes' health score

Use the major function get_health_score_with_rag() in RAG_health_score.py.

To evaluate recipes' relevance

Use the major function relevance_evaluation() in recipe_relevance_ver3.py. Refer to the example in the file for guidance on function usage.

Evaluation Function Output

get_health_score_with_rag(): Returns the health score and a summary of points for the input recipe.

Example Output:

{
    "total_health_score": 3,
    "summary_of_points": {
        "Proteins": 0, 
        "Carbohydrates": 0, 
        "Sugars": 1, 
        "Sodium": 0, 
        "Fats": 0, 
        "Saturated Fats": 1, 
        "Fibers": 1
    }
}

relevance_evaluation(): Returns the relevance evaluation results for the input recipe.

Example Output:
```
{
    "cooking_tools": "True", 
    "cooking_time": 0, 
    "ingredient_overlap_rate": 66.66666666666666
}
```
Explanation:
- cooking_tools:
  - True if the recipe meets the tool requirement; otherwise, False.
- cooking_time:
  - 0 if the cooking time is within the user's limit.
  - Otherwise, a positive value: the number of minutes by which the recipe exceeds the limit.
- ingredient_overlap_rate:
  - The overlap rate of user inputs and recipe ingredients.
  - 100% means all ingredients needed for the recipe can be found in user's inputs.
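
The three fields explained above can be computed roughly as in the sketch below. It is not the code in recipe_relevance_ver3.py: the input field names (`tools`, `minutes`, `max_minutes`, `ingredients`) and the exact-match comparison of ingredient names are assumptions, and the sketch returns a Python boolean for `cooking_tools` where the example output shows the string "True".

```python
def relevance_sketch(recipe, user):
    """Compare one recipe against the user's tools, time limit, and ingredients."""
    recipe_ingredients = set(recipe["ingredients"])
    available = recipe_ingredients & set(user["ingredients"])
    overlap = len(available) / len(recipe_ingredients) * 100 if recipe_ingredients else 0.0
    return {
        # True when every tool the recipe needs is one the user has.
        "cooking_tools": set(recipe["tools"]) <= set(user["tools"]),
        # 0 when within the limit, otherwise the minutes over the limit.
        "cooking_time": max(0, recipe["minutes"] - user["max_minutes"]),
        # Share of the recipe's ingredients found in the user's inputs.
        "ingredient_overlap_rate": overlap,
    }

recipe = {"tools": ["pan"], "minutes": 40, "ingredients": ["egg", "rice", "onion"]}
user = {"tools": ["pan", "oven"], "max_minutes": 30, "ingredients": ["egg", "rice", "salt"]}
result = relevance_sketch(recipe, user)
```

Here two of the recipe's three ingredients are available, so the overlap rate is about 66.7%, and the recipe runs 10 minutes over the user's limit.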

