Recipe Parser

An automated recipe parser that transforms unstructured recipe text into clean, structured JSON data using Google's LangExtract library.

Overview

This tool solves a common problem: recipes on the web and in cookbooks lack a universal digital standard. Every website formats recipes differently, often embedding them within long blog posts and personal stories. This parser extracts the essential recipe information and presents it in a clean, machine-readable format.

Features

Intelligent Parsing: Automatically identifies and extracts recipe components including title, description, prep/cook times, servings, ingredients, and instructions
Structured Ingredients: Breaks down each ingredient into name, quantity, and unit of measurement
Multiple Output Formats: Generates both JSON for programmatic use and HTML for visual verification
Visual Verification: HTML output shows the original recipe alongside the parsed data for easy validation
Fast Processing: Parses a typical recipe in seconds using Gemini models (gemini-1.5-flash by default)

Installation

Clone the repository:

git clone https://github.com/aswincsekar/langextract-demo.git
cd langextract-demo

Create a virtual environment and install dependencies:

# Using uv (recommended)
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e .

# Or using pip
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .

Set up your API key:

# Create a .env file
cp .env.example .env

# Edit .env and add your API key
LANGEXTRACT_API_KEY=your-api-key-here
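
The key can come from the .env file or from the --api-key option (see Troubleshooting). The sketch below shows one way that precedence could be resolved. It assumes the command-line value wins over the environment; resolve_api_key is a hypothetical helper, not a function taken from this repository, and loading .env into the environment is left out.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(cli_key: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key, preferring --api-key over LANGEXTRACT_API_KEY."""
    env = os.environ if env is None else env
    # Assumption: an explicit command-line value overrides the environment.
    key = (cli_key or env.get("LANGEXTRACT_API_KEY", "")).strip()
    if not key:
        # Same message as the first entry under Common Issues.
        raise ValueError("API key not provided")
    return key
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real process environment.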

Usage

Basic Usage

Parse a recipe from a text file:

python main.py samples/chocolate_chip_cookies.txt

Command Line Options

python main.py [input_file] [options]

Arguments:
  input_file              Path to the text file containing the recipe

Options:
  --output-dir PATH       Directory to save output files (default: ./output)
  --api-key KEY          API key for the language model
  --model MODEL          Model to use (default: gemini-1.5-flash)
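
For readers who want to see how such an interface is typically declared, here is a minimal argparse sketch that mirrors the arguments and defaults listed above. It is an illustration only: build_parser is a hypothetical name, and the actual main.py may define its options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the positional argument and options documented above."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Parse a recipe text file into structured JSON and HTML.",
    )
    parser.add_argument("input_file",
                        help="Path to the text file containing the recipe")
    parser.add_argument("--output-dir", default="./output", metavar="PATH",
                        help="Directory to save output files")
    parser.add_argument("--api-key", default=None, metavar="KEY",
                        help="API key for the language model")
    parser.add_argument("--model", default="gemini-1.5-flash", metavar="MODEL",
                        help="Model to use")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["samples/banana_bread.txt", "--output-dir", "recipes/parsed"]
)
print(args.input_file, args.output_dir, args.model)
```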

Example

# Parse a recipe and save to custom directory
python main.py samples/banana_bread.txt --output-dir recipes/parsed

# Use a different model
python main.py samples/simple_pasta.txt --model gemini-1.5-pro

Input Format

Simply copy and paste any recipe into a plain text file. The parser is designed to handle various formats:

Blog-style recipes with stories and commentary
Cookbook-style recipes with clear sections
Informal recipe notes
Recipes with unusual formatting

See the samples/ directory for examples.

Output Format

JSON Output

The parser generates a structured JSON file with the following schema:

{
  "title": "Recipe Title",
  "description": "Brief description of the recipe",
  "prep_time": "15 minutes",
  "cook_time": "30 minutes",
  "servings": "4 servings",
  "ingredients": [
    {
      "name": "ingredient name",
      "quantity": 2.0,
      "unit": "cups"
    }
  ],
  "instructions": [
    "Step 1 text",
    "Step 2 text"
  ]
}
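
The repository keeps its Pydantic models in src/models/recipe.py. As a dependency-free illustration of the same schema, the sketch below uses standard-library dataclasses instead. The class names, the recipe_from_json helper, and the choice of which fields are optional are assumptions made for this example.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ingredient:
    name: str
    # Assumption: quantity and unit may be missing, e.g. "salt to taste".
    quantity: Optional[float] = None
    unit: Optional[str] = None


@dataclass
class Recipe:
    title: str
    description: str = ""
    prep_time: str = ""
    cook_time: str = ""
    servings: str = ""
    ingredients: List[Ingredient] = field(default_factory=list)
    instructions: List[str] = field(default_factory=list)


def recipe_from_json(text: str) -> Recipe:
    """Load a Recipe from JSON text that follows the schema above."""
    data = json.loads(text)
    return Recipe(
        title=data["title"],
        description=data.get("description", ""),
        prep_time=data.get("prep_time", ""),
        cook_time=data.get("cook_time", ""),
        servings=data.get("servings", ""),
        ingredients=[Ingredient(**item) for item in data.get("ingredients", [])],
        instructions=list(data.get("instructions", [])),
    )


sample = """{
  "title": "Pancakes",
  "servings": "4 servings",
  "ingredients": [{"name": "flour", "quantity": 2.0, "unit": "cups"}],
  "instructions": ["Mix", "Fry"]
}"""
recipe = recipe_from_json(sample)
print(recipe.title, recipe.ingredients[0].quantity)
```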

HTML Visualization

The HTML output provides:

Formatted recipe display with clear sections
Side-by-side view of parsed data and original text
Responsive design for mobile and desktop viewing
Print-friendly layout
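
To make the side-by-side idea concrete, the following sketch builds a bare two-column page from the original text and a parsed recipe dictionary. It is not the project's actual template: render_side_by_side is a hypothetical function, the styling is reduced to a single flexbox rule, and the responsive and print styles are omitted.

```python
import html


def render_side_by_side(original_text: str, recipe: dict) -> str:
    """Return a minimal HTML page: original text on the left, parsed data on the right."""
    # Escape every value so recipe text cannot inject markup into the page.
    ingredient_items = "".join(
        f"<li>{html.escape(str(ing.get('quantity', '')))} "
        f"{html.escape(str(ing.get('unit', '')))} "
        f"{html.escape(str(ing.get('name', '')))}</li>"
        for ing in recipe.get("ingredients", [])
    )
    step_items = "".join(
        f"<li>{html.escape(step)}</li>" for step in recipe.get("instructions", [])
    )
    return (
        "<!DOCTYPE html>\n"
        "<html><head><meta charset=\"utf-8\">"
        "<style>.cols{display:flex;gap:2rem}.cols>*{flex:1}</style></head><body>"
        f"<h1>{html.escape(recipe.get('title', ''))}</h1>"
        "<div class=\"cols\">"
        f"<pre>{html.escape(original_text)}</pre>"
        f"<div><ul>{ingredient_items}</ul><ol>{step_items}</ol></div>"
        "</div></body></html>"
    )
```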

Project Structure

recipe-parser/
├── main.py                 # Main script
├── src/
│   ├── models/
│   │   └── recipe.py      # Pydantic models for recipes
│   └── extractors/
│       └── recipe_extractor.py  # LangExtract integration
├── samples/               # Example recipe text files
├── output/               # Generated JSON and HTML files
├── pyproject.toml        # Project configuration
└── README.md            # This file

Use Cases

The structured output enables many possibilities:

Shopping List Generator: Automatically create shopping lists from recipes
Recipe Scaling: Adjust ingredient quantities for different serving sizes
Meal Planning Apps: Import recipes into meal planning software
Nutrition Calculators: Send ingredients to nutrition APIs
Recipe Databases: Build searchable recipe collections
Voice Assistants: Enable voice-guided cooking instructions
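
As an illustration of the first two use cases, the sketch below scales ingredient quantities and merges several parsed recipes into one shopping list. It works on plain dictionaries shaped like the JSON output. The function names are hypothetical, ingredients are matched only by lowercased name and identical unit, and no unit conversion is attempted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def scale_ingredients(ingredients: List[dict], factor: float) -> List[dict]:
    """Return copies of the ingredients with numeric quantities multiplied by factor."""
    scaled = []
    for item in ingredients:
        copy = dict(item)  # leave the caller's data untouched
        if isinstance(copy.get("quantity"), (int, float)):
            copy["quantity"] = copy["quantity"] * factor
        scaled.append(copy)
    return scaled


def shopping_list(recipes: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Sum quantities across recipes, keyed by (lowercased name, unit)."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for recipe in recipes:
        for item in recipe.get("ingredients", []):
            # Ingredients without a numeric quantity are skipped here.
            if isinstance(item.get("quantity"), (int, float)):
                key = (item["name"].lower(), item.get("unit") or "")
                totals[key] += item["quantity"]
    return dict(totals)


cookies = {"ingredients": [
    {"name": "Flour", "quantity": 2.0, "unit": "cups"},
    {"name": "salt", "quantity": None, "unit": None},
]}
bread = {"ingredients": [{"name": "flour", "quantity": 1.5, "unit": "cups"}]}

doubled = scale_ingredients(cookies["ingredients"], 2)
totals = shopping_list([cookies, bread])
print(doubled[0]["quantity"], totals)
```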

Development

Running Tests

# Smoke test: parse a sample recipe, then inspect the files written to ./output
python main.py samples/chocolate_chip_cookies.txt

Debugging

Use VS Code's debugger with the included launch configuration:

Open the project folder in VS Code
Set breakpoints in the code
Press F5 to start debugging

Troubleshooting

Common Issues

"API key not provided"
- Make sure you've set LANGEXTRACT_API_KEY in your .env file
- Or provide it via command line: --api-key your-key

"Input file is empty"
- Ensure your text file contains the recipe content
- Check file encoding (should be UTF-8)

"Failed to extract recipe"
- The text might be too short (minimum ~100 characters)
- Ensure the text contains recognizable recipe elements

Poor extraction results
- Try using a more powerful model: --model gemini-1.5-pro
- Ensure the recipe text is complete and well-formatted
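
Several of these issues can be caught before any model call. The sketch below shows one possible pre-flight check covering the UTF-8, empty-file, and minimum-length conditions listed above. load_recipe_text and MIN_RECIPE_CHARS are hypothetical names, and the threshold of 100 is taken from the approximate figure in this section rather than from the code.

```python
from pathlib import Path

# Assumption: mirrors the "~100 characters" guidance in this section.
MIN_RECIPE_CHARS = 100


def load_recipe_text(path: str) -> str:
    """Read a recipe file and fail early on the common input problems."""
    raw = Path(path).read_bytes()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"{path} is not valid UTF-8") from exc
    text = text.strip()
    if not text:
        raise ValueError("Input file is empty")
    if len(text) < MIN_RECIPE_CHARS:
        raise ValueError(
            f"Recipe text is only {len(text)} characters; "
            f"at least {MIN_RECIPE_CHARS} are expected"
        )
    return text
```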

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

License

MIT License - feel free to use this in your own projects!

Acknowledgments

Built with Google's LangExtract
Inspired by the need for better recipe management tools
