aswincsekar/langextract-demo

Recipe Parser

An automated recipe parser that transforms unstructured recipe text into clean, structured JSON data using Google's LangExtract library.

Overview

This tool solves a common problem: recipes on the web and in cookbooks lack a universal digital standard. Every website formats recipes differently, often embedding them within long blog posts and personal stories. This parser extracts the essential recipe information and presents it in a clean, machine-readable format.

Features

  • Intelligent Parsing: Automatically identifies and extracts recipe components including title, description, prep/cook times, servings, ingredients, and instructions
  • Structured Ingredients: Breaks down each ingredient into name, quantity, and unit of measurement
  • Multiple Output Formats: Generates both JSON for programmatic use and HTML for visual verification
  • Visual Verification: HTML output shows the original recipe alongside the parsed data for easy validation
  • Fast Processing: Typically parses a recipe in seconds using Gemini models through LangExtract (actual time depends on the model and the length of the text)

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/recipe-parser.git
cd recipe-parser
  2. Create a virtual environment and install dependencies:
# Using uv (recommended)
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e .

# Or using pip
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .
  3. Set up your API key:
# Create a .env file
cp .env.example .env

# Edit .env and add your API key
LANGEXTRACT_API_KEY=your-api-key-here
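
As the Troubleshooting section notes, the key can come either from LANGEXTRACT_API_KEY in .env or from the --api-key option. A minimal sketch of that lookup follows, assuming the command-line value takes precedence and that .env has already been loaded into the environment (for example with python-dotenv's load_dotenv()). resolve_api_key is a hypothetical helper, not a function from this repository.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(cli_key: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Return the API key, preferring --api-key over LANGEXTRACT_API_KEY (hypothetical helper)."""
    # An explicit command-line value wins over the environment.
    if cli_key:
        return cli_key
    key = env.get("LANGEXTRACT_API_KEY", "").strip()
    if not key:
        # Mirrors the "API key not provided" error listed under Troubleshooting.
        raise ValueError("API key not provided")
    return key
```

Passing env explicitly makes the lookup easy to test without touching the real process environment.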

Usage

Basic Usage

Parse a recipe from a text file:

python main.py samples/chocolate_chip_cookies.txt

Command Line Options

python main.py [input_file] [options]

Arguments:
  input_file              Path to the text file containing the recipe

Options:
  --output-dir PATH       Directory to save output files (default: ./output)
  --api-key KEY           API key for the language model
  --model MODEL           Model to use (default: gemini-1.5-flash)
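
The options above map directly onto Python's standard argparse module. The following is a sketch of an equivalent parser, not the repository's actual main.py; the defaults are copied from the option list.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Parse unstructured recipe text into JSON and HTML.")
    parser.add_argument("input_file", type=Path,
                        help="Path to the text file containing the recipe")
    parser.add_argument("--output-dir", type=Path, default=Path("./output"),
                        help="Directory to save output files (default: ./output)")
    parser.add_argument("--api-key", default=None,
                        help="API key for the language model")
    parser.add_argument("--model", default="gemini-1.5-flash",
                        help="Model to use (default: gemini-1.5-flash)")
    return parser


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    # argv=None makes argparse read sys.argv[1:], as a real CLI would.
    return build_parser().parse_args(argv)
```

argparse turns dashes in option names into underscores, so the values are available as args.output_dir and args.api_key after parsing.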

Examples

# Parse a recipe and save to custom directory
python main.py samples/banana_bread.txt --output-dir recipes/parsed

# Use a different model
python main.py samples/simple_pasta.txt --model gemini-1.5-pro

Input Format

Simply copy and paste any recipe into a plain text file. The parser is designed to handle various formats:

  • Blog-style recipes with stories and commentary
  • Cookbook-style recipes with clear sections
  • Informal recipe notes
  • Recipes with unusual formatting

See the samples/ directory for examples.
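
Before extraction, the input file has to be read as UTF-8 and checked for content. The Troubleshooting section mentions an empty-file error and a rough minimum of about 100 characters. The sketch below shows such a check. load_recipe_text and the MIN_CHARS threshold are illustrative assumptions, not the project's code.

```python
from pathlib import Path

# Rough lower bound taken from the Troubleshooting notes (~100 characters).
MIN_CHARS = 100


def load_recipe_text(path, min_chars: int = MIN_CHARS) -> str:
    """Read a recipe file as UTF-8 and reject empty or very short input (hypothetical helper)."""
    # Raises UnicodeDecodeError if the file is not valid UTF-8.
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        raise ValueError("Input file is empty")
    if len(text) < min_chars:
        raise ValueError(
            f"Input is only {len(text)} characters; recipes shorter than "
            f"about {min_chars} characters may fail to extract")
    return text
```

Failing early on an empty or tiny file avoids spending an API call on text that cannot contain a complete recipe.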

Output Format

JSON Output

The parser generates a structured JSON file with the following schema:

{
  "title": "Recipe Title",
  "description": "Brief description of the recipe",
  "prep_time": "15 minutes",
  "cook_time": "30 minutes",
  "servings": "4 servings",
  "ingredients": [
    {
      "name": "ingredient name",
      "quantity": 2.0,
      "unit": "cups"
    }
  ],
  "instructions": [
    "Step 1 text",
    "Step 2 text"
  ]
}
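
The repository defines this schema with Pydantic models in src/models/recipe.py. To keep the example dependency-free, the sketch below mirrors the same fields with standard-library dataclasses instead of Pydantic. The field names come from the JSON above; making every field except title optional (None when the recipe omits it) is an assumption.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Ingredient:
    name: str
    quantity: Optional[float] = None  # e.g. 2.0; None for "salt to taste"
    unit: Optional[str] = None        # e.g. "cups"


@dataclass
class Recipe:
    title: str
    description: Optional[str] = None
    prep_time: Optional[str] = None   # kept as free text, e.g. "15 minutes"
    cook_time: Optional[str] = None
    servings: Optional[str] = None    # e.g. "4 servings"
    ingredients: List[Ingredient] = field(default_factory=list)
    instructions: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested Ingredient dataclasses.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_dict(cls, data: dict) -> "Recipe":
        # Rebuild nested Ingredient objects; unknown keys raise TypeError.
        ingredients = [Ingredient(**item) for item in data.get("ingredients", [])]
        return cls(**{**data, "ingredients": ingredients})
```

With Pydantic, the same round trip is handled by the model's own validation and serialization methods; the dataclass version only shows the shape of the data.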

HTML Visualization

The HTML output provides:

  • Formatted recipe display with clear sections
  • Side-by-side view of parsed data and original text
  • Responsive design for mobile and desktop viewing
  • Print-friendly layout
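
A side-by-side view like this needs little more than two columns and careful escaping of the original text. The sketch below is one minimal way to produce it with the standard library (html.escape plus a CSS grid that collapses to one column on narrow screens and when printing). The function names, markup, and styling are assumptions, not the project's actual template.

```python
from html import escape


def _ingredient_line(item: dict) -> str:
    """Format one ingredient as "2 cups flour", skipping missing parts."""
    qty = item.get("quantity")
    # ":g" drops a trailing ".0", so 2.0 renders as "2".
    parts = [f"{qty:g}" if isinstance(qty, (int, float)) else qty,
             item.get("unit"), item.get("name")]
    return " ".join(escape(str(p)) for p in parts if p not in (None, ""))


def render_recipe_html(recipe: dict, original_text: str) -> str:
    """Return a standalone HTML page with parsed data beside the source text (illustrative)."""
    title = escape(recipe.get("title") or "Untitled recipe")
    ingredients = "".join(
        f"<li>{_ingredient_line(i)}</li>" for i in recipe.get("ingredients", []))
    steps = "".join(
        f"<li>{escape(s)}</li>" for s in recipe.get("instructions", []))
    # Doubled braces are literal CSS braces inside the f-string.
    return f"""<!DOCTYPE html>
<html><head><meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>{title}</title>
<style>
  .cols {{ display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; }}
  @media (max-width: 700px), print {{ .cols {{ grid-template-columns: 1fr; }} }}
  pre {{ white-space: pre-wrap; }}
</style></head>
<body>
<h1>{title}</h1>
<div class="cols">
  <section><h2>Ingredients</h2><ul>{ingredients}</ul>
    <h2>Instructions</h2><ol>{steps}</ol></section>
  <section><h2>Original text</h2><pre>{escape(original_text)}</pre></section>
</div>
</body></html>"""
```

Escaping every value matters here because recipe text copied from the web can contain characters such as < and & that would otherwise break the page.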

Project Structure

recipe-parser/
├── main.py                      # Main script
├── src/
│   ├── models/
│   │   └── recipe.py            # Pydantic models for recipes
│   └── extractors/
│       └── recipe_extractor.py  # LangExtract integration
├── samples/                     # Example recipe text files
├── output/                      # Generated JSON and HTML files
├── pyproject.toml               # Project configuration
└── README.md                    # This file

Use Cases

The structured output makes it straightforward to build tools such as:

  • Shopping List Generator: Automatically create shopping lists from recipes
  • Recipe Scaling: Adjust ingredient quantities for different serving sizes
  • Meal Planning Apps: Import recipes into meal planning software
  • Nutrition Calculators: Send ingredients to nutrition APIs
  • Recipe Databases: Build searchable recipe collections
  • Voice Assistants: Enable voice-guided cooking instructions
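
Because each ingredient carries a numeric quantity and a separate unit, scaling a recipe and building a shopping list reduce to simple arithmetic over the JSON. The sketch below assumes parsed recipe dicts in the schema shown under Output Format. scale_recipe and shopping_list are hypothetical helpers, and no unit conversion is attempted, so "1 cup" and "16 tbsp" of the same item stay on separate lines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def scale_recipe(recipe: dict, factor: float) -> dict:
    """Return a copy of the recipe with every known quantity multiplied by factor."""
    scaled = dict(recipe)
    scaled["ingredients"] = [
        # Copy each ingredient so the original recipe is left unchanged.
        {**item, "quantity": None if item.get("quantity") is None
         else item["quantity"] * factor}
        for item in recipe.get("ingredients", [])
    ]
    return scaled


def shopping_list(recipes: Iterable[dict]) -> Dict[Tuple[str, Optional[str]], float]:
    """Sum quantities across recipes, keyed by (lower-cased name, unit)."""
    totals: Dict[Tuple[str, Optional[str]], float] = defaultdict(float)
    for recipe in recipes:
        for item in recipe.get("ingredients", []):
            key = (item["name"].strip().lower(), item.get("unit"))
            # Items without a quantity ("salt to taste") still appear, with 0.
            totals[key] += item.get("quantity") or 0.0
    return dict(totals)
```

For example, doubling a cookie recipe is scale_recipe(recipe, 2), and combining it with a bread recipe that also uses flour in cups yields a single flour line with the summed quantity.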

Development

Running Tests

# Smoke test: parse a sample recipe, then check the JSON and HTML files written to ./output
python main.py samples/chocolate_chip_cookies.txt

Debugging

Use VS Code's debugger with the included launch configuration:

  1. Open the project folder in VS Code
  2. Set breakpoints in the code
  3. Press F5 to start debugging

Troubleshooting

Common Issues

  1. "API key not provided"

    • Make sure you've set LANGEXTRACT_API_KEY in your .env file
    • Or provide it via command line: --api-key your-key
  2. "Input file is empty"

    • Ensure your text file contains the recipe content
    • Check file encoding (should be UTF-8)
  3. "Failed to extract recipe"

    • The text might be too short (minimum ~100 characters)
    • Ensure the text contains recognizable recipe elements
  4. Poor extraction results

    • Try using a more powerful model: --model gemini-1.5-pro
    • Ensure the recipe text is complete and well-formatted

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

License

MIT License - feel free to use this in your own projects!

Acknowledgments
