An automated recipe parser that transforms unstructured recipe text into clean, structured JSON data using Google's LangExtract library.
This tool solves a common problem: recipes on the web and in cookbooks lack a universal digital standard. Every website formats recipes differently, often embedding them within long blog posts and personal stories. This parser extracts the essential recipe information and presents it in a clean, machine-readable format.
- Intelligent Parsing: Automatically identifies and extracts recipe components including title, description, prep/cook times, servings, ingredients, and instructions
- Structured Ingredients: Breaks down each ingredient into name, quantity, and unit of measurement
- Multiple Output Formats: Generates both JSON for programmatic use and HTML for visual verification
- Visual Verification: HTML output shows the original recipe alongside the parsed data for easy validation
- Fast Processing: Typically parses a recipe in seconds using Gemini models (default: gemini-1.5-flash)
- Clone the repository:
git clone https://github.com/yourusername/recipe-parser.git
cd recipe-parser
- Create a virtual environment and install dependencies:
# Using uv (recommended)
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv pip install -e .
# Or using pip
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -e .
- Set up your API key:
# Create a .env file
cp .env.example .env
# Edit .env and add your API key
LANGEXTRACT_API_KEY=your-api-key-here
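The key can come from either the `--api-key` option or the `LANGEXTRACT_API_KEY` variable. The snippet below is a minimal sketch of one reasonable precedence (command line first, then environment). The function name `resolve_api_key` is hypothetical and not taken from this repository, and it assumes the values from `.env` have already been loaded into the process environment (for example by a helper such as python-dotenv).

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    cli_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the API key, preferring the CLI value over the environment.

    Hypothetical helper: the real main.py may resolve the key differently.
    """
    key = (cli_key or env.get("LANGEXTRACT_API_KEY", "")).strip()
    if not key:
        # Same wording as the error listed in the troubleshooting notes.
        raise ValueError("API key not provided")
    return key
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.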
Parse a recipe from a text file:
python main.py samples/chocolate_chip_cookies.txt
python main.py [input_file] [options]
Arguments:
input_file Path to the text file containing the recipe
Options:
--output-dir PATH Directory to save output files (default: ./output)
--api-key KEY API key for the language model
--model MODEL Model to use (default: gemini-1.5-flash)
# Parse a recipe and save to custom directory
python main.py samples/banana_bread.txt --output-dir recipes/parsed
# Use a different model
python main.py samples/simple_pasta.txt --model gemini-1.5-pro
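For readers who want to wrap or extend the command line, the options above map naturally onto `argparse`. This is only a sketch of how such an interface could be declared; `build_parser` is a hypothetical name and the actual `main.py` may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented arguments and their documented defaults."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Parse a recipe text file into JSON and HTML.",
    )
    parser.add_argument(
        "input_file",
        help="Path to the text file containing the recipe",
    )
    parser.add_argument(
        "--output-dir",
        metavar="PATH",
        default="./output",
        help="Directory to save output files (default: ./output)",
    )
    parser.add_argument(
        "--api-key",
        metavar="KEY",
        default=None,
        help="API key for the language model",
    )
    parser.add_argument(
        "--model",
        metavar="MODEL",
        default="gemini-1.5-flash",
        help="Model to use (default: gemini-1.5-flash)",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(
    ["samples/banana_bread.txt", "--output-dir", "recipes/parsed"]
)
print(args.input_file, args.output_dir, args.model)
```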
Simply copy and paste any recipe into a plain text file. The parser is designed to handle various formats:
- Blog-style recipes with stories and commentary
- Cookbook-style recipes with clear sections
- Informal recipe notes
- Recipes with unusual formatting
See the samples/ directory for examples.
The parser generates a structured JSON file with the following schema:
{
"title": "Recipe Title",
"description": "Brief description of the recipe",
"prep_time": "15 minutes",
"cook_time": "30 minutes",
"servings": "4 servings",
"ingredients": [
{
"name": "ingredient name",
"quantity": 2.0,
"unit": "cups"
}
],
"instructions": [
"Step 1 text",
"Step 2 text"
]
}
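To consume this JSON from Python, the schema can be mirrored with small typed classes. The project itself uses Pydantic models (`src/models/recipe.py`); the sketch below uses standard-library dataclasses instead so it runs without extra dependencies. The class names, the `recipe_from_json` helper, and the choice to treat `quantity` and `unit` as optional are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ingredient:
    name: str
    quantity: Optional[float] = None  # assumed optional, e.g. "salt to taste"
    unit: Optional[str] = None


@dataclass
class Recipe:
    title: str
    description: str = ""
    prep_time: str = ""
    cook_time: str = ""
    servings: str = ""
    ingredients: List[Ingredient] = field(default_factory=list)
    instructions: List[str] = field(default_factory=list)


def recipe_from_json(text: str) -> Recipe:
    """Build a Recipe from a JSON string that follows the schema above."""
    data = json.loads(text)
    ingredients = [
        Ingredient(
            name=item["name"],
            quantity=item.get("quantity"),
            unit=item.get("unit"),
        )
        for item in data.get("ingredients", [])
    ]
    return Recipe(
        title=data["title"],
        description=data.get("description", ""),
        prep_time=data.get("prep_time", ""),
        cook_time=data.get("cook_time", ""),
        servings=data.get("servings", ""),
        ingredients=ingredients,
        instructions=list(data.get("instructions", [])),
    )
```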
The HTML output provides:
- Formatted recipe display with clear sections
- Side-by-side view of parsed data and original text
- Responsive design for mobile and desktop viewing
- Print-friendly layout
recipe-parser/
├── main.py # Main script
├── src/
│ ├── models/
│ │ └── recipe.py # Pydantic models for recipes
│ └── extractors/
│ └── recipe_extractor.py # LangExtract integration
├── samples/ # Example recipe text files
├── output/ # Generated JSON and HTML files
├── pyproject.toml # Project configuration
└── README.md # This file
The structured output enables many possibilities:
- Shopping List Generator: Automatically create shopping lists from recipes
- Recipe Scaling: Adjust ingredient quantities for different serving sizes
- Meal Planning Apps: Import recipes into meal planning software
- Nutrition Calculators: Send ingredients to nutrition APIs
- Recipe Databases: Build searchable recipe collections
- Voice Assistants: Enable voice-guided cooking instructions
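As an illustration of the first two ideas, here is a rough sketch that scales a parsed recipe and merges several recipes into a shopping list. It works on plain dictionaries loaded from the JSON output; `scale_recipe` and `shopping_list` are hypothetical helpers, not part of this project, and ingredients without a numeric quantity are left untouched or skipped.

```python
import copy
from typing import Dict, List, Tuple


def scale_recipe(recipe: dict, factor: float) -> dict:
    """Return a copy of the recipe with numeric quantities multiplied by factor.

    The free-text "servings" field is not rewritten.
    """
    scaled = copy.deepcopy(recipe)
    for ingredient in scaled.get("ingredients", []):
        quantity = ingredient.get("quantity")
        if isinstance(quantity, (int, float)):
            ingredient["quantity"] = round(quantity * factor, 2)
    return scaled


def shopping_list(recipes: List[dict]) -> Dict[Tuple[str, str], float]:
    """Sum quantities across recipes, grouped by (name, unit).

    Units are not converted: "1 cup" and "250 ml" stay separate entries.
    """
    totals: Dict[Tuple[str, str], float] = {}
    for recipe in recipes:
        for ingredient in recipe.get("ingredients", []):
            quantity = ingredient.get("quantity")
            if not isinstance(quantity, (int, float)):
                continue  # skip entries such as "salt to taste"
            key = (
                ingredient["name"].strip().lower(),
                (ingredient.get("unit") or "").strip().lower(),
            )
            totals[key] = totals.get(key, 0) + quantity
    return totals
```

For example, `scale_recipe(recipe, 1.5)` turns 2.0 cups into 3.0 cups while leaving the original dictionary unchanged.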
# Test with a sample recipe
python main.py samples/chocolate_chip_cookies.txt
Use VS Code's debugger with the included launch configuration:
- Open VS Code
- Set breakpoints in the code
- Press F5 to start debugging
- "API key not provided"
  - Make sure you've set LANGEXTRACT_API_KEY in your .env file
  - Or provide it via command line: --api-key your-key
- "Input file is empty"
  - Ensure your text file contains the recipe content
  - Check file encoding (should be UTF-8)
- "Failed to extract recipe"
  - The text might be too short (minimum ~100 characters)
  - Ensure the text contains recognizable recipe elements
- Poor extraction results
  - Try using a more powerful model: --model gemini-1.5-pro
  - Ensure the recipe text is complete and well-formatted
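Several of the problems above can be caught before any API call is made. The following is a minimal sketch of such a pre-flight check, assuming the input is read as UTF-8 and using the rough 100-character guideline mentioned above. `load_recipe_text` and `MIN_CHARS` are hypothetical names, and the parser's own validation may differ.

```python
from pathlib import Path

# Mirrors the "~100 characters" guidance; the parser's real threshold is an assumption.
MIN_CHARS = 100


def load_recipe_text(path: str, min_chars: int = MIN_CHARS) -> str:
    """Read a recipe file and fail early on the common input problems."""
    try:
        text = Path(path).read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"{path} is not valid UTF-8") from exc

    text = text.strip()
    if not text:
        raise ValueError("Input file is empty")
    if len(text) < min_chars:
        raise ValueError(
            f"Recipe text is too short ({len(text)} characters, "
            f"expected at least {min_chars})"
        )
    return text
```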
Contributions are welcome! Please feel free to submit issues and pull requests.
MIT License - feel free to use this in your own projects!
- Built with Google's LangExtract
- Inspired by the need for better recipe management tools