|
7 | 7 | "source": [ |
8 | 8 | "# Optimizing Language Models with DSPy GEPA: From 42% to 64% Accuracy\n", |
9 | 9 | "\n", |
| 10 | + "_Authored by: [Behrooz Azarkhalili](https://github.com/behroozazarkhalili)_\n", |
| 11 | + "\n", |
10 | 12 | "This notebook demonstrates how to use DSPy's GEPA (Genetic-Pareto) optimizer to improve language model performance on mathematical reasoning tasks. We'll work with the NuminaMath-1.5 dataset and show how GEPA can boost accuracy from 42% to 64% through automated prompt optimization.\n", |
11 | 13 | "\n", |
12 | 14 | "**What you'll learn:**\n", |
|
24 | 26 | "GEPA works by analyzing errors, generating targeted feedback, and automatically refining prompts to address common failure patterns. This makes it particularly effective for complex reasoning tasks where prompt quality significantly impacts performance." |
25 | 27 | ] |
26 | 28 | }, |
| 29 | + { |
| 30 | + "cell_type": "markdown", |
| 31 | + "id": "99b369f9", |
| 32 | + "metadata": {}, |
| 33 | + "source": [ |
| 34 | + "## Installation and Setup\n", |
| 35 | + "\n", |
| 36 | + "Install required dependencies and import libraries for DSPy, dataset processing, and model configuration." |
| 37 | + ] |
| 38 | + }, |
27 | 39 | { |
28 | 40 | "cell_type": "code", |
29 | 41 | "execution_count": null, |
|
67 | 79 | "print(\"🔄 Make sure Ollama is running: ollama run qwen3:8b\")" |
68 | 80 | ] |
69 | 81 | }, |
| 82 | + { |
| 83 | + "cell_type": "markdown", |
| 84 | + "id": "ee1fa682", |
| 85 | + "metadata": {}, |
| 86 | + "source": [ |
| 87 | + "## Language Model Configuration\n", |
| 88 | + "\n", |
| 89 | + "Configure your language model, either local (Ollama) or cloud-based (OpenRouter), for use with DSPy."
| 90 | + ] |
| 91 | + }, |
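A minimal configuration sketch for the local option (this assumes DSPy's `dspy.LM` LiteLLM-style interface and an Ollama server on its default port; the `qwen3:8b` model name follows the reminder printed earlier in the notebook):

```python
import dspy

# Local model served by Ollama (assumes `ollama run qwen3:8b` is already running).
lm = dspy.LM(
    "ollama_chat/qwen3:8b",
    api_base="http://localhost:11434",
    api_key="",  # Ollama needs no key
)

# Make this LM the default for all DSPy modules in the notebook.
dspy.configure(lm=lm)
```

For the cloud-based path, the same `dspy.LM` call works with an OpenRouter model string and an `api_key` from your environment instead.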
70 | 92 | { |
71 | 93 | "cell_type": "code", |
72 | 94 | "execution_count": null, |
|
99 | 121 | "train_split = load_dataset(\"AI-MO/NuminaMath-1.5\")['train']" |
100 | 122 | ] |
101 | 123 | }, |
| 124 | + { |
| 125 | + "cell_type": "markdown", |
| 126 | + "id": "aca72fbc", |
| 127 | + "metadata": {}, |
| 128 | + "source": [ |
| 129 | + "## Dataset Loading and Filtering\n", |
| 130 | + "\n", |
| 131 | + "Load the NuminaMath-1.5 dataset and filter for problems with numeric answers suitable for evaluation." |
| 132 | + ] |
| 133 | + }, |
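The numeric-answer filter can be sketched in plain Python. The `is_numeric_answer` helper below is illustrative, not the notebook's exact code; it keeps only examples whose ground-truth answer parses as a number, which is what makes exact-match scoring possible later:

```python
def is_numeric_answer(answer: str) -> bool:
    """Return True if the answer string parses as a plain number."""
    try:
        float(answer.strip().replace(",", ""))
        return True
    except ValueError:
        return False

# Keep only examples with numeric ground-truth answers.
examples = [
    {"problem": "2 + 2 = ?", "answer": "4"},
    {"problem": "Prove the identity.", "answer": "\\text{proof}"},
]
numeric_only = [ex for ex in examples if is_numeric_answer(ex["answer"])]
```

Proof-style answers like the second example are dropped, since they cannot be checked by numeric comparison.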
102 | 134 | { |
103 | 135 | "cell_type": "code", |
104 | 136 | "execution_count": null, |
|
180 | 212 | " return train_set, val_set, test_set" |
181 | 213 | ] |
182 | 214 | }, |
| 215 | + { |
| 216 | + "cell_type": "markdown", |
| 217 | + "id": "e6d6b6f9", |
| 218 | + "metadata": {}, |
| 219 | + "source": [ |
| 220 | + "## Dataset Preparation Functions\n", |
| 221 | + "\n", |
| 222 | + "Helper functions to process the dataset, split it into train/val/test sets, and preview examples." |
| 223 | + ] |
| 224 | + }, |
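A sketch of the split helper is below. The 60/20/20 ratios and the fixed seed are assumptions for illustration; the notebook's actual function returns `train_set, val_set, test_set` as shown in the cell above:

```python
import random

def split_dataset(examples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle examples reproducibly, then split into train/val/test lists."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train_set = shuffled[:n_train]
    val_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]
    return train_set, val_set, test_set
```

Fixing the seed matters here: GEPA's before/after accuracy comparison is only meaningful if both evaluations use the same test split.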
183 | 225 | { |
184 | 226 | "cell_type": "code", |
185 | 227 | "execution_count": null, |
|
234 | 276 | "program = dspy.ChainOfThought(GenerateResponse)" |
235 | 277 | ] |
236 | 278 | }, |
| 279 | + { |
| 280 | + "cell_type": "markdown", |
| 281 | + "id": "3659214d", |
| 282 | + "metadata": {}, |
| 283 | + "source": [ |
| 284 | + "## Baseline Chain-of-Thought Program\n", |
| 285 | + "\n", |
| 286 | + "Create a simple baseline using DSPy's Chain-of-Thought module to establish initial performance." |
| 287 | + ] |
| 288 | + }, |
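The baseline can be sketched as a typed DSPy signature wrapped in `dspy.ChainOfThought` (the field names in `GenerateResponse` below are an assumed shape, since the actual signature cell is elided from this diff):

```python
import dspy

class GenerateResponse(dspy.Signature):
    """Solve the math problem and give the final numeric answer."""
    problem: str = dspy.InputField()
    answer: str = dspy.OutputField()

# ChainOfThought adds an intermediate reasoning field before `answer`.
program = dspy.ChainOfThought(GenerateResponse)
```

Calling `program(problem="...")` would then return a prediction with both the reasoning trace and the final answer.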
237 | 289 | { |
238 | 290 | "cell_type": "code", |
239 | 291 | "execution_count": null, |
|
269 | 321 | "evaluate(program)" |
270 | 322 | ] |
271 | 323 | }, |
| 324 | + { |
| 325 | + "cell_type": "markdown", |
| 326 | + "id": "329bacee", |
| 327 | + "metadata": {}, |
| 328 | + "source": [ |
| 329 | + "## Evaluation Metric\n", |
| 330 | + "\n", |
| 331 | + "Define the evaluation metric to compare model predictions against ground truth answers." |
| 332 | + ] |
| 333 | + }, |
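A metric along these lines can be sketched in pure Python. Both helper names (`extract_final_number`, `answer_match`) and the last-number heuristic are illustrative assumptions, not the notebook's exact implementation:

```python
import re

def extract_final_number(text: str):
    """Pull the last number out of a model response (illustrative heuristic)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(matches[-1]) if matches else None

def answer_match(gold: str, predicted_text: str, tol: float = 1e-6) -> bool:
    """True when the predicted final number equals the gold numeric answer."""
    pred = extract_final_number(predicted_text)
    return pred is not None and abs(pred - float(gold)) <= tol
```

A tolerance is used rather than string equality so that `42`, `42.0`, and small floating-point noise all score as correct.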
272 | 334 | { |
273 | 335 | "cell_type": "code", |
274 | 336 | "execution_count": null, |
|
303 | 365 | "outputs": [], |
304 | 366 | "source": [] |
305 | 367 | }, |
| 368 | + { |
| 369 | + "cell_type": "markdown", |
| 370 | + "id": "07134dea", |
| 371 | + "metadata": {}, |
| 372 | + "source": [ |
| 373 | + "## Baseline Evaluation\n", |
| 374 | + "\n", |
| 375 | + "Evaluate the baseline Chain-of-Thought program to establish our starting accuracy before optimization." |
| 376 | + ] |
| 377 | + }, |
306 | 378 | { |
307 | 379 | "cell_type": "code", |
308 | 380 | "execution_count": null, |
|
357 | 429 | ")\n" |
358 | 430 | ] |
359 | 431 | }, |
| 432 | + { |
| 433 | + "cell_type": "markdown", |
| 434 | + "id": "e5fe6dd8", |
| 435 | + "metadata": {}, |
| 436 | + "source": [ |
| 437 | + "## GEPA Optimization\n", |
| 438 | + "\n", |
| 439 | + "Apply the GEPA optimizer, which uses error-driven feedback to refine the prompt automatically and boost accuracy."
| 440 | + ] |
| 441 | + }, |
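The optimization step can be sketched as follows. This assumes a recent DSPy release exposing `dspy.GEPA`, and that `program`, `train_set`, `val_set`, and a feedback-returning `metric` exist as in the surrounding cells; the `auto="light"` budget and the reflection model choice are assumptions:

```python
import dspy

optimizer = dspy.GEPA(
    metric=metric,      # should return a score plus textual feedback on errors
    auto="light",       # preset controlling the optimization budget
    reflection_lm=dspy.LM("ollama_chat/qwen3:8b"),  # model that rewrites prompts
)

optimized_program = optimizer.compile(program, trainset=train_set, valset=val_set)

# Inspect the evolved instructions, as the notebook does below.
print(optimized_program.predict.signature.instructions)
```

GEPA uses the textual feedback from the metric, not just the score, to propose targeted prompt revisions, which is why the metric's error descriptions matter.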
360 | 442 | { |
361 | 443 | "cell_type": "code", |
362 | 444 | "execution_count": null, |
|
381 | 463 | "print(optimized_program.predict.signature.instructions)" |
382 | 464 | ] |
383 | 465 | }, |
| 466 | + { |
| 467 | + "cell_type": "markdown", |
| 468 | + "id": "74c7476f", |
| 469 | + "metadata": {}, |
| 470 | + "source": [ |
| 471 | + "## Optimized Program Evaluation\n", |
| 472 | + "\n", |
| 473 | + "Evaluate the GEPA-optimized program on the same test set to measure the accuracy improvement over the baseline."
| 474 | + ] |
| 475 | + }, |
384 | 476 | { |
385 | 477 | "cell_type": "code", |
386 | 478 | "execution_count": null, |
|
393 | 485 | } |
394 | 486 | ], |
395 | 487 | "metadata": { |
| 488 | + "accelerator": "GPU", |
| 489 | + "colab": { |
| 490 | + "gpuType": "L4", |
| 491 | + "provenance": [] |
| 492 | + }, |
396 | 493 | "kernelspec": { |
397 | | - "display_name": "behrooz", |
| 494 | + "display_name": "Python 3", |
398 | 495 | "language": "python", |
399 | 496 | "name": "python3" |
400 | 497 | }, |
|
408 | 505 | "name": "python", |
409 | 506 | "nbconvert_exporter": "python", |
410 | 507 | "pygments_lexer": "ipython3", |
411 | | - "version": "3.11.11" |
| 508 | + "version": "3.11.0" |
412 | 509 | } |
413 | 510 | }, |
414 | 511 | "nbformat": 4, |
|