Commit 3240f87 (parent 5c48828), committed by jeremymanning and claude

Add detailed notes on notebook validation fixes and learnings

Comprehensive documentation of the notebook validation issues encountered and resolved during cloud tutorial improvements:
- Root cause analysis of execution_count schema violations
- Working solutions with specific code snippets
- Diagnostic tools and validation strategies
- Prevention strategies for future notebook editing
- Complete workflow from problem discovery to resolution

Reference commits: 800bc9b, 0d6d39e, 5c48828

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

# Notebook Validation and Fixes Session - 2025-06-26

## Session Summary

This session focused on fixing critical Jupyter notebook validation issues that arose during the cloud tutorial notebook improvements. The work involved converting print statements to markdown cells and resolving schema validation errors.

## Key Commits

- **5c48828**: Update documentation build artifacts after notebook fixes
- **0d6d39e**: Fix Jupyter notebook execution_count validation issues
- **800bc9b**: Convert print statements to markdown cells in cloud tutorial notebooks

## Problem Discovered

After print statements were converted to markdown cells in the cloud tutorial notebooks, several notebooks became invalid because code cells were missing the required `execution_count` field.

### Error Message
```
'execution_count' is a required property
```

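For context, the nbformat v4 schema lists a fixed set of required keys for each cell type, and this message is the schema's way of reporting that one of them is absent. The stdlib-only sketch below mimics just that one rule for illustration; `REQUIRED_KEYS` and `missing_required` are hypothetical names, the key sets reflect our reading of the v4 schema, and the per-cell `id` introduced in nbformat 4.5 is ignored. It is not a substitute for `nbformat.validate`.

```python
# Required keys per cell type in the nbformat v4 schema (the cell "id" that
# nbformat 4.5 added is deliberately left out of this simplified sketch).
REQUIRED_KEYS = {
    "code": ("cell_type", "metadata", "source", "outputs", "execution_count"),
    "markdown": ("cell_type", "metadata", "source"),
    "raw": ("cell_type", "metadata", "source"),
}


def missing_required(cell):
    """Return schema-style messages for required keys absent from one cell."""
    required = REQUIRED_KEYS.get(cell.get("cell_type"), ())
    return [f"'{key}' is a required property" for key in required if key not in cell]


# A code cell as it looked after the faulty edit: execution_count is gone.
broken = {"cell_type": "code", "metadata": {}, "outputs": [], "source": "print('hi')"}
messages = missing_required(broken)
print(messages)  # ["'execution_count' is a required property"]
```

Adding the key back, even with the value `None`, empties the list.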
### Affected Notebooks
- `aws_cloud_tutorial.ipynb`: 9 code cells missing `execution_count`
- `azure_cloud_tutorial.ipynb`: 3 code cells missing `execution_count`
- `gcp_cloud_tutorial.ipynb`: 3 code cells missing `execution_count`
- `huggingface_spaces_tutorial.ipynb`: 2 code cells missing `execution_count`
- `lambda_cloud_tutorial.ipynb`: 2 code cells missing `execution_count`
- `basic_usage.ipynb`: invalid `outputs` field in a markdown cell

## Diagnostic Tools Used

### 1. Schema Validation Script
```python
import json


def check_notebook_schema(filepath):
    """Return a list of execution_count/outputs problems found in a notebook."""
    with open(filepath, 'r', encoding='utf-8') as f:
        nb = json.load(f)

    errors = []

    # Check cells for execution_count issues
    for i, cell in enumerate(nb.get('cells', [])):
        if cell.get('cell_type') == 'code':
            if 'execution_count' not in cell:
                errors.append(f'Cell {i}: Missing execution_count')
            elif cell['execution_count'] is not None and not isinstance(cell['execution_count'], int):
                errors.append(f'Cell {i}: Invalid execution_count type: {type(cell["execution_count"])}')
        elif cell.get('cell_type') == 'markdown':
            # execution_count and outputs are only valid on code cells
            if 'execution_count' in cell:
                errors.append(f'Cell {i}: Markdown cell should not have execution_count')
            if 'outputs' in cell:
                errors.append(f'Cell {i}: Markdown cell should not have outputs')

    return errors
```

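### 2. nbformat Validation
```python
import nbformat

# filename: path of the notebook to check
with open(filename, 'r', encoding='utf-8') as f:
    nb = nbformat.read(f, as_version=4)

# Raises nbformat.ValidationError if the notebook does not match the schema
nbformat.validate(nb)
```
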
## Solutions Applied

### 1. Fix Missing execution_count Fields

**Working Solution:**
```python
import json

# Load notebook (notebook_path is the path of the .ipynb file being repaired)
with open(notebook_path, 'r', encoding='utf-8') as f:
    nb = json.load(f)

# Fix code cells missing execution_count
for cell in nb['cells']:
    if cell.get('cell_type') == 'code' and 'execution_count' not in cell:
        cell['execution_count'] = None  # written to disk as JSON null

# Save corrected notebook (indent=1 matches the indentation Jupyter uses)
with open(notebook_path, 'w', encoding='utf-8') as f:
    json.dump(nb, f, indent=1)
```

**Key Learning**: In the nbformat v4 schema, every code cell **must** have an `execution_count` field. The value may be `null` (a cell that has never been run), but omitting the key entirely is a schema violation.

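The difference between a `null` value and a missing key is easy to blur in Python, because `dict.get` returns `None` in both cases. This small illustration, using trimmed-down cell fragments rather than full cells, shows why the fix tests `'execution_count' not in cell` instead of `cell.get('execution_count') is None`:

```python
import json

# Trimmed-down cell fragments: one never run (null), one whose field was lost.
never_run = json.loads('{"cell_type": "code", "execution_count": null}')
field_lost = json.loads('{"cell_type": "code"}')

# dict.get() returns None for both, so it cannot tell them apart...
print(never_run.get("execution_count"), field_lost.get("execution_count"))  # None None
# ...but a membership test can; only the second one violates the schema.
print("execution_count" in never_run, "execution_count" in field_lost)  # True False

# Assigning None restores the key, and json writes it back as null.
field_lost["execution_count"] = None
print(json.dumps(field_lost))  # {"cell_type": "code", "execution_count": null}
```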
### 2. Remove Invalid Fields from Markdown Cells

**Working Solution:**
```python
# nb is the notebook dict loaded with json.load, as in the previous snippet
# Remove outputs from markdown cells (invalid)
for cell in nb['cells']:
    if cell.get('cell_type') == 'markdown' and 'outputs' in cell:
        del cell['outputs']
```

**Key Learning**: Markdown cells must **not** have `outputs` or `execution_count` fields; these are valid only on code cells.

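A stricter variant, sketched here as an assumption rather than what was run in this session, uses an allow-list instead of deleting known-bad keys one at a time: any markdown-cell key outside the set the v4 schema permits is dropped. `MARKDOWN_ALLOWED_KEYS` and `strip_disallowed_markdown_keys` are hypothetical names, and the allowed set (`id`, `cell_type`, `metadata`, `source`, `attachments`) reflects our reading of the v4 schema.

```python
# Keys the nbformat v4 schema permits on a markdown cell ("id" arrived in 4.5).
MARKDOWN_ALLOWED_KEYS = {"id", "cell_type", "metadata", "source", "attachments"}


def strip_disallowed_markdown_keys(nb):
    """Delete non-permitted keys from markdown cells; return (cell index, key) pairs removed."""
    removed = []
    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "markdown":
            continue
        for key in sorted(set(cell) - MARKDOWN_ALLOWED_KEYS):
            del cell[key]
            removed.append((i, key))
    return removed


# Made-up notebook: a markdown cell carrying code-only keys, plus a valid code cell.
nb = {"cells": [
    {"cell_type": "markdown", "metadata": {}, "source": "## Setup",
     "outputs": [], "execution_count": None},
    {"cell_type": "code", "metadata": {}, "source": "x = 1",
     "outputs": [], "execution_count": None},
]}
removed = strip_disallowed_markdown_keys(nb)
print(removed)  # [(0, 'execution_count'), (0, 'outputs')]
```

Returning the removed keys makes it easy to log what changed before the notebook is written back, and code cells are never touched.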
### 3. Automated Fix Script

The following script repaired the five cloud tutorial notebooks by targeting the cell indices identified during diagnosis (the `basic_usage.ipynb` markdown-cell issue is covered by fix 2 above):

```python
import json

# Indices into nb['cells'] of the code cells reported as missing execution_count
problematic_notebooks = {
    'huggingface_spaces_tutorial.ipynb': [4, 7],
    'azure_cloud_tutorial.ipynb': [6, 9, 12],
    'gcp_cloud_tutorial.ipynb': [6, 9, 12],
    'aws_cloud_tutorial.ipynb': [10, 13, 16, 19, 22, 24, 27, 30, 33],
    'lambda_cloud_tutorial.ipynb': [4, 9]
}

for notebook_name, problematic_cells in problematic_notebooks.items():
    filepath = f'/Users/jmanning/clustrix/docs/notebooks/{notebook_name}'

    with open(filepath, 'r') as f:
        nb = json.load(f)

    fixes_applied = 0
    for cell_idx in problematic_cells:
        if cell_idx < len(nb['cells']):
            cell = nb['cells'][cell_idx]
            if cell.get('cell_type') == 'code' and 'execution_count' not in cell:
                cell['execution_count'] = None
                fixes_applied += 1

    # Rewrite only the notebooks that actually changed
    if fixes_applied > 0:
        with open(filepath, 'w') as f:
            json.dump(nb, f, indent=1)
```

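Because the script above depends on hand-collected cell indices, re-running it after further edits could miss new problems. A hypothetical index-free alternative scans every notebook in a directory, reports how many code cells it would fix, and only writes when `dry_run=False`; the function name and the dry-run default are illustrative choices, not part of the original tooling.

```python
import json
import tempfile
from pathlib import Path


def fix_missing_execution_counts(notebook_dir, dry_run=True):
    """Give every code cell lacking execution_count the value None (JSON null).

    Returns {notebook file name: cells fixed}. Files are rewritten only when
    dry_run is False and at least one cell in them needed fixing.
    """
    report = {}
    for path in sorted(Path(notebook_dir).glob("*.ipynb")):
        nb = json.loads(path.read_text(encoding="utf-8"))
        fixed = 0
        for cell in nb.get("cells", []):
            if cell.get("cell_type") == "code" and "execution_count" not in cell:
                cell["execution_count"] = None
                fixed += 1
        if fixed:
            report[path.name] = fixed
            if not dry_run:
                # indent=1 matches the other scripts in these notes
                path.write_text(json.dumps(nb, indent=1) + "\n", encoding="utf-8")
    return report


# Demo on a throwaway directory with one made-up notebook
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.ipynb"
    demo.write_text(json.dumps({
        "cells": [
            {"cell_type": "code", "metadata": {}, "outputs": [], "source": "x = 1"},
            {"cell_type": "markdown", "metadata": {}, "source": "Notes"},
        ],
        "metadata": {}, "nbformat": 4, "nbformat_minor": 4,
    }), encoding="utf-8")
    preview = fix_missing_execution_counts(tmp)                 # dry run: nothing written
    untouched = "execution_count" not in demo.read_text(encoding="utf-8")
    applied = fix_missing_execution_counts(tmp, dry_run=False)  # writes the fix
    rerun = fix_missing_execution_counts(tmp)                   # nothing left to fix

print(preview, untouched, applied, rerun)
```

Running it once with the default `dry_run=True` yields a per-notebook count comparable to the Affected Notebooks list before anything on disk is modified.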
## Validation Tools

### Final Validation Commands
```bash
# JSON syntax check (confirms the file parses; does not check the notebook schema)
python -c "import json; nb=json.load(open('notebook.ipynb')); print('Valid JSON')"

# nbformat schema validation
python -c "import nbformat; nb=nbformat.read('notebook.ipynb', 4); nbformat.validate(nb); print('Valid notebook')"

# Sphinx documentation build test
python -m sphinx -b html docs/source docs/build/html -q
```

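The JSON one-liner above only tells you whether a single file parses. When checking many notebooks at once, a short stdlib-only helper (the name `json_syntax_errors` is hypothetical) can also report where parsing failed, using the `lineno`, `colno`, and `msg` attributes of `json.JSONDecodeError`:

```python
import json
import tempfile
from pathlib import Path


def json_syntax_errors(paths):
    """Map each path that fails to parse as JSON to 'line L, column C: message'."""
    problems = {}
    for path in map(Path, paths):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as err:
            problems[str(path)] = f"line {err.lineno}, column {err.colno}: {err.msg}"
    return problems


# Demo: one well-formed file and one left truncated by a bad edit (made-up content)
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.ipynb"
    bad = Path(tmp) / "bad.ipynb"
    good.write_text('{"cells": []}', encoding="utf-8")
    bad.write_text('{"cells": [}', encoding="utf-8")
    problems = json_syntax_errors([good, bad])

for name, detail in problems.items():
    print(Path(name).name, "->", detail)
```

Files that parse are simply absent from the result, so an empty dict means every notebook is at least well-formed JSON; schema checks still need nbformat.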
## Root Cause Analysis

The issue arose because the `NotebookEdit` tool did not preserve schema-required fields when inserting new markdown cells next to existing code cells. As cells were renumbered or modified during editing, some code cells lost their `execution_count` field.

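One way to avoid this class of problem when converting cells is to build a fresh cell instead of changing `cell_type` in place, since an in-place change leaves code-only keys such as `outputs` and `execution_count` behind (which would produce exactly the `basic_usage.ipynb` symptom). The sketch below uses plain dicts and a hypothetical helper name; `nbformat.v4.new_markdown_cell` and `nbformat.v4.new_code_cell` are the library's own constructors for well-formed cells.

```python
def code_cell_to_markdown(cell, source=None):
    """Build a new markdown cell from a code cell (hypothetical helper).

    Only keys the v4 schema allows on markdown cells are copied, so code-only
    keys such as outputs and execution_count cannot leak into the result.
    `source` optionally replaces the cell text, e.g. with the message a
    print() call used to display.
    """
    markdown = {
        "cell_type": "markdown",
        "metadata": dict(cell.get("metadata", {})),
        "source": cell.get("source", "") if source is None else source,
    }
    if "id" in cell:
        markdown["id"] = cell["id"]  # keep the nbformat 4.5+ cell id, if any
    return markdown


# Made-up notebook fragment: a print-only cell followed by a real code cell.
nb = {"cells": [
    {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [],
     "source": 'print("Set your credentials before running the next cell.")'},
    {"cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [],
     "source": "client = connect()"},
]}

# Replace the cell instead of mutating cell_type; the neighboring code cell keeps its fields.
nb["cells"][0] = code_cell_to_markdown(
    nb["cells"][0], source="Set your credentials before running the next cell.")
print(sorted(nb["cells"][0]))             # ['cell_type', 'metadata', 'source']
print(nb["cells"][1]["execution_count"])  # 4
```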
### Prevention Strategy
Always validate notebooks after editing them programmatically:

```python
import json

import nbformat


def validate_notebook_after_edit(filepath):
    """Fix common schema issues in place, then validate the notebook with nbformat."""
    with open(filepath, 'r', encoding='utf-8') as f:
        nb = json.load(f)

    # Ensure all code cells have execution_count
    for cell in nb['cells']:
        if cell.get('cell_type') == 'code' and 'execution_count' not in cell:
            cell['execution_count'] = None

    # Remove invalid fields from markdown cells
    for cell in nb['cells']:
        if cell.get('cell_type') == 'markdown':
            if 'execution_count' in cell:
                del cell['execution_count']
            if 'outputs' in cell:
                del cell['outputs']

    # Save the repaired notebook
    with open(filepath, 'w', encoding='utf-8') as f:
        json.dump(nb, f, indent=1)

    # Final validation; raises nbformat.ValidationError if problems remain
    nb_format = nbformat.read(filepath, as_version=4)
    nbformat.validate(nb_format)

    return True
```

## Documentation Impact

After fixes:
- ✅ All 12 notebooks pass nbformat validation
- ✅ Sphinx documentation compiles successfully
- ✅ Notebooks work correctly in JupyterLab, Jupyter Notebook, and Google Colab
- ✅ Cloud tutorial notebooks maintain improved structure with markdown instruction cells

## Key Learnings

1. **Jupyter Schema is Strict**: Every code cell MUST have `execution_count`, even if its value is `null`
2. **Markdown Cells are Limited**: They may contain only `cell_type`, `metadata`, and `source` (plus optional `attachments`, and a cell `id` in nbformat 4.5+)
3. **Programmatic Editing Risks**: Always validate after programmatic notebook modifications
4. **Tool Integration**: nbformat validation is essential for any notebook editing workflow

## Testing Strategy for Future

```python
# Add to CI/testing pipeline
import os

import nbformat


def test_all_notebooks_valid():
    """Ensure all notebooks in docs/notebooks are valid."""
    notebook_dir = 'docs/notebooks'  # relative to the repository root
    for filename in sorted(os.listdir(notebook_dir)):
        if filename.endswith('.ipynb'):
            filepath = os.path.join(notebook_dir, filename)

            # Load and validate
            nb = nbformat.read(filepath, as_version=4)
            nbformat.validate(nb)  # Raises nbformat.ValidationError if invalid

            print(f"{filename}: Valid")
```

## Final Status

All notebook validation issues have been resolved. The cloud tutorial notebooks now provide an improved user experience with:
- Clear separation of instructional content (markdown) and executable code
- Full compliance with the Jupyter notebook schema
- Successful integration with the documentation build system
- Compatibility with all major Jupyter environments

**Commits**: 800bc9b → 0d6d39e → 5c48828
