Commit b287fc6

ianhi, claude, and dcherian authored
Create mechanism for checking docstring consistency with xarray (#1275)
* Add script to check xarray documentation consistency

  This script compares parameter documentation between icechunk.xarray.to_icechunk and xarray.Dataset.to_zarr to ensure they stay in sync.

  Features:
  - Automatically detects all parameters from xarray's to_zarr function
  - Excludes known parameters not applicable to icechunk (store, compute, etc.)
  - Shows side-by-side comparison with color-coded diffs
  - Character-level highlighting for similar lines
  - Summary report of checked/ignored/missing parameters

  Usage (from icechunk-python directory):
      export XARRAY_DIR=~/Documents/dev/xarray
      uv run scripts/check_xarray_docs_sync.py

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>

* Refactor documentation sync script for better readability

  Extract helper functions to reduce complexity:
  - highlight_line_with_char_diff: Character-level diff highlighting
  - build_diff_text: Build rich Text with highlighted differences
  - create_comparison_table: Create side-by-side comparison table

  This reduces the main compare_docs function from ~110 lines to ~50 lines and makes the code more maintainable.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>

* Add diff hash mechanism for tracking known documentation differences

  Features:
  - Compute SHA256 hash of diffs to track documentation differences
  - Store known acceptable diffs in .known-xarray-doc-diffs.json
  - --update-known-diffs flag to add/update diff hashes
  - Known diffs show warning but exit with success (for CI)
  - Unknown diffs show error and exit with failure

  This allows CI to pass when only known/acceptable differences exist (like :py:func: formatting), while still displaying the diffs for review.

  Usage:
      # Initial setup - mark current diffs as known
      uv run scripts/check_xarray_docs_sync.py --update-known-diffs

      # CI mode - fail only on unknown diffs
      uv run scripts/check_xarray_docs_sync.py

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>

* Move known diffs config to scripts/ and update documentation

  Changes:
  - Move known-xarray-doc-diffs.json to scripts/ directory (not hidden)
  - Update default path in script to scripts/known-xarray-doc-diffs.json
  - Document the known diffs mechanism in contributing.md
  - Explain how to update known diffs and integrate with CI

  The known diffs file is now version-controlled and visible, making it easier to review and maintain.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>

* Add xarray documentation consistency check to CI

  Add a new GitHub Actions job 'check-xarray-docs' that:
  - Checks out both icechunk and xarray repositories
  - Runs the documentation consistency checker
  - Fails if unknown documentation differences are found
  - Passes if only known differences exist (tracked in known-xarray-doc-diffs.json)

  This ensures documentation stays in sync with xarray's to_zarr function while allowing acceptable differences like Sphinx formatting.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>

* intentionally break hash to check action

* correct description

* correct hash

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
1 parent 4cfa6b8 commit b287fc6
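The first entry of the commit message describes detecting every parameter of xarray's `to_zarr`, skipping the ones that do not apply to icechunk, and reporting checked, ignored, and missing parameters. The sketch below illustrates that idea under stated assumptions; it is not the code in `scripts/check_xarray_docs_sync.py`, which this commit view does not show. `reference_func` and `our_func` are hypothetical stand-ins for `xarray.Dataset.to_zarr` and `icechunk.xarray.to_icechunk`, parameter names come from `inspect.signature`, and the numpydoc `Parameters` section is read with a small hand-written parser.

```python
import inspect
import re


# Hypothetical stand-in for xarray.Dataset.to_zarr (the reference documentation).
def reference_func(store, mode="w", group=None, compute=True):
    """Write a dataset to a store.

    Parameters
    ----------
    store : MutableMapping
        Store to write to.
    mode : {"w", "a"}
        Persistence mode.
    group : str, optional
        Group path.
    compute : bool
        Whether to write immediately.
    """


# Hypothetical stand-in for icechunk.xarray.to_icechunk.
def our_func(session, mode="w", group=None):
    """Write a dataset to a session.

    Parameters
    ----------
    session : Session
        Session to write to.
    mode : {"w", "a"}
        Persistence mode.
    group : str, optional
        Group path within the store.
    """


# Reference parameters assumed not to apply to our function.
IGNORED = {"store", "compute"}


def parse_parameters(doc):
    """Map each name in a numpydoc 'Parameters' section to its description text."""
    lines = inspect.cleandoc(doc).splitlines()
    params, current, in_section = {}, None, False
    for i, line in enumerate(lines):
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        # A section header is a non-empty line underlined with dashes.
        is_header = bool(line.strip()) and set(nxt) == {"-"}
        if not in_section:
            in_section = is_header and line.strip() == "Parameters"
            continue
        if is_header:
            break  # the next section (e.g. "Returns") starts here
        match = re.match(r"^(\w+)\s*:", line)
        if match:
            current = match.group(1)
            params[current] = []
        elif current is not None and line.strip():
            params[current].append(line.strip())
    return {name: " ".join(desc) for name, desc in params.items()}


def compare(reference, ours, ignored):
    """Classify every parameter of `reference` as ignored, missing, or checked."""
    ref_docs = parse_parameters(reference.__doc__)
    our_docs = parse_parameters(ours.__doc__)
    report = {"checked": [], "ignored": [], "missing": [], "different": []}
    for name in inspect.signature(reference).parameters:
        if name in ignored:
            report["ignored"].append(name)
        elif name not in our_docs:
            report["missing"].append(name)
        else:
            report["checked"].append(name)
            if our_docs[name] != ref_docs.get(name):
                report["different"].append(name)
    return report
```

With these stand-ins, `compare(reference_func, our_func, IGNORED)` lists `store` and `compute` as ignored, checks `mode` and `group`, and flags only `group` as different.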

4 files changed: 744 additions, 0 deletions

.github/workflows/python-check.yaml

Lines changed: 28 additions & 0 deletions
@@ -185,3 +185,31 @@ jobs:
           # pass xarray's pyproject.toml so that pytest can find the `flaky` fixture
           source .venv/bin/activate
           python -m pytest -c=../../xarray/pyproject.toml -W ignore --override-ini="strict_markers=false" tests/run_xarray_backends_tests.py
+
+  check-xarray-docs:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v5
+        with:
+          path: "icechunk"
+
+      - uses: actions/checkout@v5
+        with:
+          repository: "pydata/xarray"
+          path: "xarray"
+          fetch-depth: 0
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
+        with:
+          enable-cache: true
+          python-version: ${{ env.PYTHON_VERSION }}
+
+      - name: Check xarray documentation consistency
+        shell: bash
+        working-directory: icechunk/icechunk-python
+        env:
+          XARRAY_DIR: ../../xarray
+        run: |
+          set -e
+          uv run scripts/check_xarray_docs_sync.py
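The `check-xarray-docs` job passes only when every remaining difference is already recorded in the known-diffs file, using the SHA256 hashes described in the commit message. A minimal sketch of that gate follows. The JSON layout (`{param: {"hash": ..., "reason": ...}}`), hashing the parameter name together with its diff text, and every function name here are assumptions for illustration; the real file format is not shown in this commit view.

```python
import hashlib
import json
from pathlib import Path


def diff_hash(param, diff_text):
    """SHA256 hex digest identifying one parameter's documentation diff (assumed scheme)."""
    return hashlib.sha256(f"{param}\n{diff_text}".encode("utf-8")).hexdigest()


def load_known(path):
    """Load the hypothetical {param: {"hash", "reason"}} mapping; no file means no known diffs."""
    path = Path(path)
    return json.loads(path.read_text()) if path.exists() else {}


def classify(diffs, known):
    """Split a {param: diff_text} mapping into (known_params, unknown_params)."""
    known_params, unknown_params = [], []
    for param, text in diffs.items():
        entry = known.get(param, {})
        if entry.get("hash") == diff_hash(param, text):
            known_params.append(param)
        else:
            unknown_params.append(param)
    return known_params, unknown_params


def update_known(path, diffs, known):
    """Record the current hash of every diff, keeping any reason already written."""
    for param, text in diffs.items():
        entry = known.setdefault(param, {"reason": "TODO: explain why this difference is acceptable"})
        entry["hash"] = diff_hash(param, text)
    # Sorted keys keep the version-controlled file's diffs stable between runs.
    Path(path).write_text(json.dumps(known, indent=2, sort_keys=True) + "\n")


def exit_code(unknown_params):
    """0 lets CI pass (only known diffs remain); 1 fails the job."""
    return 1 if unknown_params else 0
```

In this sketch, a run with `--update-known-diffs` would correspond to calling `update_known`, after which a contributor replaces each placeholder `reason`; any later change to a diff changes its hash, so the parameter becomes unknown again and the job fails.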

docs/docs/contributing.md

Lines changed: 29 additions & 0 deletions
@@ -125,6 +125,35 @@ python -m pytest -xvs tests/run_xarray_backends_tests.py::TestIcechunkStoreFiles
 --override-ini="addopts="
 ```
 
+#### Checking Xarray Documentation Consistency
+
+Icechunk's `to_icechunk` function shares several parameters with Xarray's `to_zarr` function. To ensure documentation stays in sync, use the documentation checker script.
+
+From the `icechunk-python` directory:
+
+```bash
+# Set XARRAY_DIR to point to your local Xarray clone
+export XARRAY_DIR=~/Documents/dev/xarray
+
+# Run the documentation consistency check
+uv run scripts/check_xarray_docs_sync.py
+```
+
+The script will display a side-by-side comparison of any documentation differences, with missing text highlighted in red.
+
+**Known Differences**: Some differences are acceptable (e.g., Sphinx formatting like `:py:func:` doesn't work in mkdocs). These are tracked in `scripts/known-xarray-doc-diffs.json`. Known differences are displayed but don't cause the check to fail.
+
+**Updating Known Differences**: After making intentional documentation changes, update the known diffs file:
+
+```bash
+# Mark current diffs as known (creates/updates scripts/known-xarray-doc-diffs.json)
+uv run scripts/check_xarray_docs_sync.py --update-known-diffs
+
+# Edit scripts/known-xarray-doc-diffs.json to add reasons for each difference
+```
+
+**CI Integration**: The script returns exit code 0 if only known differences exist, allowing CI to pass while still displaying diffs for review.
+
 ### Rust Development Workflow
 
 #### Prerequisites
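The contributing guide says missing text is highlighted in red, and the commit message mentions character-level highlighting for similar lines. One standard-library way to compute such highlights is `difflib.SequenceMatcher.get_opcodes()`. The sketch below uses a hypothetical function `char_diff` and marks text missing from the second string as `[-...-]` and added text as `{+...+}` instead of terminal colors, so it runs without extra dependencies. Whether the actual script uses `difflib` this way is an assumption.

```python
import difflib


def char_diff(reference, ours):
    """Mark text missing from `ours` as [-...-] and text added in `ours` as {+...+}."""
    matcher = difflib.SequenceMatcher(None, reference, ours)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.append(reference[i1:i2])
        if tag in ("delete", "replace"):
            # Text only in the reference; a terminal UI could color this red.
            pieces.append(f"[-{reference[i1:i2]}-]")
        if tag in ("insert", "replace"):
            # Text only in our docstring.
            pieces.append(f"{{+{ours[j1:j2]}+}}")
    return "".join(pieces)
```

For example, `char_diff("Group path.", "Group path within the store.")` returns `"Group path{+ within the store+}."`, and `char_diff("abcd", "abd")` returns `"ab[-c-]d"`.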
