Implement schema constraints for OpenAI #61
Conversation
- Switch from badge.fury.io to shields.io for working PyPI badge
- Convert relative paths to absolute GitHub URLs for PyPI compatibility
- Bump version to 0.1.3

- Add GitHub Actions workflow for automated PyPI publishing via OIDC
- Configure trusted publishing environment for verified releases
- Update project metadata with proper URLs and license format
- Prepare for v1.0.0 stable release with production-ready automation

- Add pylibmagic>=0.5.0 dependency for bundled libraries
- Add [full] install option and pre-import handling
- Update README with troubleshooting and Docker sections
- Bump version to 1.0.1
Fixes google#6
Deleted an inline comment referencing the output directory in the save_annotated_documents example.
…ples.md docs: clarify output_dir behavior in medication_examples.md
Prevents confusion from default `test_output/...` by explicitly saving to current directory.
docs: add output_dir="." to all save_annotated_documents examples
feat: add code formatting and linting pipeline
Introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause.
Add LangExtractError base exception for centralized error handling
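For illustration, a minimal sketch of the single-catch pattern this enables, assuming the base class is importable from langextract.exceptions (the module path is an assumption):

```python
# Hedged sketch: one except clause covers every library-specific error.
# Assumption: the base class is exposed as langextract.exceptions.LangExtractError.
from langextract import exceptions


def run_safely(fn, *args, **kwargs):
    """Run a LangExtract call and report any library-specific failure."""
    try:
        return fn(*args, **kwargs)
    except exceptions.LangExtractError as err:  # catches every subclass
        print(f"LangExtract error: {err}")
        return None
```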
Fixes google#25 - Windows installation failure due to pylibmagic build requirements.
Breaking change: LangFunLanguageModel removed. Use GeminiLanguageModel or OllamaLanguageModel instead.
fix: Remove LangFun and pylibmagic dependencies to fix Windows installation and OpenAI SDK v1.x compatibility
- Modified save_annotated_documents to accept both pathlib.Path and string paths
- Convert string paths to Path objects before calling mkdir()
- This fixes the error when using output_dir='.' as shown in the README example
…-mkdir Fix save_annotated_documents to handle string paths
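A rough sketch of the normalization described above; the real function's signature and file naming in langextract differ, this only illustrates the str-to-Path conversion:

```python
import pathlib


def save_annotated_documents(annotated_docs, output_dir="."):
    # Accept both str and pathlib.Path before calling mkdir(), per the fix above.
    output_path = pathlib.Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)
    for i, doc in enumerate(annotated_docs):
        # Hypothetical file naming, for illustration only.
        (output_path / f"doc_{i}.jsonl").write_text(str(doc), encoding="utf-8")
```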
feat: Add OpenAI language model support
…s: (google#10)
* docs: clarify output_dir behavior in medication_examples.md
* Removed inline comment in medication example: deleted an inline comment referencing the output directory in the save_annotated_documents example.
* docs: add output_dir="." to all save_annotated_documents examples; prevents confusion from default `test_output/...` by explicitly saving to the current directory.
* build: add formatting & linting pipeline with pre-commit integration
* style: apply pyink, isort, and pre-commit formatting
* ci: enable format and lint checks in tox
* Add LangExtractError base exception for centralized error handling: introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause.
* fix(ui): prevent current highlight border from being obscured

Co-authored-by: Leena Kamran <62442533+kleeena@users.noreply.github.com>
Co-authored-by: Akshay Goel <akshay.k.goel@gmail.com>
- Gemini & OpenAI test suites with retry on transient errors
- CI: Separate job, Python 3.11 only, skips for forks
- Validates char_interval for all extractions
- Multilingual test xfail (issue google#13)

TODO: Remove xfail from multilingual test after tokenizer fix
- Add OpenAISchema class to generate JSON Schema compatible with OpenAI's structured outputs API
- Update OpenAILanguageModel to accept and use openai_schema parameter
- Configure response_format with json_schema when schema is provided
- Add validation to ensure schema constraints are only used with JSON format
- Update extract() function to generate OpenAI schemas when appropriate
- Support LANGEXTRACT_OPENAI_API_KEY environment variable

This enables use_schema_constraints=True with fence_output=False for OpenAI models when using FormatType.JSON. YAML format with schema constraints will raise a clear error.
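For context, a hedged sketch of how a JSON Schema can be passed to the OpenAI SDK (v1.x) via response_format; the schema shown is illustrative, not the one OpenAISchema actually generates:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment by default

# Illustrative schema; OpenAISchema would derive the real one from the examples.
schema = {
    "type": "object",
    "properties": {
        "extractions": {
            "type": "array",
            "items": {"type": "object", "additionalProperties": True},
        }
    },
    "required": ["extractions"],
}

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Extract medications as JSON."}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "extractions", "schema": schema},
    },
)
print(response.choices[0].message.content)  # raw JSON, no code fences needed
```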
- Add tests for OpenAILanguageModel with schema constraints
- Add tests for OpenAISchema generation from examples
- Add integration tests for the extract() function with OpenAI
- Test validation errors for YAML format and fence_output=True
- Verify correct API parameters when using structured outputs

- Update the OpenAI example in the README
- Document that schema constraints now work with JSON format
- Add note about FormatType and fence_output requirements
- Clarify supported models and limitations
* Add workflow_dispatch trigger to validation workflows
  - Enable manual triggering for check-linked-issue, check-pr-size, and validate_pr_template
  - Add conditional logic to ensure PR-specific steps only run on PR events
  - Allows maintainers to manually trigger workflows when needed
* Add manual trigger to infrastructure protection workflow
  - Add workflow_dispatch trigger
  - Add conditional logic for PR-specific checks
  - Ensures consistency across all validation workflows

- Change from pull_request to pull_request_target in all validation workflows
- This gives workflows proper permissions to add labels and comments on PRs from forks
- Fixes 'Resource not accessible by integration' error (HTTP 403)
- Safe because workflows only read PR metadata and don't execute PR code
Enables manual triggering of CI workflow including live API tests. This allows maintainers to run live API tests for PRs from forks where the tests would normally be skipped for security reasons.
Enables two ways to run live API tests:
1. workflow_dispatch: Manual trigger via the Actions tab
2. Label trigger: Add the 'ready-to-merge' label to any PR

The label-based approach uses pull_request_target for security:
- Runs in base repository context with access to secrets
- Safely merges the PR into the main branch before testing
- Only maintainers can trigger
- Comments test results back to the PR

This provides a production-ready solution for testing PRs from forks while maintaining security, following patterns used by major projects.
* Add base_url to OpenAILanguageModel
* GitHub Actions lint is outdated, so adapting
* Add base_url to parameterized test
* Lint fixes to inference_test.py
Bug: Workflows triggered on pull_request_target but checked for pull_request, causing all validations to be skipped.
Fixed:
- Event condition checks now match trigger type
- Add manual revalidation workflow
- Enable workflow_dispatch with PR number input
Manual validation results: Size: 798 lines, Run ID: 16790875474
- Creates visible PR checks (pass/fail status)
- Shows validation errors in status description (up to 140 chars)
- Links to workflow run for full details
- Maintains backward compatibility with comment reporting
Manual validation results: Size: 798 lines, Run ID: 16791196611
Manual Validation Results: ❌ Failed
The workflow was comparing boolean true to string 'true', causing all validations to incorrectly show as failed even when all checks passed.
Manual Validation Results: ✅ Passed
- revalidate-all-prs.sh: Triggers manual validation for all open PRs
- add-size-labels.sh: Adds size labels (XS/S/M/L/XL) based on change count
- add-new-checks.sh: Adds required status checks to branch protection

These scripts require maintainer permissions and help manage PR workflows.
- Add type ignore comments for IPython imports
- Fix return type annotation (remove unnecessary quotes)
- Add _is_jupyter() to properly detect notebook environments
- Replace lambda with def function for pylint compliance

Fixes google#65
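A plausible sketch of the notebook check mentioned above; the actual _is_jupyter() may use a different heuristic:

```python
def _is_jupyter() -> bool:
    """Best-effort detection of a Jupyter notebook environment."""
    try:
        from IPython import get_ipython  # type: ignore
    except ImportError:
        return False
    shell = get_ipython()
    # ZMQInteractiveShell is the kernel-backed shell notebooks run on.
    return shell is not None and shell.__class__.__name__ == "ZMQInteractiveShell"
```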
- Add format-check job that checks actual PR code, not the merge commit
- Validate formatting before expensive fork PR tests
- Provide clear error messages when formatting fails

Fixes false positives where incorrectly formatted PRs passed CI
Auto-updates PRs behind main, handles forks/conflicts gracefully, skips bot/draft PRs, monitors API limits
Your branch is 20 commits behind.
git fetch origin main
git merge origin/main
git push
Note: Enable "Allow edits by maintainers" to allow automatic updates.
- Apply end-of-file and whitespace fixes to workflows
- Fix empty interval bug when a newline falls at a chunk boundary (issue google#71)
- Add concise comment explaining the fix logic
- Remove excessive/obvious comments from chunking tests
- Improve test docstring to be more descriptive and professional
The exceptions.py file existed in both the root directory and langextract/ directory with identical content. This removes the duplicate from the root to avoid confusion and maintain proper package structure.
❌ Infrastructure File Protection
This PR modifies protected infrastructure files.
Only repository maintainers are allowed to modify infrastructure files.
Note: If these are only formatting changes, please:
If structural changes are necessary:
For more information, see our Contributing Guidelines.
Your branch is 23 commits behind.
git fetch origin main
git merge origin/main
git push
Note: Enable "Allow edits by maintainers" to allow automatic updates.
Your branch is 86 commits behind.
git fetch origin main
git merge origin/main
git push
Note: Enable "Allow edits by maintainers" to allow automatic updates.
Description
Implements schema constraints for OpenAI models, enabling structured outputs with JSON format without requiring output fencing.
Fixes #59
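A hedged sketch of the intended call pattern; the parameter names (language_model_type, format_type, use_schema_constraints, fence_output) follow the PR description and README example and may differ in other releases:

```python
import os

import langextract as lx

# Minimal few-shot example; class names follow the project README.
examples = [
    lx.data.ExampleData(
        text="Patient took 400 mg ibuprofen.",
        extractions=[
            lx.data.Extraction(
                extraction_class="medication",
                extraction_text="400 mg ibuprofen",
            )
        ],
    )
]

result = lx.extract(
    text_or_documents="Patient was prescribed 250 mg amoxicillin twice daily.",
    prompt_description="Extract medication mentions.",
    examples=examples,
    language_model_type=lx.inference.OpenAILanguageModel,
    model_id="gpt-4o",
    api_key=os.environ.get("LANGEXTRACT_OPENAI_API_KEY"),
    format_type=lx.data.FormatType.JSON,
    use_schema_constraints=True,  # now supported for OpenAI with JSON
    fence_output=False,           # structured outputs need no code fences
)
```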
Feature
How Has This Been Tested?
Comprehensive testing has been performed through automated tests and code review:
Tests Added
1. Unit Tests (`tests/inference_test.py`)
   - `test_openai_schema_constraints_json`: Validates that schema constraints work correctly with JSON format
   - `test_openai_schema_constraints_yaml_raises_error`: Ensures YAML format with schema constraints raises an appropriate error
   - `test_openai_with_schema_constraints`: Verifies correct API parameters are sent to OpenAI when using structured outputs
2. Schema Tests (`tests/schema_test.py`)
   - `OpenAISchemaTest` class with multiple test cases
3. Integration Tests (`tests/openai_extract_test.py`)
   - `test_extract_with_openai_schema_constraints`: End-to-end test of the extract() function with OpenAI schema constraints
   - `test_extract_openai_yaml_with_schema_raises_error`: Validates error handling for unsupported YAML format
   - `test_extract_openai_fence_output_with_schema_raises_error`: Validates error handling for fence_output=True
   - `test_extract_openai_without_schema_constraints`: Ensures backward compatibility
Test Coverage
Running Tests
Command:
Note: The tests use mocked OpenAI API responses to avoid requiring actual API keys during testing.
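For illustration, a minimal sketch of mocking the OpenAI client with unittest.mock; the patch target and response shape are assumptions rather than the PR's exact test code:

```python
from unittest import mock

import openai

# Fake chat-completions response carrying a JSON payload.
fake_message = mock.Mock()
fake_message.content = '{"extractions": []}'
fake_response = mock.Mock(choices=[mock.Mock(message=fake_message)])

with mock.patch("openai.OpenAI") as mock_openai:
    mock_openai.return_value.chat.completions.create.return_value = fake_response
    client = openai.OpenAI(api_key="test-key")  # returns the patched mock
    reply = client.chat.completions.create(model="gpt-4o", messages=[])
    assert reply.choices[0].message.content == '{"extractions": []}'
```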
Checklist:
- I have run `pylint` over the affected code.