
aksg87 (Collaborator) commented Aug 13, 2025

Description

Refactors the schema constraint system to enable provider-specific implementations through a plugin architecture. Each LLM provider can now define its own optimal approach for structured output generation while maintaining backward compatibility.

Related to #128, #59, #99

Feature

How Has This Been Tested?

$ pytest tests/extract_schema_integration_test.py -v
$ pytest tests/factory_schema_test.py -v
$ pytest tests/provider_schema_test.py -v
$ tox -e live-api
$ tox -e ollama-integration

Verified backward compatibility with existing providers and tested schema generation with multiple providers.

Checklist:

  • I have read and acknowledged Google's Open Source Code of Conduct.
  • I have read the Contributing page, and I either signed the Google Individual CLA or am covered by my company's Corporate CLA.
  • I have discussed my proposed solution with code owners in the linked issue(s) and we have agreed upon the general approach.
  • I have made any needed documentation changes, or noted in the linked issue(s) that documentation elsewhere needs updating.
  • I have added tests, or I have ensured existing tests cover the changes.
  • I have followed Google's Python Style Guide and run pylint over the affected code.

Key Changes

Schema Abstraction (langextract/schema.py)

  • BaseSchema abstract class for provider-specific implementations
  • FormatModeSchema for JSON/YAML providers (OpenAI, Ollama)
  • Property-based fence output detection
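The abstraction described above can be pictured roughly as follows. This is a minimal, self-contained sketch, not the code in langextract/schema.py: only the class names BaseSchema and FormatModeSchema come from this PR, while the method names from_examples and to_provider_config, the property expects_fenced_output, and the {"format": ...} config key are hypothetical placeholders chosen for illustration.

```python
import abc
from typing import Any, Mapping, Sequence


class BaseSchema(abc.ABC):
    """Hypothetical sketch of a provider-agnostic schema constraint."""

    @classmethod
    @abc.abstractmethod
    def from_examples(cls, examples: Sequence[Mapping[str, Any]]) -> "BaseSchema":
        """Build a schema from few-shot example outputs."""

    @abc.abstractmethod
    def to_provider_config(self) -> dict[str, Any]:
        """Return the kwargs a provider needs to enforce this schema."""

    @property
    @abc.abstractmethod
    def expects_fenced_output(self) -> bool:
        """Hypothetical property: may raw model output arrive wrapped in ``` fences?"""


class FormatModeSchema(BaseSchema):
    """Constrains output only by format (JSON/YAML), not by field layout."""

    def __init__(self, format_type: str = "json") -> None:
        if format_type not in ("json", "yaml"):
            raise ValueError(f"unsupported format: {format_type}")
        self.format_type = format_type

    @classmethod
    def from_examples(cls, examples, format_type: str = "json") -> "FormatModeSchema":
        # Format mode ignores the structure of the examples; only the format matters.
        return cls(format_type)

    def to_provider_config(self) -> dict[str, Any]:
        # Hypothetical config key; real providers name this option differently.
        return {"format": self.format_type}

    @property
    def expects_fenced_output(self) -> bool:
        # Assumption: format-constrained providers return bare JSON/YAML,
        # so the parser does not need to strip code fences.
        return False
```

For example, FormatModeSchema.from_examples([{"name": "aspirin"}]).to_provider_config() yields {"format": "json"}. Exposing fence handling as a property lets the output parser ask the schema how to read the response instead of special-casing providers.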

Factory Integration (langextract/factory.py)

  • Enhanced create_model() with schema constraint support
  • Automatic schema selection based on provider capabilities
  • Full backward compatibility maintained
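One way to picture capability-based selection with a backward-compatible default is sketched below. It is illustrative only: create_model_sketch, get_schema_class, the provider classes, and the use_schema_constraints flag are hypothetical stand-ins and do not reproduce the real create_model() signature.

```python
import abc
from typing import Any, Mapping, Optional, Sequence


class BaseSchema(abc.ABC):
    @classmethod
    @abc.abstractmethod
    def from_examples(cls, examples: Sequence[Mapping[str, Any]]) -> "BaseSchema": ...

    @abc.abstractmethod
    def to_provider_config(self) -> dict[str, Any]: ...


class JsonFormatSchema(BaseSchema):
    @classmethod
    def from_examples(cls, examples):
        return cls()

    def to_provider_config(self):
        return {"format": "json"}


class BaseProvider:
    def __init__(self, **kwargs: Any) -> None:
        self.config = kwargs

    @classmethod
    def get_schema_class(cls) -> Optional[type]:
        # Default: the provider declares no schema support.
        return None


class JsonModeProvider(BaseProvider):
    @classmethod
    def get_schema_class(cls):
        # A provider plugin advertises the schema class it knows how to enforce.
        return JsonFormatSchema


class PlainProvider(BaseProvider):
    pass


def create_model_sketch(provider_cls, examples=None, use_schema_constraints=False, **kwargs):
    """Hypothetical factory: attach a schema only when the provider supports one."""
    schema_cls = provider_cls.get_schema_class()
    if use_schema_constraints and examples and schema_cls is not None:
        schema = schema_cls.from_examples(examples)
        # Caller-supplied kwargs take precedence over schema-derived config.
        kwargs = {**schema.to_provider_config(), **kwargs}
    # Without constraints (or without schema support), behave exactly as before.
    return provider_cls(**kwargs)
```

Because the schema-less path is the default, existing callers that never pass use_schema_constraints get unchanged behavior, which is the compatibility property the bullet list claims.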

Provider Schemas (langextract/providers/schemas/)

  • Moved GeminiSchema to dedicated module
  • Clear separation between provider implementations
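As an illustration of what a provider-owned schema module might contain, the sketch below derives a Gemini-style response schema from example outputs. The response_mime_type and response_schema keys mirror Gemini's structured-output generation config, but GeminiSchemaSketch, the type-inference rule, and the top-level "extractions" array are assumptions for illustration, not the contents of the moved GeminiSchema.

```python
import abc
from typing import Any, Mapping, Sequence

# Assumption: map Python value types to Gemini/OpenAPI-style type names.
_TYPE_NAMES = {str: "STRING", int: "INTEGER", float: "NUMBER", bool: "BOOLEAN"}


class BaseSchema(abc.ABC):
    @classmethod
    @abc.abstractmethod
    def from_examples(cls, examples: Sequence[Mapping[str, Any]]) -> "BaseSchema": ...

    @abc.abstractmethod
    def to_provider_config(self) -> dict[str, Any]: ...


class GeminiSchemaSketch(BaseSchema):
    """Hypothetical Gemini schema living in its own provider module."""

    def __init__(self, schema_dict: dict[str, Any]) -> None:
        self.schema_dict = schema_dict

    @classmethod
    def from_examples(cls, examples):
        properties: dict[str, dict[str, str]] = {}
        for example in examples:
            for key, value in example.items():
                # First type seen for a field wins; unknown types fall back to STRING.
                # type() (not isinstance) keeps bool from being read as int.
                properties.setdefault(key, {"type": _TYPE_NAMES.get(type(value), "STRING")})
        item = {"type": "OBJECT", "properties": properties}
        # Hypothetical wrapper: a top-level array of extraction objects.
        return cls({"type": "OBJECT",
                    "properties": {"extractions": {"type": "ARRAY", "items": item}}})

    def to_provider_config(self) -> dict[str, Any]:
        return {"response_mime_type": "application/json",
                "response_schema": self.schema_dict}
```

Keeping this logic in a provider-specific module means other providers never import Gemini-specific type names, which is the separation the bullets above describe.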

Testing

  • E2E tests for schema functionality
  • Provider-specific schema validation
  • All existing tests pass without modification

Breaking Changes

None. Full backward compatibility is maintained: the refactoring is entirely internal, with no changes to public APIs.

github-actions bot added the size/XL label (Pull request with over 1000 lines changed - too large) Aug 13, 2025
aksg87 force-pushed the feature/schema-refactor-provider-plugins branch from e13873e to 4413070 August 13, 2025 07:22
Enable providers to define custom schema implementations via BaseSchema abstraction.
Add property-based fence output, FormatModeSchema for JSON/YAML providers, and
move GeminiSchema to providers/schemas/.
aksg87 force-pushed the feature/schema-refactor-provider-plugins branch from 4413070 to 1003ece August 13, 2025 07:32
aksg87 self-assigned this Aug 13, 2025
aksg87 merged commit 77b7b95 into main Aug 13, 2025
12 of 14 checks passed
aksg87 deleted the feature/schema-refactor-provider-plugins branch August 13, 2025 07:39
aksg87 added a commit that referenced this pull request Aug 21, 2025
sinnaj pushed a commit to sinnaj/langextract that referenced this pull request Sep 3, 2025
aksg87 added a commit that referenced this pull request Sep 12, 2025