@lazzyms lazzyms commented Aug 11, 2025

Fixes: #110
Problem
The Ollama example code in README.md was missing the required language_model_type parameter, causing users to encounter the error: "API key must be provided for cloud-hosted models via the api_key parameter or the LANGEXTRACT_API_KEY environment variable" when trying to run the example.
Solution
Added the missing language_model_type=inference.OllamaLanguageModel parameter to the Ollama integration example in README.md.
Changes

File modified: README.md
Change: Added language_model_type=inference.OllamaLanguageModel parameter to Ollama example code
Impact: Users can now run the Ollama example without API key errors, as the parameter correctly identifies it as a local model
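
For reference, the call this PR describes might look like the sketch below. It assumes the langextract version under discussion, in which `lx.extract()` still accepts `language_model_type` (the maintainer notes further down that this parameter is being deprecated). The helper names `ollama_extract_options` and `run_ollama_example`, the model ID `gemma2:2b`, the default Ollama URL, and the `fence_output` / `use_schema_constraints` settings are illustrative assumptions, not text quoted from the PR.

```python
def ollama_extract_options(model_id="gemma2:2b",
                           model_url="http://localhost:11434"):
    """Keyword arguments for a local Ollama model (hypothetical helper).

    The model ID and URL are assumed defaults for a local Ollama server.
    """
    return {
        "model_id": model_id,
        "model_url": model_url,
        # Assumed settings for a local model; adjust to your setup.
        "fence_output": False,
        "use_schema_constraints": False,
    }


def run_ollama_example(text, prompt, examples):
    """Run an extraction against a local Ollama server (no API key)."""
    # Imported lazily so this sketch can be loaded without langextract.
    import langextract as lx

    return lx.extract(
        text_or_documents=text,
        prompt_description=prompt,
        examples=examples,
        # The parameter this PR adds to the README example.
        language_model_type=lx.inference.OllamaLanguageModel,
        **ollama_extract_options(),
    )
```

Calling `run_ollama_example(...)` requires langextract to be installed and an Ollama server listening at the given URL.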

Testing

Verified the updated example code is valid Python syntax
Confirmed parameter formatting is consistent with other examples in the README
Validated the example now runs without the previous API key error

Type of Change

Bug fix (non-breaking change which fixes an issue)
Documentation update
New feature
Breaking change

This documentation fix ensures users can successfully follow the Ollama integration guide by including the required parameter that distinguishes local models from cloud-hosted ones.

aksg87 and others added 30 commits July 22, 2025 01:39
- Switch from badge.fury.io to shields.io for working PyPI badge
- Convert relative paths to absolute GitHub URLs for PyPI compatibility
- Bump version to 0.1.3
- Add GitHub Actions workflow for automated PyPI publishing via OIDC
- Configure trusted publishing environment for verified releases
- Update project metadata with proper URLs and license format
- Prepare for v1.0.0 stable release with production-ready automation
- Add pylibmagic>=0.5.0 dependency for bundled libraries
- Add [full] install option and pre-import handling
- Update README with troubleshooting and Docker sections
- Bump version to 1.0.1

Fixes google#6
Deleted an inline comment referencing the output directory in save_annotated_documents.
…ples.md

docs: clarify output_dir behavior in medication_examples.md
Prevents confusion from default `test_output/...` by explicitly saving to current directory.
docs: add output_dir="." to all save_annotated_documents examples
feat: add code formatting and linting pipeline
Introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause.
Add LangExtractError base exception for centralized error handling
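
The base-exception pattern described in this commit can be sketched as follows. Only the name `LangExtractError` comes from the commit message; the subclasses `ExampleConfigError` and `ExampleRuntimeError` and the function `run_step` are hypothetical and exist only to show the single `except` clause.

```python
class LangExtractError(Exception):
    """Common base class for all library-specific errors."""


# Hypothetical subclasses, for illustration only.
class ExampleConfigError(LangExtractError):
    """Raised for an invalid configuration."""


class ExampleRuntimeError(LangExtractError):
    """Raised when a request fails at run time."""


def run_step(fail_with=None):
    """Raise the given error type (if any) and handle it via the base class."""
    try:
        if fail_with is not None:
            raise fail_with("simulated failure")
        return "ok"
    except LangExtractError as err:
        # One clause covers every subclass of the base exception.
        return f"handled {type(err).__name__}"
```

Errors that do not inherit from the base class, such as a plain `ValueError`, would still propagate past this handler.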
Fixes google#25 - Windows installation failure due to pylibmagic build requirements

Breaking change: LangFunLanguageModel removed. Use GeminiLanguageModel or OllamaLanguageModel instead.
fix: Remove LangFun and pylibmagic dependencies to fix Windows installation and OpenAI SDK v1.x compatibility
- Modified save_annotated_documents to accept both pathlib.Path and string paths
- Convert string paths to Path objects before calling mkdir()
- This fixes the error when using output_dir='.' as shown in the README example
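
A minimal sketch of the normalization this commit describes, assuming a hypothetical helper named `ensure_output_dir`; the actual change inside save_annotated_documents is not shown in this conversation.

```python
import pathlib
import tempfile
from typing import Union


def ensure_output_dir(output_dir: Union[str, pathlib.Path]) -> pathlib.Path:
    """Accept a str or Path, create the directory, and return it as a Path."""
    # A plain str has no mkdir() method, so convert before calling it.
    path = pathlib.Path(output_dir)
    path.mkdir(parents=True, exist_ok=True)
    return path


# Usage: both argument types behave the same way.
base = tempfile.mkdtemp()
from_str = ensure_output_dir(base + "/results")
from_path = ensure_output_dir(pathlib.Path(base) / "results")
```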
…-mkdir

Fix save_annotated_documents to handle string paths
feat: Add OpenAI language model support
…s: (google#10)

* docs: clarify output_dir behavior in medication_examples.md

* Removed inline comment in medication example

Deleted an inline comment referencing the output directory in save_annotated_documents.

* docs: add output_dir="." to all save_annotated_documents examples

Prevents confusion from default `test_output/...` by explicitly saving to current directory.

* build: add formatting & linting pipeline with pre-commit integration

* style: apply pyink, isort, and pre-commit formatting

* ci: enable format and lint checks in tox

* Add LangExtractError base exception for centralized error handling

Introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause.

* fix(ui): prevent current highlight border from being obscured

---------

Co-authored-by: Leena Kamran <62442533+kleeena@users.noreply.github.com>
Co-authored-by: Akshay Goel <akshay.k.goel@gmail.com>
- Gemini & OpenAI test suites with retry on transient errors
- CI: Separate job, Python 3.11 only, skips for forks
- Validates char_interval for all extractions
- Multilingual test xfail (issue google#13)

TODO: Remove xfail from multilingual test after tokenizer fix
…e#62)

- Add quickstart example and documentation for local LLM usage
- Include Docker setup with health checks and docker-compose
- Add integration tests and update CI pipeline
- Secure setup: localhost-only binding, containerized deployment

Signed-off-by: Akshay Goel <goelak@google.com>
- Ollama integration with Docker examples
- Fixed OllamaLanguageModel parameter name (model -> model_id)
- Added CI/CD tests for Ollama
- Updated documentation with consistent API examples
Bumps the github_actions group with 1 update in the /.github/workflows directory: [tj-actions/changed-files](https://github.com/tj-actions/changed-files).


Updates `tj-actions/changed-files` from 44 to 46
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](tj-actions/changed-files@v44...v46)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-version: '46'
  dependency-type: direct:production
  dependency-group: github_actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…e#74)

- Add check-linked-issue.yml: Enforces that PRs reference issues with 5+ community reactions
- Add check-pr-size.yml: Labels PRs by size and enforces 1000 line limit
- Update CONTRIBUTING.md: Document new PR requirements and size guidelines
- Include helpful error messages with links to contribution guidelines
- Create a scalable system for maintaining code quality and review efficiency
aksg87 and others added 22 commits August 6, 2025 09:38
Enables two ways to run live API tests:
1. workflow_dispatch: Manual trigger via Actions tab
2. Label trigger: Add 'ready-to-merge' label to any PR

The label-based approach uses pull_request_target for security:
- Runs in base repository context with access to secrets
- Safely merges PR into main branch before testing
- Only maintainers can trigger
- Comments test results back to PR

This provides a production-ready solution for testing PRs from forks
while maintaining security, following patterns used by major projects.
* Add base_url to OpenAILanguageModel

* Github action lint is outdated, so adapting

* Adding base_url to parameterized test

* Lint fixes to inference_test.py
Bug: Workflows triggered on pull_request_target but checked for pull_request,
causing all validations to be skipped.

Fixed:
- Event condition checks now match trigger type
- Add manual revalidation workflow
- Enable workflow_dispatch with PR number input
- Creates visible PR checks (pass/fail status)
- Shows validation errors in status description (up to 140 chars)
- Links to workflow run for full details
- Maintains backward compatibility with comment reporting
The workflow was comparing boolean true to string 'true', causing all validations to incorrectly show as failed even when all checks passed.
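
The workflow itself is not written in Python, but the same pitfall can be illustrated there: a boolean never equals the string 'true', so the flag has to be normalized before comparing. The helper name `as_bool` is hypothetical.

```python
def as_bool(value) -> bool:
    """Normalize a flag that may arrive as a bool or as the text 'true'."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


# A direct comparison between the two types is always False.
naive_check = (True == "true")

# After normalization, both representations agree.
fixed_check = (as_bool(True) == as_bool("true"))
```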
- revalidate-all-prs.sh: Triggers manual validation for all open PRs
- add-size-labels.sh: Adds size labels (XS/S/M/L/XL) based on change count
- add-new-checks.sh: Adds required status checks to branch protection

These scripts require maintainer permissions and help manage PR workflows.
- Add type ignore comments for IPython imports
- Fix return type annotation (remove unnecessary quotes)
- Add _is_jupyter() to properly detect notebook environments
- Replace lambda with def function for pylint compliance

Fixes google#65
- Add format-check job that checks actual PR code, not merge commit
- Validate formatting before expensive fork PR tests
- Provide clear error messages when formatting fails

Fixes false positives where incorrectly formatted PRs passed CI
Auto-updates PRs behind main, handles forks/conflicts gracefully,
skips bot/draft PRs, monitors API limits
- Apply end-of-file and whitespace fixes to workflows
- Fix empty interval bug when newline falls at chunk boundary (issue google#71)
- Add concise comment explaining the fix logic
- Remove excessive/obvious comments from chunking tests
- Improve test docstring to be more descriptive and professional
The exceptions.py file existed in both the root directory and langextract/ directory with identical content. This removes the duplicate from the root to avoid confusion and maintain proper package structure.
…le (google#97)

Introduces a provider registry system enabling third-party providers to be dynamically registered and discovered through a plugin architecture. Users can now integrate custom LLM backends (Azure OpenAI, AWS Bedrock, custom inference servers) without modifying core LangExtract code.

Fixes google#80, google#67, google#54, google#49, google#48, google#53

Key Changes:

**Provider Registry** (`langextract/providers/registry.py`)
- Pattern-based registration with priority resolution
- Automatic discovery via Python entry points
- Lazy loading for performance
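
The three registry properties listed above can be sketched in a few lines. This is an illustrative toy, not the contents of `langextract/providers/registry.py`: the names `register`, `resolve`, `CloudProvider`, and `LocalProvider` are hypothetical, and entry-point discovery is left out.

```python
import re
from typing import Callable, List, Tuple

# Each entry: (priority, compiled pattern, loader returning a provider class).
_ENTRIES: List[Tuple[int, "re.Pattern[str]", Callable[[], type]]] = []


def register(pattern: str, loader: Callable[[], type], priority: int = 0) -> None:
    """Register a provider loader for model IDs matching the pattern."""
    _ENTRIES.append((priority, re.compile(pattern), loader))


def resolve(model_id: str) -> type:
    """Return the provider class with the highest priority among matches."""
    candidates = [(prio, loader) for prio, pat, loader in _ENTRIES
                  if pat.search(model_id)]
    if not candidates:
        raise ValueError(f"No provider registered for model_id={model_id!r}")
    _, loader = max(candidates, key=lambda c: c[0])
    # Lazy loading: the loader runs only once a provider is actually needed.
    return loader()


class CloudProvider:
    """Stand-in for a cloud-hosted backend."""


class LocalProvider:
    """Stand-in for a local backend."""


# Overlapping patterns: "gemma2:2b" matches both, so priority decides.
register(r"^gem", lambda: CloudProvider, priority=0)
register(r"^gemma", lambda: LocalProvider, priority=10)
```

In a real plugin system the loader would typically import the provider module on demand, so unused backends add no import cost.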

**Factory Enhancements** (`langextract/factory.py`)
- `ModelConfig` dataclass for structured configuration
- Explicit provider selection when patterns overlap
- Full backward compatibility maintained
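
A structured configuration with explicit provider selection might look like the sketch below. The class `ExampleModelConfig`, its fields, and the function `choose_provider` are hypothetical stand-ins; the real `ModelConfig` in `langextract/factory.py` may differ.

```python
import dataclasses
from typing import Any, Dict, List, Optional


@dataclasses.dataclass(frozen=True)
class ExampleModelConfig:
    """Illustrative config: model ID, optional explicit provider, extra kwargs."""
    model_id: str
    provider: Optional[str] = None
    provider_kwargs: Dict[str, Any] = dataclasses.field(default_factory=dict)


def choose_provider(config: ExampleModelConfig, pattern_matches: List[str]) -> str:
    """Pick a provider name, preferring an explicit choice over pattern matches."""
    if config.provider is not None:
        return config.provider
    if len(pattern_matches) == 1:
        return pattern_matches[0]
    raise ValueError(
        f"Ambiguous or missing provider for {config.model_id!r}: {pattern_matches}"
    )


# Two patterns overlap, so the explicit provider settles the choice.
explicit = ExampleModelConfig("gemma2:2b", provider="ollama")
selected = choose_provider(explicit, ["gemini", "ollama"])
```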

**Plugin Example** (`examples/custom_provider_plugin/`)
- Complete working example with entry point configuration
- Shows how to create custom providers for any backend

**Documentation**
- Comprehensive provider system README with architecture diagrams
- Step-by-step plugin creation guide

**Dependencies**
- Move openai to optional dependencies
- Update tox.ini to include openai in test environments

**Lint Fixes**
- Add appropriate pylint suppressions for legitimate patterns
- Fix unused variable warnings in tests
- Address import and global statement warnings

No anticipated breakage - full backward compatibility maintained. Given significant internal changes to provider loading, issues should be reported if unexpected behavior is encountered.
Add common development files, tools, and temporary file patterns
- Show current approach using factory.create_model()
- Add note that direct model passing to extract() is coming soon
- Keep planned API as commented code for reference
Ensure providers are loaded before pattern matching to prevent API key
errors when using local models. Optimize to skip loading when provider
is explicitly specified.
- Add proper permissions (issues: write for comments)
- Skip draft PRs to avoid noise
- Prevent duplicate comments with hidden marker
- Search both title and body for issue links
- Support all keyword variants and cross-repo references
- Count unique users for reactions, not total count
- Include 'write' permission for maintainer override
- Add concurrency control for rapid edits
- Handle cross-repo issues gracefully
- 6 tests: plugin discovery, loading, idempotency, error handling
- Smart CI triggers for integration test on provider changes
- New tox environments: plugin-smoke and plugin-integration
@github-actions github-actions bot added the size/XS Pull request with less than 50 lines changed label Aug 11, 2025
@lazzyms lazzyms marked this pull request as ready for review August 11, 2025 12:24

⚠️ Branch Update Required

Your branch is 1 commit behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.


⚠️ Branch Update Required

Your branch is 26 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

@aksg87
Collaborator

aksg87 commented Aug 21, 2025

Thanks for the contribution and PR. The language_model_type parameter is going to be deprecated, as noted in recent docstring updates. Using the model config is going to be the recommended way of interacting with models. Closing this PR for now, but please raise the issue again if it comes up. An update that handles passing all parameters to model kwargs more gracefully should be available soon.

@aksg87 aksg87 closed this Aug 21, 2025

Labels

size/XS Pull request with less than 50 lines changed

Development

Successfully merging this pull request may close these issues.

[bug] langextract still asking for API key on local Ollama API setup

7 participants