Fix #110 formatting in README for consistency and clarity #118
- Switch from badge.fury.io to shields.io for working PyPI badge - Convert relative paths to absolute GitHub URLs for PyPI compatibility - Bump version to 0.1.3
- Add GitHub Actions workflow for automated PyPI publishing via OIDC - Configure trusted publishing environment for verified releases - Update project metadata with proper URLs and license format - Prepare for v1.0.0 stable release with production-ready automation
- Add pylibmagic>=0.5.0 dependency for bundled libraries - Add [full] install option and pre-import handling - Update README with troubleshooting and Docker sections - Bump version to 1.0.1 Fixes google#6
Deleted an inline comment referencing the output directory in the save_annotated_documents.
…ples.md docs: clarify output_dir behavior in medication_examples.md
Prevents confusion from default `test_output/...` by explicitly saving to current directory.
docs: add output_dir="." to all save_annotated_documents examples
feat: add code formatting and linting pipeline
Introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause.
Add LangExtractError base exception for centralized error handling
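The single-except pattern described in this commit can be sketched as follows. This is a minimal, self-contained illustration, not the library's code: `LangExtractError` is defined locally as a stand-in, and the two subclasses and the `run_step` helper are hypothetical names invented for the example.

```python
class LangExtractError(Exception):
    """Local stand-in for the library's common base exception."""


class InferenceConfigError(LangExtractError):
    """Hypothetical subclass: a problem with model configuration."""


class ResolverParsingError(LangExtractError):
    """Hypothetical subclass: model output could not be parsed."""


def run_step(kind):
    """Hypothetical helper that fails in different library-specific ways."""
    if kind == "config":
        raise InferenceConfigError("missing API key")
    if kind == "parse":
        raise ResolverParsingError("malformed output")
    return "ok"


def safe_run(kind):
    """Catch every library-specific error with one except clause."""
    try:
        return run_step(kind)
    except LangExtractError as err:
        # Any subclass of the base exception lands here.
        return f"handled {type(err).__name__}: {err}"
```

Errors that do not inherit from the base class (for example a plain `KeyError`) still propagate, so the single clause does not hide unrelated bugs.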
Fixes google#25 - Windows installation failure due to pylibmagic build requirements Breaking change: LangFunLanguageModel removed. Use GeminiLanguageModel or OllamaLanguageModel instead.
fix: Remove LangFun and pylibmagic dependencies to fix Windows installation and OpenAI SDK v1.x compatibility
- Modified save_annotated_documents to accept both pathlib.Path and string paths - Convert string paths to Path objects before calling mkdir() - This fixes the error when using output_dir='.' as shown in the README example
…-mkdir Fix save_annotated_documents to handle string paths
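The string-path fix described above amounts to normalizing the argument before calling `mkdir()`. A minimal sketch of that normalization, assuming a helper named `ensure_output_dir` (a hypothetical name, not the library's function):

```python
import pathlib


def ensure_output_dir(output_dir):
    """Accept a str or pathlib.Path and return an existing directory as a Path."""
    # pathlib.Path() is a no-op for Path inputs and converts strings,
    # so both call styles share the same code path from here on.
    path = pathlib.Path(output_dir)
    path.mkdir(parents=True, exist_ok=True)
    return path
```

With this in place, `ensure_output_dir(".")` and `ensure_output_dir(pathlib.Path("."))` behave the same, which is the case the README example relies on.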
feat: Add OpenAI language model support
…s: (google#10)

* docs: clarify output_dir behavior in medication_examples.md
* Removed inline comment in medication example

  Deleted an inline comment referencing the output directory in the save_annotated_documents.
* docs: add output_dir="." to all save_annotated_documents examples

  Prevents confusion from default `test_output/...` by explicitly saving to current directory.
* build: add formatting & linting pipeline with pre-commit integration
* style: apply pyink, isort, and pre-commit formatting
* ci: enable format and lint checks in tox
* Add LangExtractError base exception for centralized error handling

  Introduces a common base exception class that all library-specific exceptions inherit from, enabling users to catch all LangExtract errors with a single except clause.
* fix(ui): prevent current highlight border from being obscured

---------

Co-authored-by: Leena Kamran <62442533+kleeena@users.noreply.github.com>
Co-authored-by: Akshay Goel <akshay.k.goel@gmail.com>
- Gemini & OpenAI test suites with retry on transient errors - CI: Separate job, Python 3.11 only, skips for forks - Validates char_interval for all extractions - Multilingual test xfail (issue google#13) TODO: Remove xfail from multilingual test after tokenizer fix
…e#62) - Add quickstart example and documentation for local LLM usage - Include Docker setup with health checks and docker-compose - Add integration tests and update CI pipeline - Secure setup: localhost-only binding, containerized deployment Signed-off-by: Akshay Goel <goelak@google.com>
- Ollama integration with Docker examples - Fixed OllamaLanguageModel parameter name (model -> model_id) - Added CI/CD tests for Ollama - Updated documentation with consistent API examples
Bumps the github_actions group with 1 update in the /.github/workflows directory: [tj-actions/changed-files](https://github.com/tj-actions/changed-files). Updates `tj-actions/changed-files` from 44 to 46 - [Release notes](https://github.com/tj-actions/changed-files/releases) - [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md) - [Commits](tj-actions/changed-files@v44...v46) --- updated-dependencies: - dependency-name: tj-actions/changed-files dependency-version: '46' dependency-type: direct:production dependency-group: github_actions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…e#74) - Add check-linked-issue.yml: Enforces that PRs reference issues with 5+ community reactions - Add check-pr-size.yml: Labels PRs by size and enforces 1000 line limit - Update CONTRIBUTING.md: Document new PR requirements and size guidelines - Include helpful error messages with links to contribution guidelines - Create a scalable system for maintaining code quality and review efficiency
Enables two ways to run live API tests:
1. workflow_dispatch: Manual trigger via Actions tab
2. Label trigger: Add 'ready-to-merge' label to any PR

The label-based approach uses pull_request_target for security:
- Runs in base repository context with access to secrets
- Safely merges PR into main branch before testing
- Only maintainers can trigger
- Comments test results back to PR

This provides a production-ready solution for testing PRs from forks while maintaining security, following patterns used by major projects.
* Add base_url to OpenAILanguageModel * Github action lint is outdated, so adapting * Adding base_url to parameterized test * Lint fixes to inference_test.py
Bug: Workflows triggered on pull_request_target but checked for pull_request, causing all validations to be skipped. Fixed: - Event condition checks now match trigger type - Add manual revalidation workflow - Enable workflow_dispatch with PR number input
- Creates visible PR checks (pass/fail status) - Shows validation errors in status description (up to 140 chars) - Links to workflow run for full details - Maintains backward compatibility with comment reporting
The workflow was comparing boolean true to string 'true', causing all validations to incorrectly show as failed even when all checks passed.
- revalidate-all-prs.sh: Triggers manual validation for all open PRs - add-size-labels.sh: Adds size labels (XS/S/M/L/XL) based on change count - add-new-checks.sh: Adds required status checks to branch protection These scripts require maintainer permissions and help manage PR workflows.
- Add type ignore comments for IPython imports - Fix return type annotation (remove unnecessary quotes) - Add _is_jupyter() to properly detect notebook environments - Replace lambda with def function for pylint compliance Fixes google#65
- Add format-check job that checks actual PR code, not merge commit - Validate formatting before expensive fork PR tests - Provide clear error messages when formatting fails Fixes false positives where incorrectly formatted PRs passed CI
Auto-updates PRs behind main, handles forks/conflicts gracefully, skips bot/draft PRs, monitors API limits
- Apply end-of-file and whitespace fixes to workflows
- Fix empty interval bug when newline falls at chunk boundary (issue google#71) - Add concise comment explaining the fix logic - Remove excessive/obvious comments from chunking tests - Improve test docstring to be more descriptive and professional
The exceptions.py file existed in both the root directory and langextract/ directory with identical content. This removes the duplicate from the root to avoid confusion and maintain proper package structure.
…le (google#97)

Introduces a provider registry system enabling third-party providers to be dynamically registered and discovered through a plugin architecture. Users can now integrate custom LLM backends (Azure OpenAI, AWS Bedrock, custom inference servers) without modifying core LangExtract code.

Fixes google#80, google#67, google#54, google#49, google#48, google#53

Key Changes:

**Provider Registry** (`langextract/providers/registry.py`)
- Pattern-based registration with priority resolution
- Automatic discovery via Python entry points
- Lazy loading for performance

**Factory Enhancements** (`langextract/factory.py`)
- `ModelConfig` dataclass for structured configuration
- Explicit provider selection when patterns overlap
- Full backward compatibility maintained

**Plugin Example** (`examples/custom_provider_plugin/`)
- Complete working example with entry point configuration
- Shows how to create custom providers for any backend

**Documentation**
- Comprehensive provider system README with architecture diagrams
- Step-by-step plugin creation guide

**Dependencies**
- Move openai to optional dependencies
- Update tox.ini to include openai in test environments

**Lint Fixes**
- Add appropriate pylint suppressions for legitimate patterns
- Fix unused variable warnings in tests
- Address import and global statement warnings

No anticipated breakage - full backward compatibility maintained. Given significant internal changes to provider loading, issues should be reported if unexpected behavior is encountered.
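Pattern-based registration with priority resolution, as named in the commit above, can be illustrated with a small standalone sketch. It does not reproduce `langextract/providers/registry.py`: the `register` and `resolve` names, the provider classes, and the tie-breaking rule (the later registration wins on equal priority) are all assumptions made for the example.

```python
import re

# Each entry: (compiled pattern, priority, provider class), in registration order.
_REGISTRY = []


def register(pattern, priority=0):
    """Class decorator that registers a provider for model IDs matching `pattern`."""
    def decorator(cls):
        _REGISTRY.append((re.compile(pattern), priority, cls))
        return cls
    return decorator


def resolve(model_id):
    """Return the highest-priority provider class whose pattern matches `model_id`."""
    matches = [
        (priority, index, cls)
        for index, (pattern, priority, cls) in enumerate(_REGISTRY)
        if pattern.match(model_id)
    ]
    if not matches:
        raise ValueError(f"No provider registered for model_id={model_id!r}")
    # Highest priority wins; on a tie, the later registration wins (assumption).
    return max(matches, key=lambda m: (m[0], m[1]))[2]


@register(r"^gemini")
class GeminiProvider:
    """Hypothetical built-in provider."""


@register(r"^gemini-custom", priority=10)
class CustomGeminiProvider:
    """Hypothetical third-party provider that overrides an overlapping pattern."""
```

Here `resolve("gemini-pro")` returns `GeminiProvider`, while `resolve("gemini-custom-1")` matches both patterns and returns `CustomGeminiProvider` because of its higher priority.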
Add common development files, tools, and temporary file patterns
- Show current approach using factory.create_model() - Add note that direct model passing to extract() is coming soon - Keep planned API as commented code for reference
Ensure providers are loaded before pattern matching to prevent API key errors when using local models. Optimize to skip loading when provider is explicitly specified.
- Add proper permissions (issues: write for comments) - Skip draft PRs to avoid noise - Prevent duplicate comments with hidden marker - Search both title and body for issue links - Support all keyword variants and cross-repo references - Count unique users for reactions, not total count - Include 'write' permission for maintainer override - Add concurrency control for rapid edits - Handle cross-repo issues gracefully
- 6 tests: plugin discovery, loading, idempotency, error handling - Smart CI triggers for integration test on provider changes - New tox environments: plugin-smoke and plugin-integration
Your branch is 1 commit behind. To update it, run:
git fetch origin main
git merge origin/main
git push
Note: Enable "Allow edits by maintainers" to allow automatic updates.
Your branch is 26 commits behind. To update it, run:
git fetch origin main
git merge origin/main
git push
Note: Enable "Allow edits by maintainers" to allow automatic updates.
Thanks for the contribution and PR. The |
Fixes: #110
Problem
The Ollama example code in README.md was missing the required language_model_type parameter, causing users to encounter the error: "API key must be provided for cloud-hosted models via the api_key parameter or the LANGEXTRACT_API_KEY environment variable" when trying to run the example.
Solution
Added the missing language_model_type=inference.OllamaLanguageModel parameter to the Ollama integration example in README.md.
Changes
File modified: README.md
Change: Added language_model_type=inference.OllamaLanguageModel parameter to Ollama example code
Impact: Users can now run the Ollama example without API key errors, as the parameter correctly identifies it as a local model
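For illustration, the corrected call might look roughly like the sketch below. It is not copied from the README diff: the keyword names other than `language_model_type` (`text_or_documents`, `prompt_description`, `examples`, `model_id`, `model_url`), the model name, and the localhost URL are assumptions that should be checked against the installed LangExtract version, and `build_ollama_kwargs` and `run_example` are hypothetical helpers.

```python
def build_ollama_kwargs(inference_module, text, prompt, examples):
    """Collect keyword arguments for an Ollama-backed extract() call."""
    return {
        "text_or_documents": text,
        "prompt_description": prompt,
        "examples": examples,
        # The parameter this PR adds: it selects the local Ollama model class,
        # so no cloud API key is requested.
        "language_model_type": inference_module.OllamaLanguageModel,
        "model_id": "gemma2:2b",                # assumed local model name
        "model_url": "http://localhost:11434",  # assumed default Ollama endpoint
    }


def run_example(text, prompt, examples):
    """Run the extraction; requires langextract and a running Ollama server."""
    import langextract as lx
    from langextract import inference

    return lx.extract(**build_ollama_kwargs(inference, text, prompt, examples))
```

Building the arguments separately keeps the one relevant line, `language_model_type`, easy to spot and lets the argument set be inspected without a running model server.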
Testing
Verified the updated example code is valid Python syntax
Confirmed parameter formatting is consistent with other examples in the README
Validated the example now runs without the previous API key error
Type of Change
Bug fix (non-breaking change which fixes an issue)
Documentation update
New feature
Breaking change
This documentation fix ensures users can successfully follow the Ollama integration guide by including the required parameter that distinguishes local models from cloud-hosted ones.