@maziyarpanahi
Owner

Pull Request

Description

Brief description of what this PR does.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Code refactoring
  • Performance improvement
  • Test addition/improvement

Changes Made

  • Change 1
  • Change 2
  • Change 3

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this change with different models/inputs

Documentation

  • I have updated the documentation accordingly
  • I have added docstrings to new functions/classes
  • I have updated the CHANGELOG.md

Code Quality

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings

Dependencies

  • I have not added any new dependencies
  • OR I have added new dependencies and they are justified because: ____

Checklist

  • I have read the contributing guidelines
  • My commits have clear, descriptive messages
  • I have squashed/organized my commits appropriately

Related Issues

Closes #(issue number)
Related to #(issue number)

Screenshots/Examples

If applicable, add screenshots or example outputs to help explain your changes.

…int to ensure compatibility and enhance text processing capabilities in the project.
…pendencies: Enhance the clarity of the requirements section, specify the use of `uv` for installation, and detail the installation process for Hugging Face support and PyTorch, ensuring users have a better understanding of the setup process.
…ntroduce optional parameters for sentence detection, including language and cleaning heuristics, and refactor the function to handle segmented input more effectively. Update CLI to lazily load analyze_text and related functions for improved performance.
…ion handling to catch both ImportError and OSError, improving robustness in model loading. Additionally, include sentences module in processing exports for better accessibility.
… entity predictions by introducing a metadata field, allowing for additional contextual information. Update grouping logic to consider sentence indices and improve JSON output formatting to include metadata attributes, ensuring richer data representation.
…nce segmentation using pySBD, including the SentenceSpan class for representing sentences and their character boundaries. Implement caching for segmenter instances and fallback logic for span generation when character offsets are unavailable.
…e exception handling to catch both ImportError and OSError, improving robustness in tokenizer initialization.
…te text files, 'clinical_note.txt' and 'long_clinical_note.txt', to enhance test coverage for clinical documentation scenarios, ensuring comprehensive validation of processing functionalities.
…ion tests in `test_sentence_detection_real.py` to validate sentence detection functionality with real models, ensuring consistent behavior and proper handling of placeholder-only segments.
…ure `analyze_text`, `get_model_max_length`, and `list_models` are lazily imported, improving performance and allowing for easier testing by exposing these functions for patching without eager imports.
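
The lazy-import commits above can be illustrated with a minimal sketch. It assumes the mechanism is a module-level `__getattr__` (PEP 562), which this PR does not confirm. The module name `demo_cli` and the standard-library targets are hypothetical stand-ins, since the project's real module paths are not shown here.

```python
import importlib
import types


def make_lazy_module(module_name, exports):
    """Build a module whose exports are imported on first attribute access.

    `exports` maps an exported name to a (module path, attribute) pair.
    """
    module = types.ModuleType(module_name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails.
        try:
            target_module, target_attr = exports[attr]
        except KeyError:
            raise AttributeError(
                f"module {module_name!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(target_module), target_attr)
        # Cache the resolved object so later lookups skip this hook.
        setattr(module, attr, value)
        return value

    module.__getattr__ = __getattr__
    return module


# Hypothetical stand-ins: json.dumps and fnmatch.filter take the place of
# the project's analyze_text and list_models implementations.
cli = make_lazy_module(
    "demo_cli",
    {
        "analyze_text": ("json", "dumps"),
        "list_models": ("fnmatch", "filter"),
    },
)
```

Nothing is imported until `cli.analyze_text` is first read, so start-up stays cheap. Because the resolved object is then stored as an ordinary module attribute, a test can replace it with `unittest.mock.patch.object(cli, "analyze_text")` without importing the heavy dependency up front.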
@maziyarpanahi maziyarpanahi merged commit fcc8149 into master Oct 29, 2025
9 checks passed

2 participants