v0.5.0 Beta

@LSeu-Open LSeu-Open released this 14 Jun 16:44
· 34 commits to main since this release
b161102

LLMScoreEngine v0.5.0 - New Features, Major Refactoring and Testing Overhaul

This release is a major refactoring of LLMScoreEngine, moving from hardcoded values to a centralized, configuration-driven architecture. It is complemented by a pytest test suite, written from scratch, to help ensure the reliability and correctness of the application.

New Features

  • Batch Scoring (--all): Score all models in the Models/ directory with a single command.
  • Dynamic Configuration (--config): Provide an external Python configuration file to experiment with scoring parameters without modifying the core code.
  • CSV Report Generation (--csv): Generate a consolidated CSV report of all model scores, saved in the Results/ directory.
  • Quiet Output Mode (--quiet): Suppress all informational output to show only the final model scores, ideal for automated scripts.
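
As a rough illustration of how the four flags above could fit together, here is a minimal sketch using argparse and importlib. Only the flag names come from these notes. The program name, the positional `models` argument, and the `load_config_module` helper are assumptions made for the example and may not match the project's actual code.

```python
import argparse
import importlib.util
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Flag names are taken from the release notes; everything else is illustrative.
    parser = argparse.ArgumentParser(prog="score_models")  # hypothetical program name
    parser.add_argument("models", nargs="*", help="model names to score")  # assumed
    parser.add_argument("--all", action="store_true",
                        help="score every model in Models/")
    parser.add_argument("--config", type=Path,
                        help="path to an external Python configuration file")
    parser.add_argument("--csv", action="store_true",
                        help="write a consolidated CSV report to Results/")
    parser.add_argument("--quiet", action="store_true",
                        help="print only the final model scores")
    return parser


def load_config_module(path: Path):
    """Import a Python file from an arbitrary path and return it as a module.

    Hypothetical helper: one common way to support an external config file.
    """
    spec = importlib.util.spec_from_file_location("user_scoring_config", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load configuration from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file; only load trusted configs
    return module
```

Because the external file is executed as ordinary Python, this approach is best suited to configuration files the user controls.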

Refactoring & Bug Fixes

  • Architecture:

    • Refactored the application to use a centralized, immutable configuration file (config/scoring_config.py) instead of hardcoded values.
    • Improved import paths and module structure to prevent ImportError issues and clarify the package API.
  • Scoring Logic:

    • The ModelScorer class was rewritten to source all parameters from the central configuration.
    • Corrected scoring formulas in hf_score.py to align with the project documentation.
  • Bug Fixes:

    • Fixed a critical bug that caused benchmark scores to be ignored during calculations.
    • Resolved a case-sensitivity bug in data/loaders.py that prevented model files from being found.
  • Testing:

    • Built a comprehensive pytest test suite from scratch, with unit and end-to-end tests covering all critical modules.
    • Used mocking for external APIs to ensure tests are fast and deterministic.
    • Improved test isolation and added verbose logging for clearer debugging.
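
The mocking approach mentioned in the testing notes can be sketched as follows. This is not the project's test code: `popularity_bonus`, its scoring rule, and the injected `fetch_downloads` callable are hypothetical names invented for the example. The real suite may patch module-level functions instead of injecting them.

```python
from unittest.mock import Mock


def popularity_bonus(model_id: str, fetch_downloads) -> float:
    """Hypothetical scoring helper: 1 point per 1,000,000 downloads, capped at 5.

    `fetch_downloads` stands in for a call to an external API.
    """
    downloads = fetch_downloads(model_id)
    return min(downloads / 1_000_000, 5.0)


def test_popularity_bonus_uses_mocked_api():
    # Replace the network call with a fixed value so the test is
    # fast and gives the same result on every run.
    fake_fetch = Mock(return_value=2_500_000)

    assert popularity_bonus("example/model", fake_fetch) == 2.5
    fake_fetch.assert_called_once_with("example/model")


def test_popularity_bonus_is_capped():
    fake_fetch = Mock(return_value=50_000_000)
    assert popularity_bonus("example/model", fake_fetch) == 5.0
```

pytest discovers functions named `test_*` automatically, so a file like this needs no extra registration to run.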