ci: Add readability assessment to promptfoo GHA workflow #313
☂️ Python Coverage
Overall Coverage
New Files: No new covered files. Modified Files: No covered modified files.
Promptfoo Evaluation Results
Pull Request Overview
This PR adds a new readability assessment script using TextDescriptives and spacy to the promptfoo GitHub Actions workflow, ensuring chatbot responses are evaluated for readability.
- Adds app/promptfoo/readability_assessment.py for assessing text readability based on multiple metrics.
- Updates the promptfoo-googlesheet-evaluation workflow to install necessary dependencies and copy the new script for evaluation.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| app/promptfoo/readability_assessment.py | New script for calculating readability metrics. |
| .github/workflows/promptfoo-googlesheet-evaluation.yml | Workflow updated to trigger the readability script and install dependencies. |
Comments suppressed due to low confidence (1)

app/promptfoo/readability_assessment.py:5

- [nitpick] The function name 'get_assert' is ambiguous given its purpose in assessing readability. Consider renaming it to 'assess_readability' to improve clarity.

```python
def get_assert(output: str, context) -> Union[bool, float, Dict[str, Any]]:
```
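For context on the naming: promptfoo resolves Python assertion files to a function named `get_assert` by default, which likely explains why the name can't simply be changed. A minimal sketch of how such an assertion might be structured, assuming TextDescriptives' spaCy readability component; the grade-level threshold and return fields below are illustrative, not the PR's actual values:

```python
from typing import Any, Dict, Union

import spacy
import textdescriptives  # noqa: F401  # importing registers the "textdescriptives/*" pipe factories

# Assumed setup: a small English model with the readability component attached.
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textdescriptives/readability")


def get_assert(output: str, context) -> Union[bool, float, Dict[str, Any]]:
    """promptfoo calls this hook with the model output; the dict form is a GradingResult."""
    doc = nlp(output)
    metrics = doc._.readability  # dict with flesch_reading_ease, flesch_kincaid_grade, ...

    grade = metrics["flesch_kincaid_grade"]
    # Illustrative threshold: treat responses at or below an 8th-grade level as passing.
    return {
        "pass": grade <= 8.0,
        "score": metrics["flesch_reading_ease"],
        "reason": f"Flesch-Kincaid grade {grade:.1f}; reading ease {metrics['flesch_reading_ease']:.1f}",
    }
```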
Ticket
https://navalabs.atlassian.net/browse/DST-936
Changes
Context for reviewers
Sets up the dependencies and files needed to run readability assessments in the promptfoo evaluation workflow. The readability assessment script uses TextDescriptives and spaCy to analyze chatbot responses whenever the "readability" rubric value is used, reporting metrics such as the Flesch-Kincaid grade level and Flesch reading ease scores.
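As a quick illustration of the two named metrics (a minimal sketch assuming TextDescriptives' spaCy component; the sample text is made up):

```python
import spacy
import textdescriptives  # noqa: F401  # registers the textdescriptives pipeline factories

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textdescriptives/readability")

doc = nlp("You may qualify for benefits. Apply online or call your local office.")
print(doc._.readability["flesch_kincaid_grade"])  # US school grade level; lower = easier to read
print(doc._.readability["flesch_reading_ease"])   # roughly 0-100 scale; higher = easier to read
```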
Sidenote:
Testing
Tested by running the promptfooconfig file locally and verifying correct values:
Testing in GitHub Actions:
To run the readability metric on a targeted question, set the "__expected1" column in the Google Sheet to the path of the Python file. This then allows the test to be picked up and run.
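For illustration, a row in the sheet might look like the following; the question text is made up, and the exact file-reference syntax (e.g. whether a `file://` prefix is needed) depends on promptfoo's expected-column conventions:

| question | __expected1 |
|---|---|
| How do I apply for unemployment benefits? | file://readability_assessment.py |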
See an example of a successful output here.