ci: Add readability assessment to promptfoo GHA workflow #313

Merged
merged 7 commits into from
May 22, 2025

Conversation

fg-nava
Contributor

@fg-nava fg-nava commented May 20, 2025

Ticket

https://navalabs.atlassian.net/browse/DST-936

Changes

  • Added TextDescriptives and spacy as dependencies to our GitHub Actions workflow
  • Added readability_assessment.py to workflow trigger paths
  • Configured workflow to copy readability assessment script to temp directory

Context for reviewers

This PR sets up the dependencies and files needed to run readability assessments in the promptfoo evaluation workflow. The readability assessment script uses textdescriptives and spacy to analyze chatbot responses when the "readability" rubric value is used. It reports metrics such as Flesch-Kincaid grade level and Flesch reading ease.
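
For reviewers unfamiliar with the two metrics, here is a minimal standard-library sketch of the standard Flesch formulas. It is not the script in this PR, which relies on textdescriptives and spacy. The function names `count_syllables` and `readability` are hypothetical, and the syllable counter is a crude vowel-group heuristic, so its numbers will differ from what TextDescriptives reports.

```python
import re


def count_syllables(word: str) -> int:
    """Naive heuristic (assumption): count vowel groups, drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch reading ease and Flesch-Kincaid grade level for `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    return {
        # Higher is easier to read.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Approximate US school grade needed to read the text.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }


if __name__ == "__main__":
    print(readability("The cat sat on the mat."))
```

Longer sentences and longer words both push the grade level up and the reading ease down, which is why dense benefit-policy answers tend to score poorly.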

Sidenote:

I opted for textdescriptives over py-readability because a version incompatibility blocked the NLTK 'punkt' data download, likely because the py-readability repository is poorly maintained. TextDescriptives is more recent and better maintained.

Testing

Tested by running the promptfooconfig file locally and verifying correct values:

┌────────────────────────────────────────┬────────────────────────────────────────┬────────────────────────────────────────┬────────────────────────────────────────┐
│ capability                             │ question                               │ uniqueSessionId                        │ [https://decision-support-tool-dev.na… │
│                                        │                                        │                                        │ response                               │
├────────────────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────┤
│ Drug felony conviction in state where  │ Can someone who has a drug felony      │ promptfoo-eval-test-1747774520872-2zd… │ [PASS] YES                             │
│ it's not a disqualification            │ conviction be eligible for food stamps │                                        │                                        │
│                                        │ if they are in California? Answer with │                                        │                                        │
│                                        │ only one of: YES, NO, REFUSE.          │                                        │                                        │
├────────────────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────┤
│                                        │ What happens after the 52 weeks of     │ promptfoo-eval-test-1747774527450-9vz… │ [FAIL] Readability assessment: Average │
│                                        │ disability?                            │                                        │ grade level: 15.9, Flesch ease: 30.5   │
│                                        │                                        │                                        │ ---                                    │
│                                        │                                        │                                        │ After 52 weeks of receiving full       │
│                                        │                                        │                                        │ Disability Insurance (DI) benefits,    │
│                                        │                                        │                                        │ you may not be eligible for further DI │
│                                        │                                        │                                        │ benefits. However, if your disability  │
│                                        │                                        │                                        │ continues beyond this period...        │
├────────────────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────┤
│                                        │ Can you receive both EDD and SSDI      │ promptfoo-eval-test-1747774534933-yj5… │ [FAIL] Readability assessment: Average │
│                                        │                                        │                                        │ grade level: 14.6, Flesch ease: 38.2   │
│                                        │                                        │                                        │ ---                                    │
│                                        │                                        │                                        │ Yes, you can receive Social Security   │
│                                        │                                        │                                        │ Disability Insurance (SSDI) at the     │
│                                        │                                        │                                        │ same time as State Disability          │
│                                        │                                        │                                        │ Insurance (SDI) from the Employment    │
│                                        │                                        │                                        │ Development Department (EDD).          │
│                                        │                                        │                                        │ Howeve...                              │
└────────────────────────────────────────┴────────────────────────────────────────┴────────────────────────────────────────┴────────────────────────────────────────┘

Testing in GitHub Actions:

To run the readability metric on a targeted question, set the "__expected1" column in the Google Sheet to the path of the Python file. This lets promptfoo load the script as the assertion for that question.

See an example of a successful output here.

What happens after the 52 weeks of disability? | python:file:///tmp/readability_assessment.py
Can you receive both EDD and SSDI | python:file:///tmp/readability_assessment.py
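
As a rough illustration of what a file referenced this way might return, the sketch below builds a promptfoo-style grading result (a dict with `pass`, `score`, and `reason`) from two already-computed metrics. The helper name `build_result` and both thresholds are hypothetical; this PR description does not state the actual pass criteria. In a real assertion file, the `get_assert(output, context)` entry point would compute the metrics from `output` and return a dict like this one.

```python
from typing import Any, Dict

# Hypothetical thresholds (assumption): the real script's criteria are not shown here.
MAX_GRADE_LEVEL = 12.0
MIN_FLESCH_EASE = 50.0


def build_result(grade_level: float, flesch_ease: float) -> Dict[str, Any]:
    """Turn two readability metrics into a pass/score/reason grading dict."""
    passed = grade_level <= MAX_GRADE_LEVEL and flesch_ease >= MIN_FLESCH_EASE
    return {
        "pass": passed,
        "score": 1.0 if passed else 0.0,
        "reason": (
            "Readability assessment: "
            f"Average grade level: {grade_level:.1f}, "
            f"Flesch ease: {flesch_ease:.1f}"
        ),
    }


if __name__ == "__main__":
    # Values taken from the local test output in this PR description.
    print(build_result(15.9, 30.5))
```

The `reason` string mirrors the format seen in the local test table, so a failing row is readable directly in the eval results.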

Preview environment for app

♻️ Environment destroyed ♻️


github-actions bot commented May 20, 2025

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines | Covered | Coverage | Threshold | Status
4264 | 3890 | 91% | 80% | 🟢

New Files

No new covered files...

Modified Files

No covered modified files...

updated for commit: a1ba504 by action🐍

@fg-nava fg-nava marked this pull request as ready for review May 20, 2025 22:15
@fg-nava fg-nava requested a review from a team May 20, 2025 22:16

Promptfoo Evaluation Results

Success | Failure | Total | Pass Rate
12 | 3 | 15 | 80.00%

View detailed results in Google Sheets

» View eval results «

@yoomlam yoomlam requested a review from Copilot May 21, 2025 13:07
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds a new readability assessment script using TextDescriptives and spacy to the promptfoo GitHub Actions workflow, ensuring chatbot responses are evaluated for readability.

  • Adds app/promptfoo/readability_assessment.py for assessing text readability based on multiple metrics.
  • Updates the promptfoo-googlesheet-evaluation workflow to install necessary dependencies and copy the new script for evaluation.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File | Description
app/promptfoo/readability_assessment.py | New script for calculating readability metrics.
.github/workflows/promptfoo-googlesheet-evaluation.yml | Workflow updated to trigger the readability script and install dependencies.
Comments suppressed due to low confidence (1)

app/promptfoo/readability_assessment.py:5

  • [nitpick] The function name 'get_assert' is ambiguous given its purpose in assessing readability. Consider renaming it to 'assess_readability' to improve clarity.
def get_assert(output: str, context) -> Union[bool, float, Dict[str, Any]]:

@fg-nava fg-nava merged commit 3aed717 into main May 22, 2025
13 checks passed
@fg-nava fg-nava deleted the fg/add-readability-promptfoo branch May 22, 2025 16:14