ci: Add readability assessment to promptfoo GHA workflow #313

fg-nava · 2025-05-20T21:10:43Z

Ticket

https://navalabs.atlassian.net/browse/DST-936

Changes

Added TextDescriptives and spaCy as dependencies to our GitHub Actions workflow
Added readability_assessment.py to workflow trigger paths
Configured workflow to copy readability assessment script to temp directory

Context for reviewers

Sets up the dependencies and files needed to run readability assessments in the promptfoo evaluation workflow. The readability assessment script uses TextDescriptives and spaCy to analyze chatbot responses for questions that use the "readability" rubric value, based on metrics such as the Flesch-Kincaid grade level and the Flesch reading ease score.
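For reviewers unfamiliar with the two metrics, here is a minimal, dependency-free sketch of the standard Flesch formulas. It is not the script in this PR: TextDescriptives tokenizes and counts syllables through spaCy, whereas this sketch assumes a crude regex sentence splitter and a vowel-group syllable heuristic (the names `count_syllables` and `readability` are hypothetical), so its numbers will not match TextDescriptives exactly.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of vowels (y included), drop a trailing
    silent 'e' unless the word ends in 'le', and never return less than 1."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict[str, float]:
    """Compute Flesch-Kincaid grade level and Flesch reading ease for text."""
    # Naive sentence split on runs of . ! ? (abbreviations like "Dr." will miscount).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Alphabetic words only; numbers such as "52" are ignored.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return {
        # Higher grade level = harder to read.
        "flesch_kincaid_grade": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
        # Higher reading ease = easier to read (roughly 0-100 for typical prose).
        "flesch_reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
    }
```

Both formulas depend only on average sentence length and average syllables per word, which is why long, jargon-heavy sentences push the grade level up and the reading ease down.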

Sidenote:

I chose TextDescriptives over py-readability because a version incompatibility blocked the download of NLTK's 'punkt' tokenizer data, likely due to poor repository maintenance. TextDescriptives is more recent and better maintained.

Testing

Tested by running the promptfooconfig file locally and verifying that the values were correct:

┌────────────────────────────────────────┬────────────────────────────────────────┬────────────────────────────────────────┬────────────────────────────────────────┐
│ capability                             │ question                               │ uniqueSessionId                        │ [https://decision-support-tool-dev.na… │
│                                        │                                        │                                        │ response                               │
├────────────────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────┤
│ Drug felony conviction in state where  │ Can someone who has a drug felony      │ promptfoo-eval-test-1747774520872-2zd… │ [PASS] YES                             │
│ it's not a disqualification            │ conviction be eligible for food stamps │                                        │                                        │
│                                        │ if they are in California? Answer with │                                        │                                        │
│                                        │ only one of: YES, NO, REFUSE.          │                                        │                                        │
├────────────────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────┤
│                                        │ What happens after the 52 weeks of     │ promptfoo-eval-test-1747774527450-9vz… │ [FAIL] Readability assessment: Average │
│                                        │ disability?                            │                                        │ grade level: 15.9, Flesch ease: 30.5   │
│                                        │                                        │                                        │ ---                                    │
│                                        │                                        │                                        │ After 52 weeks of receiving full       │
│                                        │                                        │                                        │ Disability Insurance (DI) benefits,    │
│                                        │                                        │                                        │ you may not be eligible for further DI │
│                                        │                                        │                                        │ benefits. However, if your disability  │
│                                        │                                        │                                        │ continues beyond this period...        │
├────────────────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────┤
│                                        │ Can you receive both EDD and SSDI      │ promptfoo-eval-test-1747774534933-yj5… │ [FAIL] Readability assessment: Average │
│                                        │                                        │                                        │ grade level: 14.6, Flesch ease: 38.2   │
│                                        │                                        │                                        │ ---                                    │
│                                        │                                        │                                        │ Yes, you can receive Social Security   │
│                                        │                                        │                                        │ Disability Insurance (SSDI) at the     │
│                                        │                                        │                                        │ same time as State Disability          │
│                                        │                                        │                                        │ Insurance (SDI) from the Employment    │
│                                        │                                        │                                        │ Development Department (EDD).          │
│                                        │                                        │                                        │ Howeve...                              │
└────────────────────────────────────────┴────────────────────────────────────────┴────────────────────────────────────────┴────────────────────────────────────────┘
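The failing rows above show the assertion's reason string, for example "Readability assessment: Average grade level: 15.9, Flesch ease: 30.5". A promptfoo Python assertion's `get_assert(output, context)` can return a bool, a float score, or a dict with `pass`, `score`, and `reason` keys. The sketch below shows how such a dict could be built from the metrics. It assumes that "average grade level" is the mean of several grade-level metrics and uses a hypothetical threshold of 8.0 and a hypothetical scoring rule; the PR does not state the cutoff or score the real script uses, and `build_grading_result` is an illustrative name.

```python
from statistics import mean
from typing import Any, Dict, Sequence

# Hypothetical threshold: the PR does not state the cutoff the real script uses.
MAX_AVERAGE_GRADE = 8.0


def build_grading_result(grade_levels: Sequence[float], flesch_ease: float) -> Dict[str, Any]:
    """Turn readability metrics into a promptfoo-style grading result dict.

    grade_levels must be non-empty; statistics.mean raises on an empty sequence.
    """
    average_grade = mean(grade_levels)
    passed = average_grade <= MAX_AVERAGE_GRADE
    # Hypothetical score: 1.0 at or below the threshold, shrinking as the grade rises.
    score = min(1.0, MAX_AVERAGE_GRADE / average_grade) if average_grade > 0 else 1.0
    return {
        "pass": passed,
        "score": score,
        "reason": (
            f"Readability assessment: Average grade level: {average_grade:.1f}, "
            f"Flesch ease: {flesch_ease:.1f}"
        ),
    }
```

Returning a dict instead of a bare bool keeps the metric values visible in the promptfoo results table, which is what makes the failing rows above easy to interpret.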

Testing in GitHub Actions:

To run the readability metric on a specific question, set that question's "__expected1" column in the Google Sheet to the path of the Python file. Promptfoo then runs the readability assertion for that question only.

See an example of a successful output here.

question | __expected1
What happens after the 52 weeks of disability? | python:file:///tmp/readability_assessment.py
Can you receive both EDD and SSDI | python:file:///tmp/readability_assessment.py
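To make the tagging concrete, the sketch below writes test rows in promptfoo's CSV test layout, where columns prefixed with `__expected` hold assertions. It assumes the Google Sheet follows that same layout and that an empty `__expected1` cell simply means "no readability check"; the helper `build_test_csv` and its `(question, check_readability)` input are hypothetical, while the assertion value is copied from the rows above.

```python
import csv
import io
from typing import Iterable, Tuple

# Value placed in the "__expected1" column, copied from the PR description.
READABILITY_ASSERTION = "python:file:///tmp/readability_assessment.py"


def build_test_csv(questions: Iterable[Tuple[str, bool]]) -> str:
    """Return CSV text with a question column and an __expected1 column.

    Each item is (question, check_readability); the hypothetical boolean flag
    decides whether the row gets the readability assertion or an empty cell.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["question", "__expected1"])
    for question, check_readability in questions:
        # csv.writer quotes questions that contain commas, e.g. "YES, NO, REFUSE."
        writer.writerow([question, READABILITY_ASSERTION if check_readability else ""])
    return buffer.getvalue()
```

Only tagged rows carry the assertion, which matches the intent of the "only run python assertion for tagged questions" commit: untagged questions, like the YES/NO/REFUSE one, are not graded on readability.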

Preview environment for app

♻️ Environment destroyed ♻️

github-actions · 2025-05-20T21:16:14Z

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines	Covered	Coverage	Threshold	Status
4264	3890	91%	80%	🟢

New Files

No new covered files...

Modified Files

No covered modified files...

updated for commit: a1ba504 by action🐍

github-actions · 2025-05-20T22:28:52Z

Promptfoo Evaluation Results

Success	Failure	Total	Pass Rate
12	3	15	80.00%

View detailed results in Google Sheets

» View eval results «

Copilot

Pull Request Overview

This PR adds a new readability assessment script using TextDescriptives and spacy to the promptfoo GitHub Actions workflow, ensuring chatbot responses are evaluated for readability.

Adds app/promptfoo/readability_assessment.py for assessing text readability based on multiple metrics.
Updates the promptfoo-googlesheet-evaluation workflow to install necessary dependencies and copy the new script for evaluation.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
app/promptfoo/readability_assessment.py	New script for calculating readability metrics.
.github/workflows/promptfoo-googlesheet-evaluation.yml	Workflow updated to trigger the readability script and install dependencies.

Comments suppressed due to low confidence (1)

app/promptfoo/readability_assessment.py:5

[nitpick] The function name 'get_assert' is ambiguous given its purpose in assessing readability. Consider renaming it to 'assess_readability' to improve clarity.

def get_assert(output: str, context) -> Union[bool, float, Dict[str, Any]]:


github-actions · 2025-05-22T15:50:09Z

Promptfoo Evaluation Results

Success	Failure	Total	Pass Rate
12	3	15	80.00%

View detailed results in Google Sheets

» View eval results «

github-actions · 2025-05-22T16:03:47Z

Promptfoo Evaluation Results

Success	Failure	Total	Pass Rate
12	3	15	80.00%

View detailed results in Google Sheets

» View eval results «

feat: Add readability python assertion using TextDescriptives

1779f26

fg-nava added 3 commits May 20, 2025 14:17

fix: path/to/ the readbility file

d12b80d

fix: only run python assertion for tagged questions

bf17db6

Merge branch 'main' into fg/add-readability-promptfoo

b007836

fg-nava marked this pull request as ready for review May 20, 2025 22:15

fg-nava requested a review from a team May 20, 2025 22:16

fix: PR comment when event is pull_request

2cea951

yoomlam requested a review from Copilot May 21, 2025 13:07

Copilot AI reviewed May 21, 2025


yoomlam approved these changes May 21, 2025


fix: add solutions to YLs comments

29b1b06

fix: add back required context param

a1ba504

fg-nava merged commit 3aed717 into main May 22, 2025
13 checks passed

fg-nava deleted the fg/add-readability-promptfoo branch May 22, 2025 16:14
