feat: Setup Github Action workflow file for PromptFoo #307

Merged
fg-nava merged 22 commits into main from feat/promptfoo-github-action on May 12, 2025

Conversation

fg-nava
Contributor

@fg-nava fg-nava commented May 9, 2025

Ticket

https://navalabs.atlassian.net/browse/DST-964

Changes

  • Added GitHub Action workflow to run promptfoo evaluations on PRs
  • Integrated with Google Sheets for test case inputs and evaluation outputs
  • Configured a promptfoo config that uses the dev environment endpoint
  • Added automated PR comments showing evaluation results, a shareable link to the PromptFoo results, and a link to the Google Sheet
    • Note: PromptFoo shareable links are only accessible to users who have been invited to register for PromptFoo and are signed into their PromptFoo account. Please ensure you are signed in to the NavaLabs team on PromptFoo to view the evaluation results through the shareable links.

Testing

Workflow runs on PRs when changes are made to:

  • app/src/chat_api.py
  • app/src/chat_engine.py
  • app/src/generate.py
  • app/promptfooconfig.ci.yaml
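
In the workflow file this trigger is presumably expressed as a path filter on the pull_request event. The shell sketch below only mimics that check locally, for example against the output of `git diff --name-only`; the function name `should_run_eval` is hypothetical and not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): reads changed file paths on stdin,
# one per line, and succeeds if any of them is one of the four trigger paths.
should_run_eval() {
  # -x matches whole lines only, -F treats the patterns as fixed strings.
  grep -qxF \
    -e 'app/src/chat_api.py' \
    -e 'app/src/chat_engine.py' \
    -e 'app/src/generate.py' \
    -e 'app/promptfooconfig.ci.yaml'
}

# Example: a change set that touches one trigger path.
if printf '%s\n' 'README.md' 'app/src/generate.py' | should_run_eval; then
  echo "evaluation would run"
else
  echo "evaluation would be skipped"
fi
```

In a real checkout the input could come from something like `git diff --name-only origin/main...HEAD | should_run_eval`.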

Local Testing with Act

  1. Install act if not already installed:
     brew install act
  2. Create a .secrets file in the project root with the following environment variables:
     • GITHUB_TOKEN (for local testing only): Classic Personal Access Token (PAT) with 'repo' scope (Settings > Developer settings > Personal access tokens > Tokens (classic))
     • GOOGLE_CREDENTIALS_JSON: Service account JSON for Google Sheets access
       • For your first time setting this value use this command: echo "GOOGLE_CREDENTIALS_JSON='$(cat /path/to/service-account.json | jq -c .)'" >> .secrets
     • OPENAI_API_KEY: OpenAI API key for LLM evaluation
     • GOOGLE_SHEET_INPUT_URL: URL to input test cases sheet
     • GOOGLE_SHEET_OUTPUT_URL: URL to output results sheet
     • PROMPTFOO_API_KEY: API key for PromptFoo cloud features
  3. Run the workflow locally:
     act pull_request -W .github/workflows/promptfoo-googlesheet-evaluation.yml \
       -s GITHUB_TOKEN -s GOOGLE_CREDENTIALS_JSON -s OPENAI_API_KEY \
       --artifact-server-path /tmp/artifacts \
       --container-architecture linux/amd64 -v
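
Before running act, it can help to confirm that the .secrets file defines every variable listed above. The following is a minimal sketch of such a check, assuming one NAME=value entry per line; the helper name `missing_secrets` is hypothetical and not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): prints the name of every expected
# variable that has no NAME= line in the given secrets file.
missing_secrets() {
  local file="$1" name
  for name in GITHUB_TOKEN GOOGLE_CREDENTIALS_JSON OPENAI_API_KEY \
              GOOGLE_SHEET_INPUT_URL GOOGLE_SHEET_OUTPUT_URL PROMPTFOO_API_KEY; do
    # A missing or unreadable file counts as "everything is missing".
    grep -q "^${name}=" "$file" 2>/dev/null || echo "$name"
  done
}

# Example usage: report missing entries before invoking act.
missing="$(missing_secrets .secrets)"
if [ -n "$missing" ]; then
  # $missing is intentionally unquoted so the names print on one line.
  echo "Missing from .secrets:" $missing
fi
```

The check only looks for the variable names; it does not validate the values themselves.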

GitHub Testing

Required secrets are configured in GitHub at: Settings > Secrets and variables > Actions (excluding GITHUB_TOKEN, which was only needed for local testing)

Example workflow run: GitHub Actions Run

Example automatic PR comment:
[Screenshot of the automatic PR comment, taken 2025-05-12]

Preview environment for app

♻️ Environment destroyed ♻️

github-actions bot commented May 9, 2025

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines Covered Coverage Threshold Status
4264 3890 91% 80% 🟢

New Files

No new covered files...

Modified Files

No covered modified files...

updated for commit: 4d1908b by action🐍

github-actions bot commented May 9, 2025

Promptfoo Evaluation Results

Success Failure Total Pass Rate
0 0 0 NaN%

View detailed results in Google Sheets

Run promptfoo view --id null locally to view interactive results
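
The NaN% in the table above most likely comes from dividing zero successes by zero total tests when the evaluation produced no results. How the workflow actually computes the rate is not shown in this conversation, so the following is only a sketch of a guarded calculation; the function name `pass_rate` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): prints the pass rate as a percentage
# with two decimals, or "N/A" when no tests ran, instead of dividing by zero.
pass_rate() {
  local success="$1" total="$2"
  if [ "$total" -eq 0 ]; then
    echo "N/A"
  else
    awk -v s="$success" -v t="$total" 'BEGIN { printf "%.2f%%\n", 100 * s / t }'
  fi
}

pass_rate 0 0     # prints N/A
pass_rate 51 62   # prints 82.26%
```

Reporting "N/A" (or failing the step) when the total is zero makes an empty evaluation run easier to tell apart from a real result.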

github-actions bot commented May 9, 2025

Promptfoo Evaluation Results

Success Failure Total Pass Rate
13 0 13 100.00%

View detailed results in Google Sheets

Run promptfoo view --id null locally to view interactive results

Promptfoo Evaluation Results

Success Failure Total Pass Rate
13 0 13 100.00%

View detailed results in Google Sheets

» View eval results «

Promptfoo Evaluation Results

Success Failure Total Pass Rate
51 11 62 82.26%

View detailed results in Google Sheets

» View eval results in CI console «

Promptfoo Evaluation Results

Success Failure Total Pass Rate
47 15 62 75.81%

View detailed results in Google Sheets

» View eval results «

Promptfoo Evaluation Results

Success Failure Total Pass Rate
51 11 62 82.26%

View detailed results in Google Sheets

» View eval results «

Promptfoo Evaluation Results

Success Failure Total Pass Rate
51 11 62 82.26%

View detailed results in Google Sheets

» View eval results «

@fg-nava fg-nava marked this pull request as ready for review May 12, 2025 14:34
@fg-nava fg-nava requested a review from a team May 12, 2025 14:34
@fg-nava fg-nava changed the title feat: Setup Github Action workflow file for PromptFoo-GoogleSheet int… feat: Setup Github Action workflow file for PromptFoo May 12, 2025

Promptfoo Evaluation Results

Success Failure Total Pass Rate
0 0 0 NaN%

View detailed results in Google Sheets

» View eval results in CI console «

Promptfoo Evaluation Results

Success Failure Total Pass Rate
51 11 62 82.26%

View detailed results in Google Sheets

» View eval results «

Contributor

@KevinJBoyer KevinJBoyer left a comment

LGTM!


- name: Create unique ID generator
  run: |
    cat > /tmp/generateUniqueId.js << 'EOF'

Contributor

I'd consider moving the /docs/app/generateUniqueId.js out of the docs directory and into somewhere like app/src/evaluation and then referencing it here

Contributor Author

We're not using /docs/app/generateUniqueId.js here; the script used in promptfoo.ci.config.yaml is written inline in the "Create unique ID generator" step.

Contributor Author

I can still move the JS file or just add it as a snippet to the README

Contributor

Sorry, that suggestion wasn't clear. What I mean is: move the file and then read from it here (or reference it directly in promptfoo.ci.config.yaml) so that you don't have to embed JavaScript inside a .yml file

Contributor Author

@fg-nava fg-nava May 12, 2025

Sounds good, will move the script here and reference it
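
As a rough illustration of the change agreed on in this thread, the workflow step could copy a checked-in script into place instead of writing it through a heredoc. This is a sketch only: the helper name `install_id_generator` is hypothetical, and the source path in the usage comment follows the reviewer's "somewhere like app/src/evaluation" suggestion rather than the file's confirmed final location.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): copies a checked-in script to the
# location the evaluation expects, failing loudly if the source is missing.
install_id_generator() {
  local src="$1" dest="$2"
  if [ ! -f "$src" ]; then
    echo "generator script not found: $src" >&2
    return 1
  fi
  cp "$src" "$dest"
}

# Example call inside the workflow step (the source path is a placeholder):
# install_id_generator app/src/evaluation/generateUniqueId.js /tmp/generateUniqueId.js
```

Keeping the JavaScript in its own file means it can be linted and tested like any other source file, which is the benefit the reviewer points to.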

Co-authored-by: Kevin Boyer <kevinboyer@navapbc.com>

Promptfoo Evaluation Results

Success Failure Total Pass Rate
53 9 62 85.48%

View detailed results in Google Sheets

» View eval results «

Promptfoo Evaluation Results

Success Failure Total Pass Rate
0 0 0 NaN%

View detailed results in Google Sheets

» View eval results in CI console «

Promptfoo Evaluation Results

Success Failure Total Pass Rate
53 9 62 85.48%

View detailed results in Google Sheets

» View eval results «

Comment on lines 81 to 83
sed -i "s|GOOGLE_SHEET_INPUT_URL|${{ env.GOOGLE_SHEET_INPUT_URL }}|g" /tmp/promptfooconfig.processed.yaml
sed -i "s|GOOGLE_SHEET_OUTPUT_URL|${{ env.GOOGLE_SHEET_OUTPUT_URL }}|g" /tmp/promptfooconfig.processed.yaml
sed -i "s|CHATBOT_INSTANCE_URL|${{ env.CHATBOT_INSTANCE_URL }}|g" /tmp/promptfooconfig.processed.yaml

Contributor

Try the envsubst command instead?

Contributor Author

Nice suggestion. Thank you.
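
For reference, the envsubst approach suggested above could look roughly like the sketch below. It assumes the config template uses ${VAR}-style placeholders, which differs from the bare tokens the sed commands replace, and that the three variables are exported in the step's environment. The function name `render_config` is hypothetical, and the final implementation is not shown in this conversation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): renders a config template with
# envsubst (part of GNU gettext). Assumes ${VAR}-style placeholders.
render_config() {
  local template="$1" output="$2"
  # Listing the variables restricts substitution to these three names, so
  # any other "$" text in the config is left untouched.
  envsubst '${GOOGLE_SHEET_INPUT_URL} ${GOOGLE_SHEET_OUTPUT_URL} ${CHATBOT_INSTANCE_URL}' \
    < "$template" > "$output"
}

# Example call (the template path is an assumption based on the PR description):
# export GOOGLE_SHEET_INPUT_URL GOOGLE_SHEET_OUTPUT_URL CHATBOT_INSTANCE_URL
# render_config app/promptfooconfig.ci.yaml /tmp/promptfooconfig.processed.yaml
```

Compared with the three sed calls, this avoids problems when a URL contains the `|` delimiter or `&`, and it replaces all placeholders in a single pass.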

Promptfoo Evaluation Results

Success Failure Total Pass Rate
53 9 62 85.48%

View detailed results in Google Sheets

» View eval results «

@fg-nava fg-nava merged commit 99212e5 into main May 12, 2025
13 checks passed
@fg-nava fg-nava deleted the feat/promptfoo-github-action branch May 12, 2025 20:30