Warning
This tool is currently under development and its features are experimental. It is NOT recommended for production applications.
- NPM
- Node.js (>=23.5.0)
This tool currently only supports the 'Golden Question' style of testing: sending a query and checking whether the response matches an expected value.
To help you decide whether this tool suits your needs, here's a brief explanation of the process.
For each test case:
1. Create a validation system prompt using the expected answer in the test case.
2. Query the LLM using the question and the system prompt configured in `.env`. This should match the app you're looking to test as closely as possible.
3. Use the answer from step 2 with the validation system prompt to query the LLM again. In other words, we're using the LLM to assess its own answer by switching to the validation system prompt.
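The steps above can be sketched as follows. This is an illustrative outline, not this project's actual code: `LLMCall`, `buildValidationPrompt`, and the prompt wording are all invented stand-ins, and the stubbed model exists only so the example runs without an API key.

```typescript
// Hypothetical signature for an LLM call: system prompt + user message -> answer.
type LLMCall = (systemPrompt: string, userMessage: string) => Promise<string>;

// Step 1: build a validation system prompt from the expected ("golden") answer.
// The wording here is made up; the real prompt lives in validate.md.
function buildValidationPrompt(expected: string): string {
  return `You are a validator. Reply PASS if the answer matches: "${expected}", otherwise FAIL.`;
}

// Steps 2 and 3: ask the question under the app's system prompt, then have the
// model assess its own answer under the validation system prompt.
async function runGoldenQuestion(
  llm: LLMCall,
  appSystemPrompt: string,
  question: string,
  expected: string,
): Promise<boolean> {
  const answer = await llm(appSystemPrompt, question);
  const verdict = await llm(buildValidationPrompt(expected), answer);
  return verdict.trim().toUpperCase().startsWith("PASS");
}

// Stubbed model for the demo: answers "London", and as a validator replies
// PASS when the answer it is shown appears in the validation prompt.
const stub: LLMCall = async (system, msg) =>
  system.startsWith("You are a validator")
    ? (system.includes(`"${msg}"`) ? "PASS" : "FAIL")
    : "London";

runGoldenQuestion(stub, "You are a helpful assistant.", "What is the capital of the UK", "London")
  .then((ok) => console.log(ok)); // prints: true
```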
The validation system prompt in `golden-questions/prompts/system/validate.md` is experimental and has so far only been tested with GPT-4o. You can edit it as you see fit.
Create a `.env` file in the root of the project. Use `.env.sample` to help populate the values.
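As a rough illustration only, a `.env` might look like the fragment below. The variable names here are invented placeholders, not this project's actual keys; copy the real names from `.env.sample`.

```dotenv
# Placeholder names -- use .env.sample for the actual variables.
LLM_API_KEY=your-api-key-here
SYSTEM_PROMPT="The system prompt your application uses"
```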
Run in the project root:

```shell
npm install
```
There are some sample test cases in `tests/golden-questions/data/test-cases.csv`. You can edit these with your own tests.
The CSV columns are as follows:
| question | expected (golden answer) | expected citations (TODO) |
|---|---|---|
| What is the capital of the UK | London | [] |
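A minimal sketch of reading rows in this format, assuming plain comma-separated values with no quoted commas (a real project would likely use a CSV library such as `csv-parse`); the `TestCase` type and `parseTestCases` helper are illustrative, not part of this tool:

```typescript
// One row of the test-case CSV described above.
interface TestCase {
  question: string;
  expected: string;           // the golden answer
  expectedCitations: string;  // citations column is still a TODO upstream
}

// Naive parser: drops the header row and splits each row on commas.
function parseTestCases(csv: string): TestCase[] {
  const [, ...rows] = csv.trim().split("\n");
  return rows.map((row) => {
    const [question, expected, expectedCitations] = row.split(",").map((c) => c.trim());
    return { question, expected, expectedCitations };
  });
}

const sample = `question,expected,expected citations
What is the capital of the UK,London,[]`;

console.log(parseTestCases(sample));
```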
There's a `.run` config for IntelliJ IDEs that should have loaded automatically.
Otherwise, you can run:

```shell
npx playwright test
```
Playwright should open the finished test report for you; each test attaches some useful information in the Attachments section.