
llm-tester

A question and response testing framework for Azure LLM Deployments

Warning

Currently under development; features are experimental. NOT recommended for production applications.

Prerequisites

  • NPM
  • Node.js (>=23.5.0)

Process

This tool currently supports only 'Golden Question'-style testing: sending a query and checking whether the response matches an expected value.

To help you decide whether this tool suits your needs, here's a brief explanation of the process.

For each test case:

  1. Create a validation system prompt from the expected answer in the test case.
  2. Query the LLM using the question and the system prompt from .env. This query should match, as closely as possible, the app you're looking to test.
  3. Query the LLM again, passing the answer from step 2 together with the validation system prompt. In other words, we use the LLM to assess its own answer by switching to the validation system prompt.

The validation system prompt in golden-questions/prompts/system/validate.md is experimental and has so far been tested only on GPT-4o. You can edit it as you see fit.
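The three steps above can be sketched as below. This is a minimal illustration, not llm-tester's actual implementation: `LlmCall`, `runGoldenQuestion`, and the grader wording are hypothetical names and text, and the real Azure OpenAI call is left as a parameter.

```typescript
// Minimal sketch of the golden-question flow. `LlmCall` and
// `runGoldenQuestion` are hypothetical names, not llm-tester's API;
// swap in your real Azure OpenAI client for `queryLlm`.
type LlmCall = (systemPrompt: string, userMessage: string) => Promise<string>;

async function runGoldenQuestion(
  queryLlm: LlmCall,
  appSystemPrompt: string, // the system prompt your app uses (from .env)
  question: string,
  expected: string,
): Promise<{ answer: string; verdict: string }> {
  // 1. Build a validation system prompt from the expected (golden) answer.
  const validationPrompt =
    `You are a strict grader. Reply PASS if the candidate answer matches ` +
    `the expected answer "${expected}"; otherwise reply FAIL.`;

  // 2. Query the LLM the same way the app under test would.
  const answer = await queryLlm(appSystemPrompt, question);

  // 3. Ask the LLM to assess its own answer under the validation prompt.
  const verdict = await queryLlm(validationPrompt, answer);

  return { answer, verdict };
}
```

The point of step 3 is that matching is done by the model rather than by string comparison, so paraphrased but correct answers can still pass.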

Setup

Step 0: Environment setup

Create a .env file in the root of the project. Use .env.sample to help populate the values.
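For reference, a populated .env looks something like the fragment below. The key names here are hypothetical placeholders for the kind of values an Azure OpenAI deployment typically needs; copy the real names from .env.sample.

```
# Hypothetical key names — use the ones from .env.sample
AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com
AZURE_OPENAI_API_KEY=<your-api-key>
AZURE_OPENAI_DEPLOYMENT=<your-deployment-name>
SYSTEM_PROMPT=<the system prompt your app under test uses>
```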

Step 1: Install dependencies

Run in root:

npm install

Step 2: Test Cases

There are some sample test cases in tests/golden-questions/data/test-cases.csv. You can edit these with your own tests.

The CSV columns are as follows:

| question | expected (golden answer) | expected citations (TODO) |
| --- | --- | --- |
| What is the capital of the UK | London | [] |
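The row layout above can be read with a naive parser like the sketch below. This is illustrative only (the names `TestCase` and `parseTestCaseRow` are not from the project) and assumes no quoted commas or embedded newlines; use a real CSV library for anything more complex.

```typescript
// Naive sketch of reading one data row from test-cases.csv.
// Assumes simple cells with no quoted commas or embedded newlines.
interface TestCase {
  question: string;
  expected: string;          // the golden answer
  expectedCitations: string; // TODO upstream; kept as raw text here
}

function parseTestCaseRow(row: string): TestCase {
  // Default the citations column to "[]" when the row omits it.
  const [question, expected, expectedCitations = "[]"] = row
    .split(",")
    .map((cell) => cell.trim());
  return { question, expected, expectedCitations };
}
```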

Step 3: Run

There's a .run configuration for IntelliJ IDEs that should have loaded automatically.

Otherwise, you can run:

npx playwright test

Step 4: Assess

Playwright should open the finished test report for you. Each test attaches some useful info in its Attachments section.
