Warning
This tool is currently under development and its features are experimental. It is NOT recommended for production applications.
- NPM
- Node.js (>=23.5.0)
This tool currently only supports the 'Golden Question' style of testing: sending a query and checking whether the response matches an expected value.
To help you decide whether this tool suits your needs, here's a brief explanation of the process.
For each test case:
1. Create a validation system prompt using the expected answer in the test case.
2. Query the LLM using the question and the system prompt configured in `.env`. This should match the app you're looking to test as closely as possible.
3. Use the answer from step 2 with the validation system prompt to query the LLM again. In other words, we're using the LLM to assess its own answer by switching to the validation system prompt.
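The steps above can be sketched as follows. This is an illustrative outline, not this project's actual code: `LLMCall`, `buildValidationPrompt`, and the prompt wording are all invented stand-ins, and the stubbed model exists only so the example runs without an API key.

```typescript
// Hypothetical signature for an LLM call: system prompt + user message -> answer.
type LLMCall = (systemPrompt: string, userMessage: string) => Promise<string>;

// Step 1: build a validation system prompt from the expected ("golden") answer.
// The wording here is made up; the real prompt lives in validate.md.
function buildValidationPrompt(expected: string): string {
  return `You are a validator. Reply PASS if the answer matches: "${expected}", otherwise FAIL.`;
}

// Steps 2 and 3: ask the question under the app's system prompt, then have the
// model assess its own answer under the validation system prompt.
async function runGoldenQuestion(
  llm: LLMCall,
  appSystemPrompt: string,
  question: string,
  expected: string,
): Promise<boolean> {
  const answer = await llm(appSystemPrompt, question);
  const verdict = await llm(buildValidationPrompt(expected), answer);
  return verdict.trim().toUpperCase().startsWith("PASS");
}

// Stubbed model for the demo: answers "London", and as a validator replies
// PASS when the answer it is shown appears in the validation prompt.
const stub: LLMCall = async (system, msg) =>
  system.startsWith("You are a validator")
    ? (system.includes(`"${msg}"`) ? "PASS" : "FAIL")
    : "London";

runGoldenQuestion(stub, "You are a helpful assistant.", "What is the capital of the UK", "London")
  .then((ok) => console.log(ok)); // prints: true
```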
The validation system prompt in `golden-questions/prompts/system/validate.md` is experimental and has so far only been tested with GPT-4o. You can edit it as you see fit.
Create a `.env` file in the root of the project. Use `.env.sample` to help populate the values.
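As a rough illustration only, a `.env` might look like the fragment below. The variable names here are invented placeholders, not this project's actual keys; copy the real names from `.env.sample`.

```dotenv
# Placeholder names -- use .env.sample for the actual variables.
LLM_API_KEY=your-api-key-here
SYSTEM_PROMPT="The system prompt your application uses"
```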
Run in the project root:

```shell
npm install
```
There are some sample test cases in `tests/golden-questions/data/test-cases.csv`. You can edit these with your own tests.
The CSV columns are as follows:
| question | expected (golden answer) | expected citations (TODO) |
|---|---|---|
| What is the capital of the UK | London | [] |
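A minimal sketch of reading rows in this format, assuming plain comma-separated values with no quoted commas (a real project would likely use a CSV library such as `csv-parse`); the `TestCase` type and `parseTestCases` helper are illustrative, not part of this tool:

```typescript
// One row of the test-case CSV described above.
interface TestCase {
  question: string;
  expected: string;           // the golden answer
  expectedCitations: string;  // citations column is still a TODO upstream
}

// Naive parser: drops the header row and splits each row on commas.
function parseTestCases(csv: string): TestCase[] {
  const [, ...rows] = csv.trim().split("\n");
  return rows.map((row) => {
    const [question, expected, expectedCitations] = row.split(",").map((c) => c.trim());
    return { question, expected, expectedCitations };
  });
}

const sample = `question,expected,expected citations
What is the capital of the UK,London,[]`;

console.log(parseTestCases(sample));
```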
There's a `.run` config for IntelliJ IDEs that should have loaded automatically.
Otherwise, you can run:

```shell
npx playwright test
```
Playwright should open the finished test report for you; each test attaches some useful information in the Attachments section.