(Draft) Large Language Model Evaluation and Monitoring Framework

What is it?

This Large Language Model Evaluation and Monitoring Framework presents a structured approach to evaluating and monitoring in the responsible use of Large Language Models (LLMs).

It is organised around three key groups:

Suitability in Context
Wider Impact
Quantifiable Changes

each containing critical dimensions for consideration.

The emphasis is on the practical implications e.g. how often to review each dimension, what decisions they inform, and what actions they drive.

Dimension Management

The dimensions are currently stored in data/terms.json following the schema defined in data/terms.schema.json.

To add new dimensions, simply add a new item to the terms array following the structure below:

Field	Type	Description
metaGroup	string	The title of the group e.g. "Wider Impact".
metaDimension	string	The title of the dimension e.g. "Benchmark Relevance".
considerationsSetup	string	Considerations at setup for the given dimension.
monitoringAction	string	Actions around monitoring for the given dimension.
monitoringRationale	string	Rationale around monitoring for the given dimension.
monitoringFrequency	string	Frequency of monitoring for the given dimension.
monitoringUpdate	string	Whilst monitoring when to update a model for the given dimension.
monitoringRetire	string	Whilst monitoring when to retire a model for the given dimension.
updateretireUpdate	string	How to update a model for the given dimension.
updateretireRetire	string	How to retire a model for the given dimension.
termCode	string	The short-name of the dimension for use in the URL and related dimensions e.g. "qc-benchmark-relevance".
related	array	(optional) Array of termCodes of related dimensions, e.g. "sic-cost" or "wi-documentation".

Take a look at the existing dimensions in the JSON file as an example of the structure above. You can also edit existing dimensions or remove them from the array.

Updates to the Framework

Framework updates should be made to the .xlxs version in framework folder, and versions updated accordingly using semantic versioning.

The README.md in the framework folder gives details on how to then automatically update the terms.json.

Changes should be reflected in the CHANGELOG.md.

Tests

Automated tests run to check valid JSON syntax on every push to this repo.
To run tests manually, using python3 from the main directory, execute:
- pip install -r tests/requirements.txt
- jsonschema --instance data/terms.json data/terms.schema.json
- cd tests && python validate_json.py

Stack

The tool is primarily built using Next.js (v12) and Tailwind CSS (v3) on top of React (v17). There are also some other supporting tools such as Framer Motion, Headless UI and Heroicons.

There is currently no 'backend' for this project. A JSON file (data/terms.json) is used to house and manage the available terms.

Getting started

You will need:

node v14

It is recommended to use a node package manager like nvm to manage node versions.

Once the project has been cloned, ensure you have installed the required dependencies using the command below:

npm i

Now you have installed the required packages, you can start the development server using the following command:

npm run dev

Open http://localhost:3000 with your browser to see the result.

Hot reloading is enabled on this project, so your changes should reflect in realtime when running the dev server without constantly refreshing the page.

Building

If you want to build and export the project locally, then run the commands below:

npm run build

If the build completes successfully, then you can export the project to static HTML files using the command:

npm run export

Note: using the static export feature disables some of the features of Next.js as you can read here. Currently, none of these features are used on this project.

Deployment

Upon merging into main, the project is automatically deployed to Github Pages via a Githook Action found in .github/workflows/gh-pages-deploy.yml.

Licence

Unless stated otherwise, the codebase is released under the MIT Licence. This covers both the codebase and any sample code in the documentation.

The documentation including term definitions is © Crown copyright and available under the terms of the Open Government 3.0 licence.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
.github		.github
atoms		atoms
components		components
data		data
framework		framework
lib		lib
pages		pages
public		public
styles		styles
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENCE.md		LICENCE.md
README.md		README.md
jsconfig.json		jsconfig.json
next.config.js		next.config.js
package-lock.json		package-lock.json
package.json		package.json
postcss.config.js		postcss.config.js
tailwind.config.js		tailwind.config.js
versioning_guide.svg		versioning_guide.svg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

(Draft) Large Language Model Evaluation and Monitoring Framework

What is it?

Dimension Management

Updates to the Framework

Tests

Stack

Getting started

Building

Deployment

Licence

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

License

nhsengland/llm-eval-monitor-framework

Folders and files

Latest commit

History

Repository files navigation

(Draft) Large Language Model Evaluation and Monitoring Framework

What is it?

Dimension Management

Updates to the Framework

Tests

Stack

Getting started

Building

Deployment

Licence

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages