This Large Language Model Evaluation and Monitoring Framework presents a structured approach to evaluating and monitoring the responsible use of Large Language Models (LLMs).
It is organised around three key groups, each containing critical dimensions for consideration:
- Suitability in Context
- Wider Impact
- Quantifiable Changes
The emphasis is on the practical implications, e.g. how often to review each dimension, what decisions it informs, and what actions it drives.
The dimensions are currently stored in `data/terms.json`, following the schema defined in `data/terms.schema.json`.
To add new dimensions, simply add a new item to the terms array following the structure below:
Field | Type | Description |
---|---|---|
metaGroup | string | The title of the group e.g. "Wider Impact". |
metaDimension | string | The title of the dimension e.g. "Benchmark Relevance". |
considerationsSetup | string | Considerations at setup for the given dimension. |
monitoringAction | string | Actions around monitoring for the given dimension. |
monitoringRationale | string | Rationale around monitoring for the given dimension. |
monitoringFrequency | string | Frequency of monitoring for the given dimension. |
monitoringUpdate | string | When, during monitoring, to update a model for the given dimension. |
monitoringRetire | string | When, during monitoring, to retire a model for the given dimension. |
updateretireUpdate | string | How to update a model for the given dimension. |
updateretireRetire | string | How to retire a model for the given dimension. |
termCode | string | The short-name of the dimension for use in the URL and related dimensions e.g. "qc-benchmark-relevance". |
related | array | (optional) Array of termCodes of related dimensions, e.g. "sic-cost" or "wi-documentation". |
Take a look at the existing dimensions in the JSON file as an example of the structure above. You can also edit existing dimensions or remove them from the array.
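As a rough illustration of the structure above, the sketch below builds a placeholder item and checks that the string fields from the table are present. It rests on several assumptions: every field except `related` is treated as required (the real rules live in `data/terms.schema.json`), and `suggest_term_code` is a hypothetical helper whose naming convention is only inferred from examples such as "qc-benchmark-relevance".

```python
import json
import re

# Field names come from the table above. Treating all of them as required
# is an assumption; data/terms.schema.json holds the authoritative rules.
REQUIRED_FIELDS = [
    "metaGroup", "metaDimension", "considerationsSetup",
    "monitoringAction", "monitoringRationale", "monitoringFrequency",
    "monitoringUpdate", "monitoringRetire",
    "updateretireUpdate", "updateretireRetire", "termCode",
]


def suggest_term_code(meta_group: str, meta_dimension: str) -> str:
    """Guess a termCode: group initials plus a hyphenated dimension slug.

    Hypothetical helper. The convention is inferred from the examples in
    the table and is not confirmed by the project.
    """
    initials = "".join(word[0] for word in meta_group.split()).lower()
    slug = re.sub(r"[^a-z0-9]+", "-", meta_dimension.lower()).strip("-")
    return f"{initials}-{slug}"


def missing_fields(term: dict) -> list:
    """Return the assumed-required fields that are absent or not strings."""
    return [f for f in REQUIRED_FIELDS if not isinstance(term.get(f), str)]


# Placeholder text only; a real entry needs real guidance in each field.
new_term = {name: "TODO" for name in REQUIRED_FIELDS}
new_term["metaGroup"] = "Quantifiable Changes"
new_term["metaDimension"] = "Benchmark Relevance"
new_term["termCode"] = suggest_term_code(
    new_term["metaGroup"], new_term["metaDimension"]
)
new_term["related"] = ["sic-cost", "wi-documentation"]  # optional field

print(json.dumps(new_term, indent=2))
print("Missing fields:", missing_fields(new_term))
```

The printed object can be used as a starting point for a new item, but the schema validation described in the testing section remains the real check.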
Framework updates should be made to the `.xlsx` version in the `framework` folder, and version numbers updated accordingly using semantic versioning.
The `README.md` in the `framework` folder gives details on how to then automatically update `terms.json`.
Changes should be reflected in `CHANGELOG.md`.
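For readers unfamiliar with semantic versioning, the mechanics of a version bump can be sketched as below. `bump_version` is a hypothetical helper, not part of this repo, and it assumes plain `MAJOR.MINOR.PATCH` strings. Which kinds of framework change count as major, minor or patch is a project decision that is not stated here.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for the given part.

    Hypothetical helper for illustration; assumes three numeric components
    with no pre-release or build suffix.
    """
    major, minor, patch = (int(n) for n in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # minor and patch reset to zero
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # patch resets to zero
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("1.4.2", "minor"))
```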
- Automated tests run to check valid JSON syntax on every push to this repo.
- To run tests manually, using python3 from the main directory, execute:
pip install -r tests/requirements.txt
jsonschema --instance data/terms.json data/terms.schema.json
cd tests && python validate_json.py
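Beyond syntax and schema checks, it can be useful to confirm that every entry in `related` points at a `termCode` that actually exists. The sketch below uses only the standard library and is not the repo's `validate_json.py`. It assumes the file is a JSON object with a top-level `terms` array of objects, and `check_terms` is a hypothetical name.

```python
import json


def check_terms(raw_json: str) -> list:
    """Return a list of problems found in a terms.json-style document."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]

    if not isinstance(data, dict):
        return ["top level is not an object"]

    terms = data.get("terms", [])  # assumed top-level "terms" array
    problems = []

    # Collect termCodes, flagging any that appear more than once.
    seen = set()
    for term in terms:
        code = term.get("termCode")
        if code in seen:
            problems.append(f"duplicate termCode: {code}")
        seen.add(code)

    # Every related entry should match a known termCode.
    for term in terms:
        for ref in term.get("related", []):
            if ref not in seen:
                problems.append(
                    f"{term.get('termCode')}: unknown related termCode {ref}"
                )
    return problems


# Made-up sample data; "qc-missing" deliberately does not exist.
sample = json.dumps({
    "terms": [
        {"termCode": "sic-cost", "related": ["wi-documentation"]},
        {"termCode": "wi-documentation", "related": ["qc-missing"]},
    ]
})
for problem in check_terms(sample):
    print(problem)
```

To try it against the real data, read `data/terms.json` into a string and pass it to `check_terms`; an empty list means no problems of these two kinds were found.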
The tool is primarily built using Next.js (v12) and Tailwind CSS (v3) on top of React (v17).
There are also some other supporting tools such as Framer Motion, Headless UI and Heroicons.
There is currently no 'backend' for this project. A JSON file (`data/terms.json`) is used to house and manage the available terms.
You will need:
node v14
It is recommended to use a Node version manager such as nvm to manage Node versions.
Once the project has been cloned, ensure you have installed the required dependencies using the command below:
npm i
Now that you have installed the required packages, you can start the development server using the following command:
npm run dev
Open http://localhost:3000 with your browser to see the result.
Hot reloading is enabled on this project, so while the dev server is running your changes should appear in real time without you having to refresh the page.
If you want to build and export the project locally, first run the build command below:
npm run build
If the build completes successfully, then you can export the project to static HTML files using the command:
npm run export
Note: using the static export feature disables some Next.js features, as described in the Next.js static HTML export documentation. Currently, none of these features are used on this project.
Upon merging into `main`, the project is automatically deployed to GitHub Pages via a GitHub Action found in `.github/workflows/gh-pages-deploy.yml`.
Unless stated otherwise, the codebase is released under the MIT Licence. This covers both the codebase and any sample code in the documentation.
The documentation, including term definitions, is © Crown copyright and available under the terms of the Open Government Licence v3.0.