stage0_runbook_evaluate

Evaluation Runbook

This runbook reads a configuration.yaml file, loads the referenced data files from the input folder, and then evaluates a model's responses to each assistant reply in the specified test conversations. Grades are assigned by a separate grading prompt that specializes in comparing given and expected values. Grades are written to the output folder as yyyy.mm.ddThh:mm:ss-evaluation.json

Expected Directory Structure

When you run the utility you will specify folders to mount for /config, /input and /output

/📁 config
├── 📝 configuration.yaml               # Evaluation Pipeline Configuration
/📁 input
│── 📁 conversations                    # Conversations that are used to test the model
│   ├── 💬 test_conversation1.csv       
│   ├── 💬 test_conversation2.csv       
│── 📁 grader                           # Simple LLM message list with grading prompts
│   ├── ✏️ grader1.csv       
│   ├── ✏️ grader2.csv       
│── 📁 prompts                          # Formatted Echo messages with your bot prompts
│   ├── 🧑‍🏫 your_base.csv                    # A base prompt that establishes a name (Fran, Json, Ivan, etc.)
│   ├── 🧑‍🏫 echo.csv                         # Group chat and echo formatted message comprehension
│   ├── 🧑‍🏫 tool.csv                         # How to use agent/action commands
│   ├── 🧑‍🏫 you.csv                          # Your custom prompt, specific agent/actions or processes
/📁 output
│   ├── 📀 yyyy-mm-ddThh:mm:ss-output.json  # Grades from running the evaluation

Using this in your project.

Adjust the command below to use appropriate values for your Echo Bot project, and add it to your pipenv scripts. See the [test_data] folder for sample files. Grades will be written to a file called {datetime}-output.json in the output folder when you run the tool.

docker run --rm /
    -v ./bot_name:/input
    -v ./bot_name/config:/config
    -v ./bot_name/output:/output
    ghcr.io/agile-learning-institute/stage0-echo-evaluate:latest

Contributing

Prerequisites

Ensure the following tools are installed:

Testing

All testing uses config/input/output folders in ./test_data.

Install Dependencies

pipenv install

Run Evaluate Runbook locally.

pipenv run evaluate

Debug Evaluate Runbook locally

pipenv run debug

Runs locally with logging level set to DEBUG

Build the Evaluate Runbook container

pipenv run build

Build, and run the Evaluate Runbook container

pipenv run container

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.github/workflows		.github/workflows
test_data		test_data
.gitignore		.gitignore
Dockerfile		Dockerfile
EVALUATE.md		EVALUATE.md
LICENSE		LICENSE
Pipfile		Pipfile
Pipfile.lock		Pipfile.lock
README.md		README.md
evaluate_runbook.py		evaluate_runbook.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

stage0_runbook_evaluate

Evaluation Runbook

Expected Directory Structure

Using this in your project.

Contributing

Prerequisites

Testing

Install Dependencies

Run Evaluate Runbook locally.

Debug Evaluate Runbook locally

Build the Evaluate Runbook container

Build, and run the Evaluate Runbook container

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Languages

License

agile-learning-institute/stage0_runbook_evaluate

Folders and files

Latest commit

History

Repository files navigation

stage0_runbook_evaluate

Evaluation Runbook

Expected Directory Structure

Using this in your project.

Contributing

Prerequisites

Testing

Install Dependencies

Run Evaluate Runbook locally.

Debug Evaluate Runbook locally

Build the Evaluate Runbook container

Build, and run the Evaluate Runbook container

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Languages

Packages