This repository implements Ragas as an out-of-tree Llama Stack evaluation provider. Ragas is a toolkit for evaluating and optimizing Large Language Model (LLM) applications with objective metrics.
The goal is to provide all of Ragas' evaluation functionality over Llama Stack's eval API, while leveraging Llama Stack's built-in APIs for inference (LLMs and embeddings), datasets, and benchmarks.
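A minimal sketch of what this looks like from the client side is shown below, assuming the `llama-stack-client` Python SDK and a running distribution with this provider configured. The benchmark ID, dataset ID, scoring-function names, model name, and the exact shape of `benchmark_config` are illustrative and may differ from the actual provider and from your Llama Stack version.

```python
from llama_stack_client import LlamaStackClient

# Assumes a Llama Stack distribution with the Ragas eval provider running locally.
client = LlamaStackClient(base_url="http://localhost:8321")

# Register a benchmark over a previously registered dataset. The IDs and
# scoring-function names here are placeholders, not the provider's actual ones.
client.benchmarks.register(
    benchmark_id="ragas::demo-benchmark",
    dataset_id="my-eval-dataset",
    scoring_functions=["ragas::answer_relevancy", "ragas::faithfulness"],
)

# Run the benchmark through the eval API. The Ragas metrics are computed by
# this provider, which uses Llama Stack's inference and embedding APIs under
# the hood. The benchmark_config shape follows the upstream eval API and may
# vary between Llama Stack versions.
job = client.eval.run_eval(
    benchmark_id="ragas::demo-benchmark",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "meta-llama/Llama-3.1-8B-Instruct",
            "sampling_params": {"strategy": {"type": "greedy"}},
        },
    },
)
print(job)
```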
- Python 3.12
- uv
Clone the repository and install with uv (see below). This README will be updated with alternative instructions that use the `build.yaml` file and Llama Stack's `build` command.
- Clone this repository:

  ```bash
  git clone <repository-url>
  cd llama-stack-provider-ragas
  ```
- Create and activate a virtual environment:

  ```bash
  uv venv
  source .venv/bin/activate
  uv sync
  ```
- Install as an editable package. The `distro` and `dev` optional dependency groups are needed to run the sample Llama Stack distribution:

  ```bash
  uv pip install -e ".[distro]"
  uv pip install -e ".[dev]"
  ```
See the demo notebook for a complete example of using the Ragas provider with Llama Stack.
See the configuration parameters in the `run.yaml` file; they are still subject to change.