This repository contains a comprehensive suite of benchmarks for evaluating LLM serving systems. The suite includes multiple scenarios to test different aspects of model performance.
The workload simulated in these benchmarks is a multi-round question-answering (QA) task with multiple users interacting with an LLM engine concurrently. Three scenarios are included:
- **ShareGPT Benchmark**
  - Replays real-world conversations from ShareGPT
  - Default QPS: 1.34
- **Short Input, Short Output (Synthetic)**
  - System prompt: 0 tokens
  - Chat history: 256 tokens
  - Answer length: 20 tokens
  - Default QPS: 15
- **Long Input, Short Output (Synthetic)**
  - System prompt: 1000 tokens
  - Chat history: 20000 tokens
  - Answer length: 100 tokens
  - Default QPS: 0.1
The unified script `run_benchmarks.sh` can run any combination of benchmarks with consistent configuration:
```bash
# Run all benchmarks with default QPS
./run_benchmarks.sh <model> <base_url> <save_file_key> all

# Run specific benchmarks with default QPS
./run_benchmarks.sh <model> <base_url> <save_file_key> sharegpt short-input

# Run specific benchmarks with custom QPS
./run_benchmarks.sh <model> <base_url> <save_file_key> sharegpt short-input 1.34 2.0 3.0
```
For example:

```bash
# Run all benchmarks with default QPS
./run_benchmarks.sh meta-llama/Llama-3.1-8B-Instruct http://localhost:8000 /mnt/requests/benchmark all

# Run ShareGPT and short input benchmarks with custom QPS
./run_benchmarks.sh meta-llama/Llama-3.1-8B-Instruct http://localhost:8000 /mnt/requests/benchmark sharegpt short-input 1.34 2.0 3.0
```
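Because the model, endpoint, and output prefix are all positional arguments, two serving endpoints can be compared back to back with a small wrapper. A sketch, with hypothetical endpoint URLs and output prefixes:

```bash
# Run the same scenarios against two endpoints (hypothetical URLs)
for url in http://engine-a:8000 http://engine-b:8000; do
  # Derive a per-endpoint output prefix from the URL
  key="/mnt/requests/$(basename "$url" | tr ':' '-')"
  ./run_benchmarks.sh meta-llama/Llama-3.1-8B-Instruct "$url" "$key" all
done
```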
Results are saved in CSV format with the following naming convention:

- ShareGPT: `<save_file_key>_sharegpt_output_<qps>.csv`
- Short Input: `<save_file_key>_short_input_output_<qps>.csv`
- Long Input: `<save_file_key>_long_input_output_<qps>.csv`
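Since the QPS value is embedded in each filename, all results from a sweep can be collected with a simple glob (using the `/mnt/requests/benchmark` prefix from the examples above):

```bash
# List every results file produced under a given prefix
ls /mnt/requests/benchmark_*_output_*.csv
```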
To analyze a results file, use the provided scripts:

```bash
# Print a summary of one results file
python3 synthetic-multi-round-qa/multi-round-qa.py --process-summary <your_csv_file>

# Compute inter-token latency (ITL)
python3 synthetic-multi-round-qa/calculat_itl.py
```
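To summarize every file from a sweep in one pass, the summary command can be looped over that glob (again assuming the prefix from the earlier examples):

```bash
# Summarize each results CSV in turn
for f in /mnt/requests/benchmark_*_output_*.csv; do
  echo "== $f =="
  python3 synthetic-multi-round-qa/multi-round-qa.py --process-summary "$f"
done
```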
- The warm-up phase is automatically handled for all benchmarks
- All scripts handle their paths correctly regardless of where they're run from
- QPS values can be customized through command-line arguments
- Results are saved in CSV format with the QPS value in the filename
This directory contains the necessary files to run the benchmark in Docker and Kubernetes environments.
- `Dockerfile`: Defines the Docker image for running the benchmark
- `benchmark-job.yaml`: Kubernetes job configuration
- `run_benchmarks.sh`: Main benchmark script
The following environment variables can be configured:

- `MODEL`: The model name to benchmark (default: `meta-llama/Llama-3.1-8B-Instruct`)
- `BASE_URL`: The base URL of the vLLM server (default: `http://localhost:8000`)
- `SAVE_FILE_KEY`: Prefix for the output files (default: `benchmark_results`)
- `SCENARIOS`: Benchmark scenarios to run (default: `all`). Options: `all`, `sharegpt`, `short-input`, `long-input`
- `QPS_VALUES`: Space-separated list of QPS values to test (default: `1.34`)
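A minimal sketch of how such defaults are typically resolved in a bash entry script; the actual script in this repository may differ:

```bash
# Hypothetical default resolution; the real entry script may differ.
MODEL="${MODEL:-meta-llama/Llama-3.1-8B-Instruct}"
BASE_URL="${BASE_URL:-http://localhost:8000}"
SAVE_FILE_KEY="${SAVE_FILE_KEY:-benchmark_results}"
SCENARIOS="${SCENARIOS:-all}"
QPS_VALUES="${QPS_VALUES:-1.34}"
```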
Pull the pre-built image:

```bash
docker pull lmcache/lmcache-benchmark
```

or build your own:

```bash
docker build -t your-registry/benchmark:latest .
```
Then run the container:

```bash
docker run -e MODEL="meta-llama/Llama-3.1-8B-Instruct" \
  -e BASE_URL="http://vllm-service:8000" \
  -e SAVE_FILE_KEY="benchmark_results" \
  -e SCENARIOS="all" \
  -e QPS_VALUES="1.34 2.0 3.0" \
  -v /path/to/results:/app/results \
  your-registry/benchmark:latest
```

To use the pre-built image, replace `your-registry/benchmark:latest` with `lmcache/lmcache-benchmark`.
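For a quick smoke test, a single scenario at one QPS point keeps the run short. This sketch uses the pre-built image and a local `results` directory; the scenario and QPS value are illustrative:

```bash
# Quick smoke test: one scenario, one QPS point (illustrative values)
docker run --rm \
  -e SCENARIOS="short-input" \
  -e QPS_VALUES="15" \
  -e BASE_URL="http://vllm-service:8000" \
  -v "$PWD/results:/app/results" \
  lmcache/lmcache-benchmark
```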
- Create a PersistentVolumeClaim for storing results:

```bash
kubectl apply -f benchmark-results-pvc.yaml
```
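Before launching the job, confirm the claim is bound:

```bash
kubectl get pvc
```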
- Deploy the benchmark job:

```bash
kubectl apply -f benchmark-job.yaml
```
- Monitor the job:

```bash
kubectl get jobs
kubectl logs job/benchmark-job
```
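For unattended runs, `kubectl wait` blocks until the job finishes (the timeout here is an arbitrary example):

```bash
# Block until the job completes, then dump its logs
kubectl wait --for=condition=complete job/benchmark-job --timeout=2h
kubectl logs job/benchmark-job
```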
The benchmark results will be saved in the mounted volume with the following structure:

- `{SAVE_FILE_KEY}_sharegpt_qps{X}.csv` for ShareGPT benchmarks
- `{SAVE_FILE_KEY}_short_input_qps{X}.csv` for short input benchmarks
- `{SAVE_FILE_KEY}_long_input_qps{X}.csv` for long input benchmarks

where `{X}` is the QPS value used for that run.
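While the job's pod is still running, results can also be copied out directly. This sketch assumes the pod mounts the results volume at `/app/results`, as in the Docker example above; Kubernetes labels job pods with `job-name` automatically:

```bash
# Find the job's pod and copy results out while it is running
pod=$(kubectl get pods -l job-name=benchmark-job -o jsonpath='{.items[0].metadata.name}')
kubectl cp "$pod":/app/results ./results
```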
To reproduce results from our latest benchmarking runs against other open-source LLM serving systems, refer to the configuration scripts inside the folders of the `configs` directory.
Latest results: