LMBench is a benchmarking framework for LMCache and the vLLM Production Stack, with support for comparing against external baselines (SGLang, Dynamo, RayServe, LLM-D, etc.).
Core Architecture: Cartesian product evaluation of serving baselines × workload generators across configurable infrastructure.
- Suite: Cartesian product of serving baselines and workloads, defined in `0-bench-specs/*.yaml`
- Session: Single deployment with a unique `lmbench-session-id` for result tracking
- Baseline: Serving system exposing an OpenAI-compatible API on `localhost:30080`
- Workload: Traffic generator simulating a specific usage pattern
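
For intuition, a suite spec pairs every listed baseline with every listed workload. The sketch below uses hypothetical key names purely for illustration; the real schema is documented in `0-bench-specs/TEMPLATE-spec.yaml`:

```yaml
# Illustrative spec sketch — key names are hypothetical;
# see 0-bench-specs/TEMPLATE-spec.yaml for the actual schema.
Name: example-suite
Serving-Baselines:          # hypothetical key
  - helm-production-stack
  - sglang
Workloads:                  # hypothetical key
  - synthetic
  - sharegpt
# 2 baselines × 2 workloads → 4 benchmarked combinations per session
```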
```
LMBench/
├── 0-bench-specs/                 # Suite definitions (baselines × workloads)
│   ├── daily/                     # Daily benchmarks published on lmbench.dev
│   ├── open-source-comparisons/   # Comparison specs for OSS engines and orchestration layers
│   └── TEMPLATE-spec.yaml         # All available options
├── 1-infrastructure/              # Platform setup (K8s clusters, cloud resources)
│   ├── lmcache-gke/               # Google Kubernetes Engine
│   ├── local-minikube/            # Local Kubernetes
│   └── local-flat/                # Direct script execution
├── 2-serving-engines/             # Baseline deployments
│   ├── helm-production-stack/     # vLLM Production Stack (Helm)
│   ├── direct-production-stack/   # vLLM Production Stack (direct K8s manifests)
│   ├── sglang/                    # SGLang baseline
│   ├── dynamo/                    # Dynamo baseline
│   ├── rayserve/                  # RayServe baseline
│   ├── llm-d/                     # LLM-D baseline
│   └── ADDING_NEW_BASELINES.md
├── 3-workloads/                   # Traffic generators
│   ├── synthetic/                 # Configurable multi-round QA
│   ├── agentic/                   # Multi-agent conversations
│   ├── sharegpt/                  # Real conversation data
│   ├── vllm-benchmark-serving/    # vLLM benchmark integration
│   └── ADDING_NEW_WORKLOADS.md
├── 4-latest-results/              # Benchmark outputs and post-processing
├── run-bench.py                   # Main orchestrator
├── run-bench.yaml                 # Top-level configuration
└── TEMPLATE-run-bench.yaml        # Infrastructure options
```
- Configure: Edit `run-bench.yaml` to select suites and infrastructure
- Deploy: `export HF_TOKEN=<token> && CUDA_VISIBLE_DEVICES=0,1 python run-bench.py`
- Results: View `.png` graphs in `4-latest-results/<suite-name>/`
- `LMCacheGKE`: Managed GKE cluster with GPUs
- `LocalMinikube`: Local Kubernetes development
- `Local-Flat`: Direct script deployment (no containers)
```yaml
# run-bench.yaml
0-bench-specs:
  - layerwise/layerwise-spec.yaml
1-infrastructure:
  Location: LocalMinikube
```
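
To target the managed GKE setup instead, only the `Location` value should need to change. This is a sketch that assumes `Location` is the only required infrastructure key (as in the Minikube example above); the spec path is hypothetical:

```yaml
# run-bench.yaml — GKE variant (sketch; assumes Location is the only
# required infrastructure key; the spec path below is hypothetical)
0-bench-specs:
  - daily/daily-spec.yaml
1-infrastructure:
  Location: LMCacheGKE
```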
- New Baselines: See `2-serving-engines/ADDING_NEW_BASELINES.md`
- New Workloads: See `3-workloads/ADDING_NEW_WORKLOADS.md`
- New Suites: Create spec files using `0-bench-specs/TEMPLATE-spec.yaml`
Each session produces per-suite directories with:
```
<suite-name>/
├── {baseline}_{workload}_{qps}_{timestamp}.json   # Raw results
├── {workload}_comparison.png                      # Performance graphs
└── pod-logs/                                      # Infrastructure logs
```
Upload results to lmbench.dev for time-series analysis and cross-session comparison. Results are grouped by suite → workload with QPS and temporal views.