llmbench

A tiny load tester for OpenAI- and Ollama-style chat-completions endpoints, with optional streaming support.

Better suited to benchmarking than production use, it provides a simple way to measure response times, token usage, and throughput for LLM APIs: for example, vetting a third-party provider that isn't an established vendor such as OpenAI or Anthropic, or measuring your own Ollama server's throughput.

Features

  • Send concurrent requests to any /v1/chat/completions (OpenAI) or /chat (Ollama) endpoint
  • Measure response latency, token usage, and tokens-per-second
  • Approximate token counts for Ollama responses (see the sketch after this list)
  • Optional streaming mode (SSE) for real-time output
  • Optionally store each response and per-run metrics on disk via --store-data
  • Automatically unload Ollama models after the benchmark with --unload-model
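
The feature list doesn't say how Ollama token counts are approximated. Here is a minimal Go sketch assuming one common heuristic, roughly 4 characters per token for English text; the approxTokens helper is hypothetical and illustrative, not llmbench's actual code:

package main

import (
	"fmt"
	"unicode/utf8"
)

// approxTokens estimates a token count from response text using the
// rough ~4-characters-per-token rule of thumb for English. This is a
// hypothetical heuristic; llmbench's actual approximation may differ.
func approxTokens(text string) int {
	const charsPerToken = 4
	n := utf8.RuneCountInString(text) / charsPerToken
	if n == 0 && text != "" {
		return 1 // never report zero tokens for non-empty text
	}
	return n
}

func main() {
	fmt.Println(approxTokens("Why did the tokenizer cross the road?")) // 9
}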

Installation

# Requires Go 1.18+
go install go.codycody31.dev/llmbench@latest

Usage

llmbench [flags]

Flags

Flag            Default                               Description
--base-url      https://api.openai.com/v1             API base URL
--key           (env LLM_API_KEY)                     Bearer token (not used by Ollama)
--style         openai                                API style: openai or ollama
--stream        false                                 Enable streaming (SSE) mode
--runs          100                                   Total requests to send
--concurrency   0                                     Simultaneous requests (0 = same as --runs)
--max-tokens    4096                                  max_tokens per request (OpenAI only)
--model         gpt-4o-mini                           Model ID
--prompt        Explain the fundamental concepts...   The user message to send
--timeout       60s                                   HTTP client timeout (disabled in streaming mode)
--unload-model  false                                 Unload model after all runs complete (Ollama only)
--data-dir      ./runs                                Directory to store responses and metrics
--store-data    false                                 Store responses and per-run metrics to --data-dir
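
The --concurrency flag above bounds how many of the --runs requests are in flight at once, with 0 meaning fully parallel. A minimal sketch of that gating via a counting-semaphore pattern (hypothetical names; llmbench's internals may differ):

package main

import (
	"fmt"
	"sync"
)

// runAll fires `runs` requests with at most `concurrency` in flight.
// Hypothetical sketch; concurrency <= 0 means fully parallel,
// matching "0 = same as --runs" in the flag table.
func runAll(runs, concurrency int, doRequest func(run int)) {
	if concurrency <= 0 {
		concurrency = runs
	}
	sem := make(chan struct{}, concurrency) // counting semaphore
	var wg sync.WaitGroup
	for i := 1; i <= runs; i++ {
		wg.Add(1)
		sem <- struct{}{} // block until a slot frees up
		go func(run int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			doRequest(run)
		}(i)
	}
	wg.Wait()
}

func main() {
	// 20 runs with at most 5 concurrent, mirroring the Ollama example below.
	runAll(20, 5, func(run int) { fmt.Printf("run %03d\n", run) })
}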

Examples

# OpenAI style (non-streaming)
export LLM_API_KEY="sk-..."
llmbench --runs 5 --model gpt-4o-mini --prompt "Hello, world!"

# Ollama style (non-streaming)
llmbench --style ollama \
         --base-url http://localhost:11434 \
         --runs 20 --concurrency 5 --model llama2

# OpenAI style (streaming)
export LLM_API_KEY="sk-..."
llmbench --style openai --stream \
         --runs 1 --model gpt-4 --prompt "Tell me a joke"

# Ollama style (streaming)
llmbench --style ollama --stream \
         --base-url http://localhost:11434 \
         --runs 1 --model llama2 --prompt "How are you today?"
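
In streaming mode the client consumes Server-Sent Events from the endpoint. A minimal standalone Go sketch of reading an OpenAI-style SSE body, reusing the model, prompt, and LLM_API_KEY variable from the streaming example above (this is a hypothetical illustration, not llmbench's actual parser):

package main

import (
	"bufio"
	"fmt"
	"net/http"
	"os"
	"strings"
)

func main() {
	// Mirror the OpenAI streaming example: same endpoint, same env var.
	body := strings.NewReader(`{"model":"gpt-4","stream":true,` +
		`"messages":[{"role":"user","content":"Tell me a joke"}]}`)
	req, err := http.NewRequest("POST",
		"https://api.openai.com/v1/chat/completions", body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("LLM_API_KEY"))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Each event arrives as "data: <JSON chunk>"; the stream ends with
	// the "data: [DONE]" sentinel used by OpenAI-style APIs.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue // skip blank separator lines between events
		}
		payload := strings.TrimPrefix(line, "data: ")
		if payload == "[DONE]" {
			break
		}
		fmt.Println(payload) // JSON chunk carrying a content delta
	}
}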

Sample output (gpt-4o-mini, --runs 5):

2025/07/03 11:27:20.850482 Run 003 | request | model=gpt-4o-mini | prompt_tokens=2 | stream=false
2025/07/03 11:27:20.850772 Run 002 | request | model=gpt-4o-mini | prompt_tokens=2 | stream=false
2025/07/03 11:27:20.850782 Run 004 | request | model=gpt-4o-mini | prompt_tokens=2 | stream=false
2025/07/03 11:27:20.850813 Run 001 | request | model=gpt-4o-mini | prompt_tokens=2 | stream=false
2025/07/03 11:27:20.850862 Run 005 | request | model=gpt-4o-mini | prompt_tokens=2 | stream=false
2025/07/03 11:27:21.701164 Run 005 | success | completion_tokens=9 | latency_ms=850.135831 | model=gpt-4o-mini | prompt_tokens=2 | run=5 | stream=false | tok_per_sec=22.34936972089581 | total_tokens=19
2025/07/03 11:27:21.795267 Run 002 | success | completion_tokens=9 | latency_ms=944.420699 | model=gpt-4o-mini | prompt_tokens=2 | run=2 | stream=false | tok_per_sec=20.118152874156774 | total_tokens=19
2025/07/03 11:27:21.838177 Run 004 | success | completion_tokens=9 | latency_ms=987.299588 | model=gpt-4o-mini | prompt_tokens=2 | run=4 | stream=false | tok_per_sec=19.244411960597315 | total_tokens=19
2025/07/03 11:27:21.927233 Run 001 | success | completion_tokens=9 | latency_ms=1076.323388 | model=gpt-4o-mini | prompt_tokens=2 | run=1 | stream=false | tok_per_sec=17.65268711228637 | total_tokens=19
2025/07/03 11:27:22.657458 Run 003 | success | completion_tokens=9 | latency_ms=1806.808911 | model=gpt-4o-mini | prompt_tokens=2 | run=3 | stream=false | tok_per_sec=10.515777227091615 | total_tokens=19

=== Summary ===
Successful calls  : 5 / 5
Avg completion tokens    : 9.00
Avg total tokens         : 19.00
Avg tokens / sec         : 17.98
Total completion tokens  : 45
Total tokens             : 95
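
Note that Avg tokens / sec matches the mean of the per-run tok_per_sec values above: (22.35 + 20.12 + 19.24 + 17.65 + 10.52) / 5 ≈ 17.98. That is, it appears to average per-run rates rather than divide total tokens by total wall time.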

License

MIT License
